athena-cli

Downloading results is very slow

Open jkleint opened this issue 7 years ago • 3 comments

It seems that when running a query in batch mode, the results are downloaded via the API in small chunks, which is very slow. For instance, using athena-cli to --execute a query that returned 19 MB (50000 rows) took 81 sec (about 250 KB/sec), but the actual Athena query finished in 3 seconds and aws s3 cp downloads the results in 1 second.

Any reason you don't just use S3 to fetch the results?

jkleint avatar Aug 16 '18 22:08 jkleint

Yes, because the AWS SDK for Athena doesn't work that way.

If downloading results is slow with athena-cli, then you should just download the output directly from the Athena S3 bucket, as mentioned here: https://github.com/guardian/athena-cli/pull/25#issuecomment-339702475

satterly avatar Aug 29 '18 09:08 satterly

In the case of batch queries, would it be possible to get the url of the result from the Athena SDK, then use the S3 API to fetch?
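
A rough sketch of what that could look like, in case it helps. This is not athena-cli code: the helper names `split_s3_uri` and `download_query_results` are made up for illustration, and it assumes the query has already finished and that the caller passes in boto3 Athena and S3 clients. It relies on `get_query_execution` reporting the result file under `ResultConfiguration.OutputLocation`, and on the S3 client's `download_file`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_query_results(athena, s3, query_execution_id, dest_path):
    """Fetch the result file of a finished query straight from S3.

    `athena` and `s3` are expected to be boto3 clients (hypothetical
    wiring; athena-cli may construct its clients differently).
    """
    # Ask Athena where it wrote the results for this execution.
    execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
    location = execution["QueryExecution"]["ResultConfiguration"]["OutputLocation"]

    # Download the whole object in one S3 transfer instead of paging
    # through the results API.
    bucket, key = split_s3_uri(location)
    s3.download_file(bucket, key, dest_path)
    return location


# Example wiring (requires boto3 and AWS credentials):
#   import boto3
#   download_query_results(boto3.client("athena"), boto3.client("s3"),
#                          "<query-execution-id>", "results.csv")
```

The caller would still need to wait for the query to reach a finished state before calling this, and would need read access to the results bucket.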

jkleint avatar Sep 13 '18 21:09 jkleint

Yes, I suppose so. PRs welcome.

satterly avatar Sep 15 '18 08:09 satterly