laughingman7743 comments

Results: 73 comments of laughingman7743

Impl s3fs cursor

AbstractFileSystem: https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L92
AbstractBufferedFile: https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L1299
S3FileSystem: https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L168
S3File: https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L1822
It appears that awswrangler takes the approach of splitting the files into smaller chunks and using ThreadPoolExecutor to retrieve them in...
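The chunk-splitting approach mentioned above can be sketched roughly as follows. This is not awswrangler's code: `split_ranges`, `fetch_range`, and `read_in_chunks` are hypothetical names, and an in-memory `bytes` object stands in for an S3 object that would normally be read with ranged requests.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk_size):
    """Return (start, end) byte ranges, end exclusive, covering [0, size)."""
    return [
        (start, min(start + chunk_size, size))
        for start in range(0, size, chunk_size)
    ]


def fetch_range(blob, start, end):
    # Hypothetical stand-in for a ranged read against object storage.
    return blob[start:end]


def read_in_chunks(blob, chunk_size, max_workers=4):
    """Fetch all byte ranges concurrently and reassemble them in order."""
    ranges = split_ranges(len(blob), chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the parts
        # can be joined directly without sorting.
        parts = pool.map(lambda r: fetch_range(blob, *r), ranges)
        return b"".join(parts)


data = bytes(range(256)) * 40
result = read_in_chunks(data, chunk_size=1000)
```

In a real setting the worker count and chunk size would need tuning against network latency and memory limits; the values here are arbitrary.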

Documenting/improving memory behavior

Downloading S3 files locally and loading them in chunks is a good idea. However, I believe Pandas 1.2 or 1.3 already supports chunked reads through S3Fs. https://pandas.pydata.org/docs/user_guide/io.html#reading-writing-remote-files...
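As a minimal sketch of chunked reading with Pandas, the example below uses the `chunksize` parameter of `pandas.read_csv`. An in-memory buffer stands in for the remote file so that the snippet runs anywhere; the assumption is that, with s3fs installed, an `s3://` path could be passed in its place.

```python
import io

import pandas as pd

# Small in-memory CSV used as a stand-in for a remote file (assumption:
# with s3fs installed, an "s3://bucket/key.csv" path could be used instead).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
source = io.StringIO(csv_text)

chunk_lengths = []
total = 0
# chunksize makes read_csv return an iterator of DataFrames, so only
# one chunk is held in memory at a time.
for chunk in pd.read_csv(source, chunksize=4):
    chunk_lengths.append(len(chunk))
    total += int(chunk["value"].sum())
```

Each iteration sees at most four rows, so peak memory depends on the chunk size rather than on the size of the whole file.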

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

https://github.com/laughingman7743/PyAthena/issues/46
https://github.com/laughingman7743/PyAthena/tree/master/benchmarks
https://github.com/laughingman7743/PyAthena#pandascursor

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

I do not have an implementation that uses a combination of SQLAlchemy and Pandas. What is the use case for that?

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

The current implementation does not allow PandasCursor and SQLAlchemy to be used in combination. It needs some modification, but by implementing switching the cursor to be used by...

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

What do you need a guide for? A guide for PandasCursor? It has a DB-API 2.0 interface, the same as the default cursor. The way to fetch data is...
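For readers unfamiliar with the DB-API 2.0 fetch methods, here is a small illustration. It uses the standard library's `sqlite3` module as a stand-in (an assumption made so the snippet runs without AWS access); it is not PyAthena code, but any DB-API 2.0 cursor exposes the same `execute`, `fetchone`, `fetchmany`, and `fetchall` calls.

```python
import sqlite3

# sqlite3 is used only as a stand-in DB-API 2.0 driver.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

cursor.execute("SELECT id, name FROM t ORDER BY id")
first = cursor.fetchone()         # a single row as a tuple
next_batch = cursor.fetchmany(1)  # a list with up to 1 row
rest = cursor.fetchall()          # all remaining rows
conn.close()
```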

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

In the current implementation, SQLAlchemy and PandasCursor cannot be used in combination. As long as you use SQLAlchemy, you cannot change the default cursor. If you do not use SQLAlchemy,...

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

If you want to use PandasCursor in combination with SQLAlchemy, you probably need to modify the part that assembles the arguments of the connection object from the URL. https://github.com/laughingman7743/PyAthena/blob/master/pyathena/sqlalchemy_athena.py#L302-L332 There...

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

The PandasCursor itself already has a DataFrame, but it handles the fetching of the DataFrame to comply with the DB-API interface. https://github.com/laughingman7743/PyAthena/blob/master/pyathena/pandas/result_set.py#L100-L133 The read_sql_query method of Pandas calls the fetchall...
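To make the cost of that round trip concrete, the sketch below converts a DataFrame into DB-API style row tuples and then rebuilds a DataFrame from them. It is only an illustration of the general pattern, not PyAthena's or Pandas' actual implementation; the sample data and variable names are made up.

```python
import pandas as pd

# A DataFrame that a cursor might already hold internally (sample data).
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Exposing it through a DB-API style fetch means producing row tuples...
rows = list(df.itertuples(index=False, name=None))

# ...which a consumer then has to assemble back into a DataFrame.
rebuilt = pd.DataFrame.from_records(rows, columns=list(df.columns))
```

Both conversions walk every row in Python, which is work that is avoided when the DataFrame is used directly.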

SQLAlchemy + Pandas very slow when compared to AWS Wrangler

https://towardsdatascience.com/heres-the-most-efficient-way-to-iterate-through-your-pandas-dataframe-4dad88ac92ee