laughingman7743
AbstractFileSystem: https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L92
AbstractBufferedFile: https://github.com/fsspec/filesystem_spec/blob/2022.7.1/fsspec/spec.py#L1299
S3FileSystem: https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L168
S3File: https://github.com/fsspec/s3fs/blob/2022.7.1/s3fs/core.py#L1822
It appears that awswrangler takes the approach of splitting the files into smaller chunk sizes and using ThreadPoolExecutor to retrieve them in...
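To make the chunk-splitting idea concrete, here is a minimal sketch of the general technique, not awswrangler's actual code. The names `split_ranges`, `fetch_range`, and `parallel_read` are hypothetical, and an in-memory `bytes` object stands in for the S3 object so the example runs without AWS access. In a real implementation `fetch_range` would presumably issue a ranged GET request.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk_size):
    """Return half-open (start, end) byte ranges that cover `size` bytes."""
    return [
        (start, min(start + chunk_size, size))
        for start in range(0, size, chunk_size)
    ]


def fetch_range(blob, start, end):
    # Hypothetical stand-in for a ranged read against object storage;
    # here it only slices an in-memory bytes object.
    return blob[start:end]


def parallel_read(blob, chunk_size, max_workers=4):
    """Fetch all ranges on a thread pool and reassemble them in order."""
    ranges = split_ranges(len(blob), chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the parts
        # can be joined directly without re-sorting.
        parts = pool.map(lambda r: fetch_range(blob, *r), ranges)
        return b"".join(parts)
```

Threads suit this pattern because each fetch is I/O-bound, so the workers spend most of their time waiting on the network rather than competing for the interpreter.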
Downloading the S3 files locally and loading them in chunks is a good idea. However, I believe pandas 1.2 or 1.3 already supports chunked reads through s3fs. https://pandas.pydata.org/docs/user_guide/io.html#reading-writing-remote-files...
https://github.com/laughingman7743/PyAthena/issues/46 https://github.com/laughingman7743/PyAthena/tree/master/benchmarks https://github.com/laughingman7743/PyAthena#pandascursor
I do not have an implementation that uses a combination of SQLAlchemy and Pandas. What is the use case for that?
The current implementation does not allow PandasCursor and SQLAlchemy to be used in combination. It needs some modification, but by implementing switching the cursor to be used by...
What do you need a guide for? PandasCursor? It has a DB-API 2.0 interface, the same as the default cursor. The way to fetch data is...
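For readers unfamiliar with the DB-API 2.0 fetch methods, here is a small runnable sketch. It uses the standard-library `sqlite3` module purely as a stand-in, because it exposes the same `execute` / `fetchone` / `fetchmany` / `fetchall` cursor interface; the table `t` and its rows are made up for illustration. With PyAthena, only the way the connection and cursor are created would differ.

```python
import sqlite3

# In-memory database as a stand-in for any DB-API 2.0 connection.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

cursor.execute("SELECT id, name FROM t ORDER BY id")
# Column names come from the first item of each description entry.
columns = [d[0] for d in cursor.description]

first = cursor.fetchone()      # a single row as a tuple
next_two = cursor.fetchmany(2) # the next two rows as a list
rest = cursor.fetchall()       # every remaining row

conn.close()
```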
In the current implementation, SQLAlchemy and PandasCursor cannot be used in combination. As long as you use SQLAlchemy, you cannot change the default cursor. If you do not use SQLAlchemy,...
If you want to use PandasCursor in combination with SQLAlchemy, you probably need to modify the part that assembles the arguments of the connection object from the URL. https://github.com/laughingman7743/PyAthena/blob/master/pyathena/sqlalchemy_athena.py#L302-L332 There...
PandasCursor already holds the result as a DataFrame, but it also implements row fetching on top of that DataFrame to comply with the DB-API interface. https://github.com/laughingman7743/PyAthena/blob/master/pyathena/pandas/result_set.py#L100-L133 The read_sql_query method of Pandas calls the fetchall...
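The idea of a result set that owns a whole table yet still serves rows through DB-API-style fetch methods can be sketched as follows. This is an illustration only, not PyAthena's implementation: `TableResultSet` and `as_table` are hypothetical names, and a plain list of tuples stands in for the DataFrame so the example has no third-party dependencies.

```python
class TableResultSet:
    """Hypothetical result set: holds a full table, serves rows one by one."""

    def __init__(self, rows, arraysize=1000):
        self._rows = rows         # the complete table (DataFrame stand-in)
        self._iter = iter(rows)   # shared cursor position for all fetch calls
        self.arraysize = arraysize

    def fetchone(self):
        # Return the next row, or None once the table is exhausted.
        return next(self._iter, None)

    def fetchmany(self, size=None):
        size = self.arraysize if size is None else size
        out = []
        for _ in range(size):
            row = self.fetchone()
            if row is None:
                break
            out.append(row)
        return out

    def fetchall(self):
        # Drain whatever has not been fetched yet.
        return list(self._iter)

    def as_table(self):
        # Direct access to the whole table, bypassing row-by-row fetching.
        return self._rows
```

A caller that only knows the DB-API methods gets rows one at a time, while a caller aware of the table can take it whole and skip the per-row conversion.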
https://towardsdatascience.com/heres-the-most-efficient-way-to-iterate-through-your-pandas-dataframe-4dad88ac92ee