
WIP: move parquet related functionality to datafusion-parquet crate

Open devinjdangelo opened this issue 1 year ago • 1 comments

Which issue does this PR close?

related to #11182

Rationale for this change

Splitting FileFormat implementations out of datafusion-core will help ensure datafusion remains modular and extensible, as well as improve maintainability.
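
To illustrate the kind of split described above, here is a minimal sketch, not code from this PR. Modules stand in for separate crates, and the names `core_api`, `Format`, `Registry`, `parquet_format`, and `build_registry` are hypothetical; `Format` is a deliberately simplified stand-in and not DataFusion's actual `FileFormat` trait.

```rust
// Hypothetical sketch: each module stands in for a separate crate.

/// Stand-in for the core crate: it defines the abstraction only.
mod core_api {
    /// Simplified placeholder trait (assumption, not DataFusion's real `FileFormat`).
    pub trait Format {
        /// File extension handled by this format, including the leading dot.
        fn extension(&self) -> &'static str;
    }

    /// Registry that depends on the trait, never on a concrete format.
    #[derive(Default)]
    pub struct Registry {
        formats: Vec<Box<dyn Format>>,
    }

    impl Registry {
        /// Add a format implementation supplied by another crate.
        pub fn register(&mut self, format: Box<dyn Format>) {
            self.formats.push(format);
        }

        /// Return true if any registered format claims the path's extension.
        pub fn supports(&self, path: &str) -> bool {
            self.formats.iter().any(|f| path.ends_with(f.extension()))
        }
    }
}

/// Stand-in for a format-specific crate that depends on the core abstraction.
mod parquet_format {
    use super::core_api::Format;

    pub struct ParquetFormat;

    impl Format for ParquetFormat {
        fn extension(&self) -> &'static str {
            ".parquet"
        }
    }
}

/// Stand-in for a downstream user that opts in to the formats it needs.
fn build_registry() -> core_api::Registry {
    let mut registry = core_api::Registry::default();
    registry.register(Box::new(parquet_format::ParquetFormat));
    registry
}
```

In this sketch the core module compiles without any knowledge of parquet, so a user who never registers `ParquetFormat` would not need that dependency at all. Whether the real crates end up layered this way is an assumption.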

What changes are included in this PR?

Early work on moving parquet related code to a datafusion-parquet crate.

Are these changes tested?

No (not complete yet)

Are there any user-facing changes?

Yes

devinjdangelo avatar Jul 01 '24 00:07 devinjdangelo

Nice -- thanks @devinjdangelo

I was thinking about potential ways to organize the crates. One option I had in mind was

datafusion-catalog (has TableProvider, CatalogProvider, etc, maybe Mem*Provider)
datafusion-catalog-listing (ListingTable)
datafusion-datasource-parquet
datafusion-datasource-avro
datafusion-datasource-csv
datafusion-datasource-json
datafusion-datasource-arrow

Though perhaps that might be overkill (especially for formats like datafusion-datasource-csv...)

Maybe it would be better to do something like

datafusion-catalog (has TableProvider, CatalogProvider, etc, maybe Mem*Provider)
datafusion-catalog-listing (ListingTable)
datafusion-datasource (built in formats like avro, csv, json, arrow, parquet)

alamb avatar Jul 01 '24 10:07 alamb

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Aug 31 '24 01:08 github-actions[bot]