
GH-1835 Extract SchemaElement conversion from ParquetMetadataConverter

Open pyckle opened this issue 2 months ago • 0 comments

Rationale for this change

ParquetMetadataConverter has gotten way too large - it needs to be broken up. SchemaElement conversion is a good starting point to refactor into an external class because:

  • It is an actively developed part of the class - recent changes for variant and geographical types have changed this code
  • It's not strongly coupled to other conversion logic
  • Moving it to parquet-column will reduce code duplication in parquet readers that do not want hadoop dependencies (full disclosure: I had to duplicate some of this code in my downstream parquet lib)

What changes are included in this PR?

All SchemaElement logic is moved to ParquetSchemaConverter in the parquet-column project. Boilerplate enum conversion logic has also been extracted into a separate class. Tests are moved accordingly. Minor deduplication was done for getting a LogicalTypeAnnotation from the deprecated ConvertedType enum.
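To illustrate the kind of enum conversion boilerplate that can be pulled into its own class, here is a minimal sketch. The enums `ThriftRepetition` and `ModelRepetition` and the helper `EnumConversions.byName` are hypothetical stand-ins, not names from this PR, and the sketch assumes both enums use identical constant names; the actual code may map constants explicitly instead.

```java
// Hypothetical stand-ins for a format-level enum and its in-memory counterpart.
enum ThriftRepetition { REQUIRED, OPTIONAL, REPEATED }

enum ModelRepetition { REQUIRED, OPTIONAL, REPEATED }

// Hypothetical helper class collecting enum conversions in one place.
final class EnumConversions {
    private EnumConversions() {}

    // Converts between two enums by constant name.
    // Assumption: every source constant has a same-named target constant;
    // otherwise Enum.valueOf throws IllegalArgumentException.
    static <S extends Enum<S>, T extends Enum<T>> T byName(S source, Class<T> target) {
        return Enum.valueOf(target, source.name());
    }
}
```

A call such as `EnumConversions.byName(ThriftRepetition.OPTIONAL, ModelRepetition.class)` replaces a hand-written switch per enum pair, at the cost of failing at runtime rather than compile time if the names ever diverge.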

Are these changes tested?

Existing tests have been carefully moved to ensure no changes in behavior.

Are there any user-facing changes?

  • Conversion functions for SchemaElement to and from MessageType are now public.
  • Existing public functions that were moved are now deprecated delegates to ensure backwards compatibility.
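The deprecated-delegate pattern described above can be sketched as follows. `SchemaConverterSketch` and `LegacyMetadataConverterSketch` are hypothetical stand-ins for the new and old classes, and the method signature and string-based schema are placeholders chosen for illustration, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the extracted converter class.
final class SchemaConverterSketch {
    private SchemaConverterSketch() {}

    // Placeholder conversion: prepends a root entry to the field names.
    static List<String> toElements(List<String> fieldNames) {
        List<String> elements = new ArrayList<>();
        elements.add("root");
        elements.addAll(fieldNames);
        return elements;
    }
}

// Hypothetical stand-in for the original class that used to own the logic.
final class LegacyMetadataConverterSketch {
    /**
     * @deprecated use {@link SchemaConverterSketch#toElements(List)} instead.
     */
    @Deprecated
    List<String> toElements(List<String> fieldNames) {
        // Pure delegation: the old entry point keeps working, with no logic of its own.
        return SchemaConverterSketch.toElements(fieldNames);
    }
}
```

Existing callers of the old method keep compiling (with a deprecation warning) and get identical results, while new callers can depend on the extracted class directly.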

Closes #1835

Further cleanup of this class is still needed, so closing this issue may not be the correct action. I think the next candidate to refactor out is the ColumnChunk metadata conversion.

pyckle, Dec 04 '25 13:12