streaming
Column logical (not physical) type and allow_schema_mismatch
This PR was split out of a larger Parquet streaming PR, to follow.
- Implement `allow_schema_mismatch`: checks all shards to verify that their schemas (column names and type signatures) match. This is an important safety check for Parquet streaming, guarding against accidentally included Parquet files and other user error.
- We are able to do this across shard formats (for example, a string field in a JSONL shard should be returned the same as that same string field in an MDS shard) because we now have the concept of logical (as opposed to physical) column types. Logical column types are Streaming's vocabulary of types, shared by all shard formats, to which each shard format's encodings map. A shard format (really, this is about MDS) may have multiple ways to encode a value that all have the same logical type.
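
To illustrate the idea, here is a minimal sketch of comparing shards by logical type rather than physical encoding. It is not the code in this PR: the `PHYSICAL_TO_LOGICAL` table and the `logical_schema`, `find_schema_mismatches`, and `validate_shards` helpers are hypothetical names. It also assumes that `allow_schema_mismatch=False` means a mismatch raises an error.

```python
from typing import Dict, List, Tuple

# Hypothetical table: (shard format, physical encoding) -> logical type.
# Illustrative only; this is not Streaming's actual mapping.
PHYSICAL_TO_LOGICAL: Dict[Tuple[str, str], str] = {
    ('mds', 'str'): 'str',
    ('mds', 'int'): 'int',
    ('mds', 'int64'): 'int',  # a second encoding with the same logical type
    ('jsonl', 'str'): 'str',
    ('jsonl', 'int'): 'int',
}

# A shard is described here as (format, {column name: physical encoding}).
Shard = Tuple[str, Dict[str, str]]


def logical_schema(shard: Shard) -> Dict[str, str]:
    """Map each column's physical encoding to its logical type."""
    shard_format, columns = shard
    return {
        name: PHYSICAL_TO_LOGICAL[(shard_format, encoding)]
        for name, encoding in columns.items()
    }


def find_schema_mismatches(shards: List[Shard]) -> List[int]:
    """Return indices of shards whose logical schema differs from shard 0."""
    if not shards:
        return []
    reference = logical_schema(shards[0])
    return [
        index
        for index, shard in enumerate(shards[1:], start=1)
        if logical_schema(shard) != reference
    ]


def validate_shards(shards: List[Shard],
                    allow_schema_mismatch: bool = False) -> List[int]:
    """Raise on mismatch unless allowed (assumed semantics of the flag)."""
    mismatches = find_schema_mismatches(shards)
    if mismatches and not allow_schema_mismatch:
        raise ValueError(
            f'Shards {mismatches} do not match the schema of shard 0')
    return mismatches
```

Under these assumptions, an MDS shard with an `int64` column and a JSONL shard with an `int` column compare as equal, because both map to the same logical type, while a shard whose column has a different logical type is reported.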