Unify `from_json` and `parse_tabular` implementations
The goal of this issue is to unify the existing `from_json` and `from_jsonl` implementations with those of `parse_tabular`, `from_csv`, and `from_parquet`, consolidating dynamic model generation and schema inference across these import functions. Current functionality (such as JMESPath support) should be preserved, so the implementations likely cannot be identical, but they should share similar dynamic model generation, schema inference, etc. Ideally, this would also remove the dependency on `datamodel-code-generator`.
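To make "schema inference plus dynamic model generation" concrete, here is a minimal sketch of the general idea using only the standard library. It is not the project's implementation: `infer_schema` and `build_model` are hypothetical names, `dataclasses.make_dataclass` stands in for whatever model class the import functions actually generate, and the sketch assumes flat records whose keys are valid Python identifiers.

```python
from dataclasses import fields, make_dataclass
from typing import Any, Optional


def infer_schema(records):
    """Hypothetical helper: map each key to a type, widening on conflict."""
    seen = {}         # field name -> type (None until a non-null value is seen)
    nullable = set()  # fields that were null or missing in at least one record
    for index, record in enumerate(records):
        for key, value in record.items():
            if key not in seen:
                seen[key] = None
                if index > 0:
                    nullable.add(key)  # absent from earlier records
            if value is None:
                nullable.add(key)
                continue
            current, new = seen[key], type(value)
            if current is None or current is new:
                seen[key] = new
            elif {current, new} == {int, float}:
                seen[key] = float  # mixed ints and floats widen to float
            else:
                seen[key] = Any    # incompatible types fall back to Any
        # Fields seen earlier but missing from this record are nullable too.
        nullable.update(seen.keys() - record.keys())
    schema = {}
    for key, typ in seen.items():
        typ = Any if typ is None else typ
        schema[key] = Optional[typ] if key in nullable else typ
    return schema


def build_model(name, schema):
    """Hypothetical helper: create a model class from an inferred schema."""
    return make_dataclass(name, list(schema.items()))


records = [
    {"id": 1, "name": "a", "score": 1},
    {"id": 2, "name": None, "score": 2.5},
]
schema = infer_schema(records)
Row = build_model("Row", schema)
rows = [Row(**{key: record.get(key) for key in schema}) for record in records]
```

The point of the sketch is that the model is built directly from the inferred field types, with no intermediate code generation step. A shared helper of this shape could in principle sit behind both the JSON and tabular import paths, with JSON-specific steps (such as applying a JMESPath expression) running before inference.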
This page may be helpful in the future, as it documents pyarrow's support for reading JSON: https://arrow.apache.org/docs/python/generated/pyarrow.json.read_json.html
thanks @dtulga !
Hi, I am wondering if this issue is open for contribution under some guidance 🙂
@PanGan21 hi, yes, absolutely. Please take a look at the `parse_tabular` and `from_json` implementations, especially the part that depends on datamodel-code-generator - that's the hackiest part, and the one we would like to get rid of. Let us know if something is not clear. It's not the simplest task tbh, but it can be an interesting one!