srsly

Typed JSON API

Open kabirkhan opened this issue 2 years ago • 0 comments

Context

Currently, most loader functions use the JSONOutput return type. This type is non-specific and hard to reason about downstream, so I find myself having to cast or validate the result all the time.

The current loaders return JSONOutput because any valid JSON document is accepted. For example, the following file is valid JSON and would be parsed properly by ujson:

test_file.json

"hello"

However, if we received this JSON file as input in most of our code paths, we would want to raise an error, because a bare string is almost certainly invalid for what we want to do next.
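The problem can be illustrated with a minimal sketch. It assumes the file contains the JSON string "hello" and uses the standard library's json module as a stand-in for ujson; the "text" key is a hypothetical field that downstream code might expect.

```python
import json

# Assumed contents of test_file.json: a bare JSON string, which is a valid JSON document.
raw = '"hello"'

# Parsing succeeds, so the loader itself reports no problem.
data = json.loads(raw)
print(type(data).__name__)

# Code that expects a dict only fails later, far away from the loader.
try:
    data["text"]  # hypothetical key, for illustration only
except TypeError as err:
    print(f"downstream failure: {err}")
```

The error surfaces at the point of use rather than at load time, which is the situation the validation is meant to avoid.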

This PR adds validation on top of the existing JSON API to ensure you get the expected type from a loader function.

If we like this approach, I can extend it to the rest of the API. For now, I have only implemented the most commonly used functions from the JSON API.

Summary of Changes

Add read_json_dict - read a JSON file and validate that the resulting object is a dict
Add read_json_list - read a JSON file and validate that the resulting object is a list of dicts
Add read_jsonl_dicts - read a JSONL file and validate that each line is a valid dict in the resulting generator
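The three functions listed above could look roughly like the following. This is a sketch and not the PR's actual implementation: only the function names come from the summary, while the use of the standard library's json module (instead of srsly's ujson-based loaders), the PathLike alias, the ValueError exception type, and the skipping of blank JSONL lines are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List, Union

PathLike = Union[str, Path]  # hypothetical alias for this sketch


def read_json_dict(path: PathLike) -> Dict[str, Any]:
    """Read a JSON file and check that the top-level value is a dict."""
    data = json.loads(Path(path).read_text(encoding="utf8"))
    if not isinstance(data, dict):
        # ValueError is an assumed choice of exception type
        raise ValueError(f"Expected a dict in {path}, got {type(data).__name__}")
    return data


def read_json_list(path: PathLike) -> List[Dict[str, Any]]:
    """Read a JSON file and check that the top-level value is a list of dicts."""
    data = json.loads(Path(path).read_text(encoding="utf8"))
    if not isinstance(data, list):
        raise ValueError(f"Expected a list in {path}, got {type(data).__name__}")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(
                f"Expected a dict at index {index} in {path}, got {type(item).__name__}"
            )
    return data


def read_jsonl_dicts(path: PathLike) -> Iterator[Dict[str, Any]]:
    """Lazily read a JSONL file, checking that every line holds a dict."""
    with Path(path).open(encoding="utf8") as file_:
        for line_no, line in enumerate(file_, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are skipped rather than rejected
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(
                    f"Expected a dict on line {line_no} of {path}, "
                    f"got {type(record).__name__}"
                )
            yield record
```

Because read_jsonl_dicts is a generator in this sketch, an invalid line only raises once iteration reaches it, so callers that need up-front validation would have to consume the whole file first.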

kabirkhan, Mar 10 '23 01:03