Set source name (via Metadata) from data when using document loaders
Is there a way to override the source when using the new document store feature (and/or document loaders in general)?
Take, for example, the JSON lines loader.
It would be great if there was a way to use a field from the JSON data to set the source.
I tried this…
But it just comes through as a hardcoded string…
If this isn't possible, I wonder what the best alternative is.
In this specific use case I'm basically trying to get a load of HTML pages, scraped from a site which requires authentication, uploaded as documents with the source set to their URL.
I figured I could save the HTML to a JSON file and upload it that way, but would need to set the source.
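For illustration, here is a minimal sketch of that idea in plain Python: each scraped page is saved as one JSON line with its URL next to its HTML, and the file is then read back with each document's source taken from that field. The field names `url` and `html`, the helper names `write_pages_jsonl` and `read_documents`, and the `example.com` URLs are all assumptions made up for this example. It does not use the JSON lines loader's own API; it only shows the data shape that would make a per-document source possible.

```python
import json
import tempfile
from pathlib import Path


def write_pages_jsonl(pages, path):
    """Write one JSON object per line, keeping each page's URL next to its HTML."""
    with Path(path).open("w", encoding="utf-8") as f:
        for page in pages:
            # "url" and "html" are hypothetical field names chosen for this sketch.
            record = {"url": page["url"], "html": page["html"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_documents(path, content_key="html", source_key="url"):
    """Read the file back, taking each document's source from a field in its own line."""
    docs = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            docs.append({
                "page_content": record[content_key],
                "metadata": {"source": record[source_key]},
            })
    return docs


if __name__ == "__main__":
    # Hypothetical pages, standing in for HTML scraped after logging in.
    pages = [
        {"url": "https://example.com/docs/intro", "html": "<h1>Intro</h1>"},
        {"url": "https://example.com/docs/setup", "html": "<h1>Setup</h1>"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "pages.jsonl"
        write_pages_jsonl(pages, out)
        for doc in read_documents(out):
            print(doc["metadata"]["source"])
```

Whether the loader can be pointed at the `url` field in the same way is exactly the open question here; the sketch only shows what the intended result would look like.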
I believe I can't use Cheerio etc. because I need to log in to the website before scraping it (it's my own site).
You can try creating a new JSONL file with just the source content in it.