reader
Pile in reader format
Hi! This looks interesting. Could you convert the Pile dataset, fetched from its respective source URLs, into the Jina Reader format so it can be used for LLM pre-training experiments?
Not in the plan. Reader mainly focuses on HTML-type URLs.