Longxu Dou
### Reason for Failure The original JAMR was installed on the server in 2015 or 2016, so some packages were broken or out of date. The default setup script in JAMR...
We extend our gratitude to the authors of this repository! Your documentation and code have greatly benefited the community. We have used this repo in building the data processing pipeline...
Thanks for your helpful codebase! I am a bit confused about `stop words filtering`. The released code removes a document if its stop word ratio is below a certain cutoff. https://github.com/bigscience-workshop/data-preparation/blob/9d0588419073cc5bf0fb92b58f37f2a1016572c3/preprocessing/training/01b_oscar_cleaning_and_filtering/filtering.py#L590...
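To make the behavior in question concrete, here is a minimal sketch of a stop word ratio filter. It is not the repository's implementation: the function names, the whitespace tokenization, and the example stop word set and cutoff are all assumptions chosen for illustration.

```python
def stop_word_ratio(text, stop_words):
    """Return the fraction of whitespace-separated tokens that are stop words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in stop_words for word in words) / len(words)


def keep_document(text, stop_words, cutoff):
    """Keep a document only if its stop word ratio reaches the cutoff.

    Documents with a ratio below the cutoff are dropped, which matches
    the behavior described in the question above.
    """
    return stop_word_ratio(text, stop_words) >= cutoff


# Hypothetical stop word set and cutoff, for illustration only.
EXAMPLE_STOP_WORDS = {"the", "a", "of", "on"}
print(keep_document("the cat sat on a mat", EXAMPLE_STOP_WORDS, cutoff=0.3))
print(keep_document("cat mat dog", EXAMPLE_STOP_WORDS, cutoff=0.3))
```

Under this reading, natural-language text tends to pass because it contains many stop words, while keyword lists or boilerplate with few stop words fall below the cutoff and are removed.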
Thanks for your amazing codebase! I found that the link to the [Deduplication Report](https://chenghaomou.github.io/1%20Projects/BigScience/SubProjects/Deduplication%20report) in `preprocessing/training/01b_oscar_cleaning_and_filtering/deduplicate/README.md` is not accessible. Could you please update it?
Hi, I'm very impressed by your work. I have encountered some problems while reproducing your model in my environment. I suspect it was because of the complex version...
Thanks for this interesting work! I trained a new T5 model from scratch using your script and ran prediction with PICARD, but encountered a problem. **Modification**: replacing the `COLUMN` with `TABLE.COLUMN`...