ProX
Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
Hi there, great work! Do you have plans for a code dataset, and if so, when can we expect it?
Great and insightful work! If the refining model is not multilingual, will it still work on multilingual datasets?
Thanks for your great work! I have some questions regarding the construction of the ProX training data (Appendix A): As shown in Table 1, ProX includes: Document Level: drop_doc(), keep_doc()...