Yilun Huang
Hi, thanks for your excellent work on MLLMs! I downloaded the [Cambrian-Alignment](https://huggingface.co/datasets/nyu-visionx/Cambrian-Alignment) dataset and I found there might be something wrong with this dataset. When I checked the sources of...
After two lines of code were removed in PR #597, the Sandbox can no longer find `work_dir` in later steps. It's hard to resolve this issue by...
For now, running Data-Juicer on multiple nodes in "ray" mode, which uses `map_batches` to process datasets, might cause some implicit problems. The `map_batches` method has two arguments, `num_gpus` and `concurrency`,...
Update the KDD tutorials to the latest version of Data-Juicer, and merge them into the main branch if that's OK. Refer: Discussed in https://github.com/modelscope/data-juicer/discussions/475 Originally posted by **Tendo33** November 6,...
As the title says.
* remove sandbox-related code and configs
* remove deps
* update docs
* move hpo and quality_classifier tools into the internal tools