Aoi

19 issues by Aoi

📚 Documentation Improvement: To address the concerns raised in issues #1174, #983, and #894, I recommend explicitly stating in the documentation that **the trained model can...

Add a loader to the `mmdet.datasets` module to download and load datasets from the Hugging Face Hub (#11378). Motivation: In our use case, this will help leverage datasets to manage...

Add a loader to the `mmdet.datasets` module to download and load datasets from the Hugging Face Hub. **Motivation:** In our use case, this will help leverage datasets to manage dataset caching,...

I believe you have kept the YOLO series in a separate repository because of GPL contamination concerns. I would like to know whether you will merge it into this Apache-2.0-licensed...

Prerequisite: [X] I have searched [Issues](https://github.com/open-mmlab/mmengine/issues) and [Discussions](https://github.com/open-mmlab/mmengine/discussions) but could not get the expected help. [X] The bug has not been fixed in the [latest version](https://github.com/open-mmlab/mmengine). Environment...

Labels: bug

I am using HailoRT and Model Zoo in a closed-source project; both are released under the MIT license. However, I still have some questions about the licensing. As...

Description: Add a datasource for reading data from WARC/ARC files. Use case: In cleaning pre-training data for LLMs, Ray Data is nearly the only distributed solution (Dask...

Labels: enhancement, triage, data

A `WarcDatasource` has been added to facilitate reading WARC/ARC files, such as those from Common Crawl. Why are these changes needed? In cleaning pre-training data...

The naming here seems to be incorrect. https://github.com/huggingface/datatrove/blob/0f2c69f8249aa0c53ebcf10afa2394da506a953f/src/datatrove/pipeline/filters/gopher_quality_filter.py#L114-L120 Based on the implementation, the variable should likely be `min_alpha_words_ratio` instead of `max_non_alpha_words_ratio`. If it is `max_non_alpha_words_ratio`, it should be 0.2 instead...
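The naming argument rests on a simple complement: if a fraction r of words contain at least one alphabetic character, then 1 − r do not, so "alpha ratio ≥ 0.8" is the same test as "non-alpha ratio ≤ 0.2". The sketch below assumes the linked check keeps a document when the fraction of words containing a letter is at least the threshold; the helper names (`alpha_words_ratio`, `passes_min_alpha`, `passes_max_non_alpha`) are hypothetical and are not datatrove's code.

```python
def alpha_words_ratio(words: list[str]) -> float:
    """Fraction of words containing at least one alphabetic character (hypothetical helper)."""
    if not words:
        return 0.0
    return sum(any(c.isalpha() for c in w) for w in words) / len(words)


def passes_min_alpha(words: list[str], min_alpha_words_ratio: float = 0.8) -> bool:
    # Threshold read as a *minimum* on the share of words that contain a letter.
    return alpha_words_ratio(words) >= min_alpha_words_ratio


def passes_max_non_alpha(words: list[str], max_non_alpha_words_ratio: float = 0.2) -> bool:
    # Same rule phrased on the complement: a *maximum* on words with no letters.
    return 1.0 - alpha_words_ratio(words) <= max_non_alpha_words_ratio


if __name__ == "__main__":
    doc = ["the", "cat", "sat", "123", "on"]  # 4 of 5 words contain a letter
    print(alpha_words_ratio(doc), passes_min_alpha(doc), passes_max_non_alpha(doc))
```

Both predicates accept and reject the same documents; the disagreement is only about the name. A value of 0.8 reads correctly as `min_alpha_words_ratio`, while a parameter called `max_non_alpha_words_ratio` would need 0.2.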

Have you considered adding an open-source license to this Git repository? I created a set of test datasets using code from this repository, but I am not sure if...