Cannot figure out how to create/upload a dataset correctly.
Search before asking
- [X] I have searched the HUB issues and found no similar bug report.
HUB Component
Datasets
Bug
I can't figure out how to create and upload a dataset correctly: HUB keeps saying it is unable to process the dataset.
My verification script is set up correctly:
from ultralytics.hub import check_dataset
check_dataset('data8.zip', task="segment")
PS C:\Users\antdx\Downloads\SAM2> python verify.py
Starting HUB dataset checks for C:\Users\antdx\Downloads\SAM2\data8.zip....
WARNING ⚠️ Skipping C:\Users\antdx\Downloads\SAM2\data8.zip unzip as destination directory C:\Users\antdx\Downloads\SAM2\\data8 is not empty.
Scanning C:\Users\antdx\Downloads\SAM2\data8\data8\labels\train... 8 images, 2 backgrounds, 0 corrupt: 100%|██████████|
New cache created: C:\Users\antdx\Downloads\SAM2\data8\data8\labels\train.cache
Statistics: 100%|██████████| 10/10 [00:00<?, ?it/s]
Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/.
PS C:\Users\antdx\Downloads\SAM2>
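One detail in the log above: the checker skipped unzipping because `data8\` already existed and was not empty, and it then scanned `data8\data8\labels\train`. So it may have validated a previously extracted (and apparently nested) folder rather than the current contents of data8.zip. Below is a minimal stdlib sketch for inspecting the zip itself. It assumes the HUB-style layout from the docs' coco8 example: a single top-level folder with the same name as the zip, containing a YAML of that name plus `images/` and `labels/`. `inspect_zip_layout` is a hypothetical helper, not part of Ultralytics.

```python
import zipfile
from pathlib import PurePosixPath


def inspect_zip_layout(zip_path: str) -> list[str]:
    """Return layout problems found in a dataset zip (an empty list means none were found).

    Hypothetical helper, not part of Ultralytics. It assumes the HUB-style layout:
        <name>.zip -> <name>/<name>.yaml, <name>/images/..., <name>/labels/...
    """
    # Accept Windows-style paths when deriving the expected folder name.
    stem = PurePosixPath(str(zip_path).replace("\\", "/")).stem
    with zipfile.ZipFile(zip_path) as zf:
        # Zip entry names always use forward slashes; skip macOS metadata folders.
        entries = [n for n in zf.namelist() if n.strip("/") and not n.startswith("__MACOSX/")]
    parts = [PurePosixPath(n).parts for n in entries]

    top_level = {p[0] for p in parts}
    if top_level != {stem}:
        return [f"expected one top-level folder '{stem}/', found {sorted(top_level)}"]

    problems = []
    children = {p[1] for p in parts if len(p) > 1}  # entries directly inside <stem>/
    if f"{stem}.yaml" not in children:
        problems.append(f"missing {stem}/{stem}.yaml")
    for sub in ("images", "labels"):
        if sub not in children:
            problems.append(f"missing {stem}/{sub}/")
    if stem in children:
        problems.append(f"nested folder {stem}/{stem}/ found; zip the inner folder instead")
    return problems
```

For example, `inspect_zip_layout("data8.zip")` returns an empty list when the layout matches, and reports a nested `data8/data8/` folder if the zip was created one level too high. Deleting or renaming the existing `data8` folder next to the zip before re-running `check_dataset` should also make the checker extract and validate the current archive instead of skipping the unzip.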

How do I fix this?
👋 Hello @AntDX316, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:
- Quickstart. Start training and deploying YOLO models with HUB in seconds.
- Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
- Projects: Creating and Managing. Group your models into projects for improved organization.
- Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
- Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
- Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
- Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce the issue. Additionally, a Minimum Reproducible Example (MRE) is required to assist in troubleshooting. This typically includes:
- A clear description of the issue and any relevant error messages.
- Code snippets (if applicable) and the full command used.
- Any data or files necessary to replicate the issue (ensure sensitive data is redacted).
- Your environment details (e.g., operating system, Python version, etc.).
For your case, as you're experiencing difficulties with dataset uploads, please ensure your dataset adheres to the HUB Dataset Formatting Guide for the specified task (segment in your example). If possible, provide us with a step-by-step explanation of the issue, including the files involved, any command outputs, and how we can replicate it on our end.
We try to respond to all issues as promptly as possible 💡. An Ultralytics engineer will review and assist you soon—thank you for your patience! 😊
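As a rough illustration of what the segment format expects, here is a small stdlib sketch that checks the text of one YOLO segmentation label file: one object per line, a class index followed by an even number of normalized polygon coordinates (at least three points). `check_seg_label_text` and its `num_classes` parameter are hypothetical; this is not the HUB validator, which may apply additional rules.

```python
def check_seg_label_text(text: str, num_classes: int) -> list[str]:
    """Return problems found in one YOLO segmentation label file's text (empty = none found).

    Hypothetical helper. Assumes each non-empty line is:
        <class-index> <x1> <y1> <x2> <y2> ... <xn> <yn>
    with coordinates normalized to [0, 1] and at least 3 polygon points.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are ignored
        try:
            cls = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {lineno}: could not parse class index or coordinates")
            continue
        if not 0 <= cls < num_classes:
            problems.append(f"line {lineno}: class {cls} outside 0..{num_classes - 1}")
        if len(coords) % 2 != 0:
            problems.append(f"line {lineno}: odd number of coordinates ({len(coords)})")
        elif len(coords) < 6:
            # A 4-number detection box (x y w h) also lands here.
            problems.append(f"line {lineno}: fewer than 3 polygon points")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {lineno}: coordinate outside [0, 1]")
    return problems
```

To use it, read each `.txt` file under `labels/` and pass its text with `num_classes=len(names)`, which is 80 for the coco8 YAML shared in this thread.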
@AntDX316 Hello!
Can you share your .yaml file? Maybe it will help us.
Also, please check our documentation for more information: https://docs.ultralytics.com/hub/datasets#upload-dataset.
Hi @AntDX316, thanks for reaching out!
We currently have no issues on our end with uploading valid datasets to Ultralytics HUB. To ensure your dataset is formatted correctly, please watch this YouTube video: Upload Datasets to Ultralytics HUB, which provides a step-by-step guide for creating a valid dataset for Ultralytics HUB. Once you've checked your dataset, try uploading it again following the instructions in the video and in the documentation here.
If the issue persists, please let us know with any additional details, such as a screenshot or a sample of your dataset structure. This will help us assist you further. Thank you for your patience!
I'm using the default one, just to test.
Here is the whole dataset, since I'm just trying to see what it can do: data8.zip
Ultimately, I'm trying to figure out how to make my own datasets with objects that computer vision models can detect.
I was trying to use SAM2 for that: https://docs.ultralytics.com/models/sam-2/#sam-2-comparison-vs-yolov8
from ultralytics.data.annotator import auto_annotate
auto_annotate(data="images", det_model="yolo11x.pt", sam_model="sam2_b.pt")
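If the goal is to turn your own images into an uploadable segment dataset, one possible way to package the output of `auto_annotate` is sketched below. Several assumptions are involved: that the labels are `.txt` files named after the images (in recent Ultralytics versions `auto_annotate` writes them to a sibling folder named `<data>_auto_annotate_labels` by default, but check your version), that the HUB layout matches the docs' coco8 example, and that the class indices come from the detector (the 80 COCO classes for `yolo11x.pt`). `package_dataset`, its parameters, and the every-Nth-image validation split are hypothetical choices, not an Ultralytics API.

```python
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def package_dataset(image_dir, label_dir, names, out_dir, name, val_every=5):
    """Copy images/labels into a HUB-style folder, write <name>.yaml, and zip it.

    Hypothetical helper. Layout produced (assumed from the HUB docs' coco8 example):
        <out_dir>/<name>/<name>.yaml
        <out_dir>/<name>/images/{train,val}/...
        <out_dir>/<name>/labels/{train,val}/...
    Every `val_every`-th image (in sorted order) goes to val, the rest to train.
    Returns the path of the created zip.
    """
    root = Path(out_dir) / name
    root.mkdir(parents=True, exist_ok=True)
    images = sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    for i, img in enumerate(images):
        split = "val" if i % val_every == 0 else "train"
        (root / "images" / split).mkdir(parents=True, exist_ok=True)
        (root / "labels" / split).mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, root / "images" / split / img.name)
        label = Path(label_dir) / f"{img.stem}.txt"
        if label.exists():  # images without a label file end up as backgrounds
            shutil.copy2(label, root / "labels" / split / label.name)

    # Same keys as the coco8 YAML shared in this thread. Names are written unquoted,
    # so avoid class names containing ':' or '#'.
    lines = ["path: ''", "train: images/train", "val: images/val", "test: null", "names:"]
    lines += [f"  {i}: {n}" for i, n in enumerate(names)]
    (root / f"{name}.yaml").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # root_dir=out_dir with base_dir=name keeps a single top-level <name>/ folder in the zip.
    return shutil.make_archive(str(Path(out_dir) / name), "zip", root_dir=out_dir, base_dir=name)
```

The resulting zip can then be checked with `check_dataset(zip_path, task="segment")` before uploading. A random or stratified split would usually be better than every Nth image; the fixed rule just keeps the sketch deterministic.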
Default coco8 example dataset:
https://hub.ultralytics.com/datasets
path: ''
train: images/train
val: images/val
test: null
names:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
download: null
Please fix the check used in verify.py: it shouldn't report "Checks completed correctly" when the dataset isn't ready, unless something is broken on the Ultralytics HUB side.
From my end, I have no idea whether this is even the right way to do it. Could you release code to do this properly in Python, as well as an easy-to-use UI on the Ultralytics HUB site for it?
@AntDX316 Strange, upload works fine on my end. We will investigate this further. CC @yogendrasinghx
Great, I want it to work perfectly :)
Also, please make sure this is investigated too: https://github.com/ultralytics/hub/issues/950#issuecomment-2540110125