LiuXin comments

Results: 6 comments by LiuXin

Multi-GPU training error

I ran into the same error as the original poster. Has anyone solved the multi-GPU training problem with modelscope, or is this an environment issue?
Task related config: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185461) of binary: /opt/conda/envs/modelscope/bin/python
Traceback (most recent call last):
File "/opt/conda/envs/modelscope/lib/python3.8/runpy.py", line 194, in _run_module_as_main return...

[Bug]: In milvus-standalone docker container. Issue related to `goroutine`, automatically exits with code `134` after some time. If no collection is loaded, no problem is caused.

I have the same problem. Do you have a solution? Or does this have something to do with collections staying loaded for a long time when I deploy the interface?

I now want to fine-tune CLIP on my own local dataset. How should I construct the data locally and then load the local dataset for training?

> # 1. Construct the dataset > ``` > train.jsonl (each line): {"query_id": "111", "query": "吃饭的猫猫1", "image_id": "222", "image": "/path/to/cat_1.jpg"} > validation.jsonl (each line): {"query_id": "333", "query": "吃饭的猫猫2", "image_id": "444",...
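
The quoted format above can be produced with a few lines of Python. This is only a minimal sketch: the helper names `write_jsonl` and `read_jsonl` are hypothetical, the record uses just the fields visible in the `train.jsonl` example, and the image path is the placeholder from the quote. The truncated `validation.jsonl` record is not reconstructed here.

```python
import json


def write_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper, not a modelscope API)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps the Chinese query text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# One record with the fields shown in the quoted train.jsonl example.
train_records = [
    {
        "query_id": "111",
        "query": "吃饭的猫猫1",
        "image_id": "222",
        "image": "/path/to/cat_1.jpg",  # placeholder path from the quote
    },
]
```

Calling `write_jsonl(train_records, "train.jsonl")` would write the file, and `read_jsonl("train.jsonl")` can be used to check that every line parses before training.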

Fine-tuning based on the damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch model: running finetune.py raises an error

> Please check the training data; for the format, see (https://alibaba-damo-academy.github.io/FunASR/en/egs_modelscope/asr/TEMPLATE/README.html#finetune-with-your-data)

Hello, single-GPU training works fine for me, but multi-GPU training fails. My launch command is CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node 2 finetune.py and the error is as follows:
Task related config: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 185479) of...
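
The log shows the launcher passing `--local-rank=0` (with a dash) and the script's argument parser rejecting it. From PyTorch 2.0 on, `torch.distributed.launch` passes the dashed spelling, while older scripts often declare only `--local_rank`. One possible script-side workaround, assuming the script parses its arguments with `argparse` (the thread does not show how `finetune.py` does this), is to accept both spellings. The function name `parse_local_rank` is hypothetical.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Return the local rank from either flag spelling, else from LOCAL_RANK."""
    parser = argparse.ArgumentParser()
    # Accept both spellings; either one is stored under dest "local_rank".
    parser.add_argument(
        "--local-rank",
        "--local_rank",
        dest="local_rank",
        type=int,
        # torchrun exports LOCAL_RANK, so use it when no flag is passed.
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args ignores unrelated flags so the script's own
    # parser can still handle them afterwards.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank
```

Whether this resolves the error above depends on where the rejecting parser lives. If it is inside a library rather than in `finetune.py`, the flag would need to be handled there instead.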

BOT_SORT: how to run validation after training multi-object tracking on my own dataset

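> Replace the detection weights, det_weights

Thanks for the answer. I noticed that the checkpoint I trained differs in size from the official one: the pretrained ppyoloe_crn_l_36e_640x640_mot17half.pdparams is 204M, while the ones I trained myself are all 214M. For other models, such as centernet_dla34_140e_coco.pdparams, my trained checkpoints are the same size as the official ones. Also, when I swap in my weights for validation, I get a warning that no targets were detected. What could be causing this?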

Is this trying to connect to the network? How can it be fixed?

Same here. I am running it on a server and it stays stuck at this point. Is there any solution?