environment
1) Some of the packages in the environment setup require numpy > 2.0, but python==3.8 can only install numpy==1.24. Is it OK to upgrade to python==3.10? 2) Does the model need an A100 to run? Some of the model code fails to run on a 4090?
Hi! 1. That should be fine; the environment is basically the same as LAVIS's. 2. A 4090 can run it, but with the default config the GPU memory won't be enough for training, so you need to lower the batch size.
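For reference, ControlCap builds on LAVIS, whose run configs usually expose the per-GPU batch size as batch_size_train and support gradient accumulation via accum_grad_iters (both key names are assumptions based on LAVIS conventions; check this repo's own config files for the exact fields):

```yaml
run:
  # Halve (or quarter) the per-GPU batch size until training fits in 24 GB...
  batch_size_train: 4
  # ...and accumulate gradients to keep the effective batch size unchanged.
  accum_grad_iters: 4
```

Lowering batch_size_train alone changes the effective batch size and may affect results, which is why pairing it with gradient accumulation is the usual workaround.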
Thanks for the reply! So would you still recommend using a python==3.8 environment?
Yes, LAVIS also uses a python 3.8 environment, and that is what we tested with. A higher python version should not make much difference, though.
Thanks for your reply! Would you mind sharing the conda list output with all package versions? Much appreciated!
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
_libgcc_mutex=0.1=main _openmp_mutex=5.1=1_gnu accelerate=0.30.1=pypi_0 altair=5.3.0=pypi_0 annotated-types=0.7.0=pypi_0 antlr4-python3-runtime=4.9.3=pypi_0 asttokens=2.4.1=pypi_0 attrs=23.2.0=pypi_0 backcall=0.2.0=pypi_0 bleach=6.1.0=pypi_0 blinker=1.8.2=pypi_0 blis=0.7.11=pypi_0 braceexpand=0.1.7=pypi_0 ca-certificates=2024.3.11=h06a4308_0 cachetools=5.3.3=pypi_0 catalogue=2.0.10=pypi_0 certifi=2024.2.2=pypi_0 cfgv=3.4.0=pypi_0 charset-normalizer=3.3.2=pypi_0 click=8.1.7=pypi_0 cloudpathlib=0.16.0=pypi_0 confection=0.1.4=pypi_0 contexttimer=0.3.3=pypi_0 contourpy=1.1.1=pypi_0 cycler=0.12.1=pypi_0 cymem=2.0.8=pypi_0 dcnv4=1.0.0.post2=pypi_0 decorator=5.1.1=pypi_0 decord=0.6.0=pypi_0 distlib=0.3.8=pypi_0 einops=0.8.0=pypi_0 en-core-web-sm=3.7.1=pypi_0 executing=2.0.1=pypi_0 fairscale=0.4.4=pypi_0 filelock=3.14.0=pypi_0 fonttools=4.52.1=pypi_0 fsspec=2024.5.0=pypi_0 ftfy=6.2.0=pypi_0 gitdb=4.0.11=pypi_0 gitpython=3.1.43=pypi_0 huggingface-hub=0.23.1=pypi_0 identify=2.5.36=pypi_0 idna=3.7=pypi_0 imageio=2.34.1=pypi_0 importlib-resources=6.4.0=pypi_0 iopath=0.1.10=pypi_0 ipython=8.12.3=pypi_0 jedi=0.19.1=pypi_0 jinja2=3.1.4=pypi_0 joblib=1.4.2=pypi_0 jsonschema=4.22.0=pypi_0 jsonschema-specifications=2023.12.1=pypi_0 kaggle=1.6.14=pypi_0 kiwisolver=1.4.5=pypi_0 langcodes=3.4.0=pypi_0 language-data=1.2.0=pypi_0 lazy-loader=0.4=pypi_0 ld_impl_linux-64=2.38=h1181459_1 libffi=3.4.4=h6a678d5_1 libgcc-ng=11.2.0=h1234567_1 libgomp=11.2.0=h1234567_1 libstdcxx-ng=11.2.0=h1234567_1 marisa-trie=1.1.1=pypi_0 markdown-it-py=3.0.0=pypi_0 markupsafe=2.1.5=pypi_0 matplotlib=3.7.5=pypi_0 matplotlib-inline=0.1.7=pypi_0 mdurl=0.1.2=pypi_0 mpmath=1.3.0=pypi_0 murmurhash=1.0.10=pypi_0 ncurses=6.4=h6a678d5_0 networkx=3.1=pypi_0 nltk=3.8.1=pypi_0 nodeenv=1.8.0=pypi_0 numpy=1.24.4=pypi_0 nvidia-cublas-cu12=12.1.3.1=pypi_0 nvidia-cuda-cupti-cu12=12.1.105=pypi_0 nvidia-cuda-nvrtc-cu12=12.1.105=pypi_0 nvidia-cuda-runtime-cu12=12.1.105=pypi_0 nvidia-cudnn-cu12=8.9.2.26=pypi_0 
nvidia-cufft-cu12=11.0.2.54=pypi_0 nvidia-curand-cu12=10.3.2.106=pypi_0 nvidia-cusolver-cu12=11.4.5.107=pypi_0 nvidia-cusparse-cu12=12.1.0.106=pypi_0 nvidia-nccl-cu12=2.20.5=pypi_0 nvidia-nvjitlink-cu12=12.5.40=pypi_0 nvidia-nvtx-cu12=12.1.105=pypi_0 omegaconf=2.3.0=pypi_0 opencv-python-headless=4.5.5.64=pypi_0 opendatasets=0.1.22=pypi_0 openssl=3.0.13=h7f8727e_2 packaging=24.0=pypi_0 pandas=2.0.3=pypi_0 parso=0.8.4=pypi_0 peft=0.8.2=pypi_0 pexpect=4.9.0=pypi_0 pickleshare=0.7.5=pypi_0 pillow=10.3.0=pypi_0 pip=24.0=py38h06a4308_0 pkgutil-resolve-name=1.3.10=pypi_0 platformdirs=4.2.2=pypi_0 plotly=5.22.0=pypi_0 portalocker=2.8.2=pypi_0 pre-commit=3.5.0=pypi_0 preshed=3.0.9=pypi_0 prompt-toolkit=3.0.43=pypi_0 protobuf=4.25.3=pypi_0 psutil=5.9.8=pypi_0 ptyprocess=0.7.0=pypi_0 pure-eval=0.2.2=pypi_0 pyarrow=16.1.0=pypi_0 pycocoevalcap=1.2=pypi_0 pycocotools=2.0.7=pypi_0 pydantic=2.7.1=pypi_0 pydantic-core=2.18.2=pypi_0 pydeck=0.9.1=pypi_0 pygments=2.18.0=pypi_0 pyparsing=3.1.2=pypi_0 python=3.8.19=h955ad1f_0 python-dateutil=2.9.0.post0=pypi_0 python-magic=0.4.27=pypi_0 python-slugify=8.0.4=pypi_0 pytz=2024.1=pypi_0 pywavelets=1.4.1=pypi_0 pyyaml=6.0.1=pypi_0 readline=8.2=h5eee18b_0 referencing=0.35.1=pypi_0 regex=2024.5.15=pypi_0 requests=2.32.2=pypi_0 rich=13.7.1=pypi_0 rpds-py=0.18.1=pypi_0 safetensors=0.4.3=pypi_0 salesforce-lavis=1.0.2=pypi_0 scenegraphparser=0.1.0=pypi_0 scikit-image=0.21.0=pypi_0 scikit-learn=1.3.2=pypi_0 scipy=1.10.1=pypi_0 seaborn=0.13.2=pypi_0 sentencepiece=0.2.0=pypi_0 setuptools=69.5.1=py38h06a4308_0 six=1.16.0=pypi_0 smart-open=6.4.0=pypi_0 smmap=5.0.1=pypi_0 spacy=3.7.4=pypi_0 spacy-legacy=3.0.12=pypi_0 spacy-loggers=1.0.5=pypi_0 sqlite=3.45.3=h5eee18b_0 srsly=2.4.8=pypi_0 stack-data=0.6.3=pypi_0 streamlit=1.35.0=pypi_0 sympy=1.12=pypi_0 tabulate=0.9.0=pypi_0 tenacity=8.3.0=pypi_0 text-unidecode=1.3=pypi_0 textblob=0.17.1=pypi_0 thinc=8.2.3=pypi_0 threadpoolctl=3.5.0=pypi_0 tifffile=2023.7.10=pypi_0 timm=0.4.12=pypi_0 tk=8.6.14=h39e8969_0 
tokenizers=0.13.3=pypi_0 toml=0.10.2=pypi_0 toolz=0.12.1=pypi_0 torch=2.1.2+cu121=pypi_0 torchaudio=2.1.2+cu121=pypi_0 torchvision=0.16.2+cu121=pypi_0 tornado=6.4=pypi_0 tqdm=4.66.4=pypi_0 traitlets=5.14.3=pypi_0 transformers=4.26.1=pypi_0 triton=2.1.0=pypi_0 typer=0.9.4=pypi_0 typing-extensions=4.12.0=pypi_0 tzdata=2024.1=pypi_0 urllib3=2.2.1=pypi_0 virtualenv=20.26.2=pypi_0 wasabi=1.1.2=pypi_0 watchdog=4.0.1=pypi_0 wcwidth=0.2.13=pypi_0 weasel=0.3.4=pypi_0 webdataset=0.2.86=pypi_0 webencodings=0.5.1=pypi_0 wheel=0.43.0=py38h06a4308_0 xz=5.4.6=h5eee18b_1 zipp=3.18.2=pypi_0 zlib=1.2.13=h5eee18b_1
Hello, and thanks for the earlier answers! Of the three files needed to run data.sh, the bottom two can be downloaded, but the missing data/vg/annotations/vg1.0/densecap_splits.json and data/vg/annotations/vg1.2/densecap_splits.json at the top no longer seem to be available?
Hi, we have uploaded converted annotations; you can train directly with these converted annotations and do not need to run data.sh.
====== Model Attributes ======
2025-05-11 16:52:59,415 [INFO] {
"apply_lemmatizer": false,
"arch": "controlcap_t5",
"do_sample": false,
"drop_path_rate": 0,
"finetune_llm": false,
"first_word_control": false,
"freeze_vit": true,
"img_size": 224,
"length_penalty": 0,
"max_new_tokens": 20,
"max_txt_len": 32,
"min_length": 1,
"num_beams": 2,
"num_query_token": 32,
"num_return_sequences": 1,
"pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xl.pth",
"repetition_penalty": 1.5,
"t5_model": "google/flan-t5-xl",
"tag_bert_config": "controlcap/models/tagging_heads/tag_bert_config.json",
"tag_list": "controlcap/common/tagging/ram_tag_list.txt",
"tag_thr": 0.7,
"temperature": 1,
"top_p": 0.9,
"use_grad_checkpoint": false,
"vit_model": "eva_clip_g",
"vit_precision": "fp16"
}
2025-05-11 16:52:59,416 [INFO] Building datasets...
loading annotations into memory...
[2025-05-11 16:59:08,802] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2243255 closing signal SIGTERM
[2025-05-11 16:59:08,808] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2243257 closing signal SIGTERM
[2025-05-11 16:59:08,808] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 2243258 closing signal SIGTERM
[2025-05-11 16:59:09,876] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 1 (pid: 2243256) of binary: /data/xsf/anaconda3/envs/controlcap/bin/python
Traceback (most recent call last):
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/run.py", line 810, in <module>
main()
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/xsf/anaconda3/envs/controlcap/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
  <NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
  time      : 2025-05-11_16:59:08
  host      : user-Super-Server
  rank      : 1 (local_rank: 1)
  exitcode  : -9 (pid: 2243256)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 2243256
Hello! My setup is four 4090s (24 GB each). When the run reaches "loading annotations into memory...", the server often drops the connection or the job cannot continue. What could the problem be? The terminal output is above. I hope you can help, thanks!
"loading annotations into memory..." is the message pycocotools prints when importing a COCO-format annotation file. It looks like the COCO(ann_file) call at line 46 of controlcap/datasets/dataset.py is getting stuck; you could debug that part on its own first to see what the cause is.
The converted annotation files are all saved in COCO format; you can check whether these annotation files can be read by COCO normally.
Hello! The file assets/groundingdino_swint_ogc.pth does not seem to be in the project listing. Where can I download it?
Hi, that checkpoint can be found in the groundingdino repo. But that part of the code does not seem to be used by the project, so there should be no need to download it.