superhero-7

Results 16 comments of superhero-7

I ran into the same problem: on the validation set the model predicts either all dogs or all cats.

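I found that switching the optimizer to SGD brought the accuracy up. It still isn't great, but at least the model is learning something. I still don't understand why Adam has no effect, though, especially since Adam seems more advanced than SGD. Before the switch, accuracy was stuck at 50%, which means nothing was being learned, and the loss stayed around 0.69. Notice that CrossEntropyLoss = -0*ln(0.5) - 1*ln(0.5) = 0.69, which is exactly the loss when cat and dog each get probability 0.5, i.e. the model has learned nothing! Here is the visdom plot: ![image](https://user-images.githubusercontent.com/57797766/108858420-1864e680-7627-11eb-9bd8-ade1ecc98758.png)
log:
[0223_220208]epoch:0,lr:0.001,loss:0.6930131913593837,train_cm:[[4427 4323] [4311 4439]],val_cm:[[ 0 3750] [ 0 3750]]
[0223_220339]epoch:1,lr:0.001,loss:0.6922840761457184,train_cm:[[2339 6411] [1992 6758]],val_cm:[[ 769 2981] [ 423 3327]]
[0223_220510]epoch:2,lr:0.001,loss:0.6898045110702511,train_cm:[[2723 6027] [1945 6805]],val_cm:[[ 481...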
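The 0.69 figure and the 50% accuracy can be checked by hand. The sketch below is only an illustration, not the training code: `cross_entropy` and `accuracy` are hypothetical helper names, and the confusion matrices are copied from the log above. Accuracy does not depend on which axis of the matrix holds the true labels.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy of one sample with a one-hot target: -ln(p[target])."""
    return -math.log(probs[target])


def accuracy(cm):
    """Accuracy from a confusion matrix: diagonal (correct) over total."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Loss when the model assigns probability 0.5 to each of the two classes.
chance_loss = cross_entropy([0.5, 0.5], target=1)
print(f"loss at 50/50: {chance_loss:.4f}")

# Validation confusion matrices taken from the log (epochs 0 and 1).
val_cm_epoch0 = [[0, 3750], [0, 3750]]
val_cm_epoch1 = [[769, 2981], [423, 3327]]
print(f"val acc epoch 0: {accuracy(val_cm_epoch0):.4f}")
print(f"val acc epoch 1: {accuracy(val_cm_epoch1):.4f}")
```

A loss that sits at ln(2) together with a confusion matrix that has a single non-empty column is the signature of a model that outputs the same class for every input.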

Hi jeamin, I ran your refcocog script for the REC task again and found that CPU memory usage also seems to increase during training, but only very slightly. I...

This is my Dataset code; I think the problem may be caused by this. T_T `class RefCOCOGenerationFineTuneDataset(Dataset): def __init__(self, refer=None, split='train', raw_dataset=None, rank=-1, topk=-1, verbose=True, args=None, mode='train'): super().__init__() refcocog_feature_dir = refcoco_dir.joinpath(args.dataset) refcocog_feature_dir...

> It could be related to high `num_workers` or HDF5. Could you try lowering the `num_workers` in dataloader, and/or replacing HDF5 with another file storing system (e.g., tsv, npy, pickle,...
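As a rough illustration of the "replace HDF5 with pickle" suggestion in the quote, here is a minimal sketch that stores one pickle file per sample and reads it only when indexed. The names `PickleFeatureStore` and `write_samples`, the file naming scheme, and the sample layout are all assumptions made for this example; none of them come from the repository being discussed.

```python
import pickle
import tempfile
from pathlib import Path


def write_samples(root, samples):
    """Write each sample to its own pickle file (hypothetical layout)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for i, sample in enumerate(samples):
        with open(root / f"{i:08d}.pkl", "wb") as f:
            pickle.dump(sample, f)


class PickleFeatureStore:
    """Map-style store: one pickle file per sample, loaded on access."""

    def __init__(self, root):
        # Keep only the sorted file paths in memory, not the features.
        self.paths = sorted(Path(root).glob("*.pkl"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Open, read, and close the file on every access, so no file
        # handle is kept open or shared between worker processes.
        with open(self.paths[index], "rb") as f:
            return pickle.load(f)


# Small round trip with made-up samples in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    write_samples(tmp, [{"id": i, "feat": [float(i)] * 4} for i in range(3)])
    store = PickleFeatureStore(tmp)
    num_samples = len(store)
    second_sample = store[1]
```

Since the class only implements `__len__` and `__getitem__`, it should be usable as a map-style dataset with `torch.utils.data.DataLoader`, where a lower `num_workers` can then be tried as the quote suggests.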

I have also been using DDP with WebDataset in PyTorch Lightning recently, taking the open_clip codebase as a reference: https://github.com/mlfoundations/open_clip/blob/main/src/training/data.py#L152. It runs on multiple GPUs at first, but then it gets stuck...

Sounds great! Can you share your code example? I still get stuck after training for a while. By the way, I am curious whether your progress bar is displayed normally?...

Thanks! I can run it normally now, but I still need to run a validation to make sure the results are right. Does your progress bar look like mine? For example,...

```
cd GroundingDINO
export PATH=/usr/local/cuda/bin:$PATH
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
pip install -e .
```
After that, all the errors were gone... That is crazy!

I have the same problem when I use Python 3.10... The error is gone in 3.8. I'm so confused. Why does this happen?