Chudong Tian

Results: 9 issues by Chudong Tian

Hello, I found that, compared with native StyleGAN2, your code specifically removes the label. I want to ask whether you have tried adding the label for training,...

When the network is disconnected, the server reconnects after ten minutes, but select() always returns a connection timeout. The select() call is in CURLReadStreamBase::FillBuffer() in s3_filesys.cc. I changed select() to...
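The timeout-and-retry pattern around select() described above can be sketched as follows; this is a minimal Python illustration of the idea, not the actual C++ code in s3_filesys.cc, and the timeout and retry limit are made-up values:

```python
# Hypothetical sketch: retry select() a bounded number of times on timeout
# instead of failing on the first one after a reconnect.
import select
import socket

def read_with_retry(sock, timeout=5.0, max_retries=3):
    """Wait for the socket to become readable, retrying on select() timeout."""
    for attempt in range(max_retries):
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            # data (or EOF) is available; read up to 4 KiB
            return sock.recv(4096)
    raise TimeoutError("select() timed out %d times" % max_retries)
```

The same structure applies in the C++ code: loop over select(), and treat a timeout as a retryable condition rather than a fatal error.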

1. The finetune weight file needs to be changed to F_weight_file instead of T_weight_file; 2. The learning rate for finetuning needs to be changed to 0.0001, otherwise training does not converge, so after switching the weight file to F_weight_file, inference quality is very poor; 3. In testing, after obtaining xywh, run NMS once instead of the original averaging operation. After fixing the above 3 issues, results return to normal.
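The third fix above (running NMS over the xywh boxes instead of averaging them) can be sketched like this; the function name, per-box scores, and IoU threshold are illustrative assumptions, not taken from the original repository:

```python
# Hypothetical sketch of greedy NMS over (x, y, w, h) boxes.
import numpy as np

def nms_xywh(boxes, scores, iou_thresh=0.5):
    """Greedy NMS; boxes is (N, 4) xywh, scores is (N,). Returns kept indices."""
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 0] + boxes[:, 2]
    y2 = boxes[:, 1] + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```

Unlike averaging, this keeps the highest-scoring box in each overlapping cluster and discards near-duplicates.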

When I use `from revChatGPT.revChatGPT import Chatbot` there is an error: `139 (interrupted by signal 11: SIGSEGV)` @acheong08

Hello, I am training GPT-2 from scratch, but I found that processing the openwebtext data is too slow, and our GPU server can't connect to the Internet. It's taken...
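One common way to speed up slow per-document preprocessing like this is to fan the work out across a process pool. A minimal sketch, where tokenize() is a whitespace-split stand-in for the real (slow) tokenizer and the worker count is illustrative:

```python
# Hypothetical sketch: parallelize per-document preprocessing with a
# process pool. tokenize() stands in for the real tokenization step.
from multiprocessing import Pool

def tokenize(doc):
    # placeholder for the actual (CPU-bound) tokenizer call
    return doc.split()

def preprocess(docs, workers=4):
    # map the documents across worker processes in parallel
    with Pool(workers) as pool:
        return pool.map(tokenize, docs)
```

Since the server has no Internet access, the dataset would have to be downloaded elsewhere and copied over first; the parallelization only addresses the processing speed.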

After finetuning GPT-3 1.3B with multi-GPU tensor parallelism, how can I run inference on a single GPU? Single-GPU inference raises the error below, but multi-GPU inference works fine: RuntimeError: DistributedGPT3Pipeline: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1169, invalid usage, NCCL version 21.0.3 ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops,...

Training finishes in about 5 hours, but the loss is always 0. Is that normal? {'loss': 0.0, 'learning_rate': 1.9230769230769234e-07, 'epoch': 0.99} {'loss': 0.0, 'learning_rate': 1.730769230769231e-07, 'epoch': 0.99} {'loss': 0.0, 'learning_rate': 1.5384615384615387e-07, 'epoch': 0.99} {'loss': 0.0, 'learning_rate': 1.3461538461538464e-07, 'epoch': 0.99} {'loss': 0.0, 'learning_rate': 1.153846153846154e-07, 'epoch':...

I am continuing training from the 10B model, but the loss only dropped from 11 to 5. Generally speaking, what should the final converged loss be? I used 120k texts, with an average text length of 5000. Training parameters: gpus=8, max length=1024, batchsize=8, gradient accumulation=2, lr=7e-6, total iters=5000, roughly 5 epochs. @jeffra @samyam @tjruwase @WrRan
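The "roughly 5 epochs" figure above is consistent with these settings under one assumption: that batchsize=8 is per GPU and each text contributes one (truncated) training sample. A back-of-envelope check:

```python
# Hypothetical sanity check of the epoch count from the settings above,
# assuming batchsize=8 is per-GPU and each text yields one sample.
gpus, batch_per_gpu, grad_accum = 8, 8, 2
iters, num_texts = 5000, 120_000

samples_per_iter = gpus * batch_per_gpu * grad_accum  # 128 samples per optimizer step
epochs = iters * samples_per_iter / num_texts
print(round(epochs, 2))  # ≈ 5.33, matching "roughly 5 epochs"
```

Note that with max length=1024 and an average text length of 5000, most of each text is discarded unless the texts are chunked before training.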

Thanks for your model! I have a task that requires generating a foreground from a background image, but your model currently cannot control the position well and cannot keep harmony...