AGI-player
Using the new version of the code (commit 33f2c0d4f89cf76671c0fdfbcee79d732b6a020e), I train a llama2-13b model from randomly initialized weights. In the multi-node case I use DeepSpeed ZeRO-3 with bf16; the config is below. The learning-rate (warmup) and loss curves during training look roughly normal overall.

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": { "enabled":...
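A minimal sketch of such a ZeRO-3 bf16 config as a Python dict, e.g. as passed to the HF Trainer's DeepSpeed integration, which resolves the "auto" placeholders at launch. The ZeRO option values here are illustrative assumptions, not the poster's actual settings:

```python
# Hypothetical DeepSpeed ZeRO-3 + bf16 config sketch; "auto" values are
# filled in by the Hugging Face Trainer integration at startup.
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "bf16": {"enabled": True},
    "fp16": {"enabled": False},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# bf16 and fp16 are mutually exclusive: enabling both is a config error.
assert not (ds_config["bf16"]["enabled"] and ds_config["fp16"]["enabled"])
```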
I run the following command:

python convert_checkpoint.py --model_dir /Qwen1.5-32B-Chat/ --dtype bfloat16 --output_dir /Qwen1.5-32B/trt_ckpts/bf16/1-gpu/

error: the config.json for Qwen1.5-32B-Chat may be causing this problem.
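One way to narrow this down is to sanity-check config.json for the fields a checkpoint converter typically reads. The sample dict and the required-key list below are illustrative assumptions, not the actual Qwen1.5-32B-Chat config or the actual keys convert_checkpoint.py requires:

```python
# Hedged sketch: verify that a HF-style model config carries the basic
# fields a converter usually depends on. Values are made up for illustration.
sample_config = {
    "architectures": ["Qwen2ForCausalLM"],
    "hidden_size": 5120,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "torch_dtype": "bfloat16",
}

def check_config(cfg: dict) -> list:
    """Return the names of required keys missing from a model config."""
    required = ["architectures", "hidden_size", "num_attention_heads"]
    return [k for k in required if k not in cfg]

print(check_config(sample_config))  # an empty list means the basics are present
```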
I use GenerationExecutorWorker for a web service, passing the parameter stop_words_list = [["hello, yes"]] by modifying the as_inference_request function in executor.py as follows: the ir parameter is as follows: and then it failed.
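A common pitfall with stop words is passing raw strings where the runtime expects a flattened token-id tensor. As a hedged sketch of the FasterTransformer-style word-list layout (shape [batch, 2, max_len]: row 0 is the concatenated token ids of all stop words, row 1 the cumulative end offsets, both padded with -1), assuming that is the format this executor wants; the token ids below are made up, a real call would tokenize "hello, yes" first:

```python
# Hypothetical packing of per-request stop words into the flattened
# [batch, 2, max_len] word-list format; token ids are illustrative only.
def pack_stop_words(batch_stop_ids):
    """batch_stop_ids: per request, a list of stop words, each a list of token ids."""
    packed = []
    for stop_ids in batch_stop_ids:
        flat, offsets, total = [], [], 0
        for ids in stop_ids:
            flat.extend(ids)          # row 0: concatenated token ids
            total += len(ids)
            offsets.append(total)     # row 1: cumulative end offsets
        max_len = max(len(flat), len(offsets))
        flat += [-1] * (max_len - len(flat))
        offsets += [-1] * (max_len - len(offsets))
        packed.append([flat, offsets])
    return packed

print(pack_stop_words([[[15339, 11, 7566]]]))
# → [[[15339, 11, 7566], [3, -1, -1]]]
```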
hello, in axis_aligned_target_assigner.py, as follows: the idx in line 86 means the index into the ground-truth boxes across all three classes. But in the following code: the gt_ids...
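The mismatch being described can be sketched in isolation: an index into the full ground-truth array (all classes) is not the same as an index into a per-class filtered view, so local indices must be mapped back to global ones. The names below are illustrative, not the actual axis_aligned_target_assigner.py code:

```python
# Hypothetical per-class index remapping: a per-class assigner sees only
# the gt boxes of one class, so its local index i corresponds to the
# global index mapping[i] in the full gt array.
gt_classes = ["car", "ped", "car", "cyc", "ped"]

def class_local_to_global(gt_classes, cls):
    """Map each per-class (local) index back to its global gt index."""
    return [i for i, c in enumerate(gt_classes) if c == cls]

print(class_local_to_global(gt_classes, "ped"))
# → [1, 4]  (local index 0 -> global 1, local 1 -> global 4)
```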
Running inference with /TensorRT-LLM/examples/run.py works:

mpirun -n 4 -allow-run-as-root python3 /load/trt_llm/TensorRT-LLM/examples/run.py \
  --input_text "hello,who are you?" \
  --max_output_len=50 \
  --tokenizer_dir /load/Qwen1.5-32B-Chat/ \
  --engine_dir=/load/output/trt_llm/trt_engines_qw32/f16_sq0.5_4gpu/

but it fails when using TensorRT-LLM/examples/apps/fastapi_server.py...