Failed to Build Mixtral-8x7B Engine Because of Insufficient Memory
While building the engine inside the Docker container, I ran into an insufficient-memory issue.
python ../llama/build.py --model_dir ./Mixtral-8x7B-instruct-v0.1 --use_inflight_batching --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --world_size 8 --tp_size 8 --output_dir ./trt_engines/mixtral/TP --parallel_build
I got the following errors:
[01/08/2024-17:57:31] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 2547452477568 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)
[01/08/2024-17:57:31] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/08/2024-17:57:31] [TRT-LLM] [I] Config saved to ./trt_engines/mixtral/TP/config.json.
Traceback (most recent call last):
File "/code/tensorrt_llm/examples/mixtral/../llama/build.py", line 934, in
I am using NVIDIA A100 80GB x4.
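For what it's worth, the knob the error message points at (IBuilderConfig::setMemoryPoolLimit()) is also exposed in the TensorRT Python API. Below is a minimal, hedged sketch of setting it (plain TensorRT, not the build.py flow; the 72 GiB figure is only an illustrative value), though it cannot help when a plugin asks for terabytes of scratch on an 80 GB card:

import tensorrt as trt

# Sketch only: raise the TensorRT workspace memory pool, which is the limit the
# error message refers to. This mirrors IBuilderConfig::setMemoryPoolLimit() in C++.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Example value (assumption): 72 GiB of workspace. TensorRT is still bounded by
# the memory actually free on the device, so a request larger than the GPU's
# capacity (as in the logs above) cannot be satisfied this way.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 72 * (1 << 30))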
I also hit this problem with the newest code. I used the suggested command:

python3 ../llama/build.py --model_dir /workspace/models/Mixtral-8x7B-v0.1/ --enable_context_fmha --use_gemm_plugin --world_size 2 --tp_size 2 --use_inflight_batching --output_dir /workspace/models/trt-engine/Mixtral_tp2/ --parallel_build

and the error is:

[01/09/2024-06:30:53] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[01/09/2024-06:30:53] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)
[01/09/2024-06:30:53] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/09/2024-06:30:53] [TRT-LLM] [I] Config saved to /workspace/models/trt-engine/Mixtral_tp2/config.json.
[01/09/2024-06:30:54] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[01/09/2024-06:30:55] [TRT] [I] Detected 74 inputs and 35 output network tensors.
[01/09/2024-06:30:59] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[01/09/2024-06:30:59] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)
[01/09/2024-06:30:59] [TRT-LLM] [E] Engine building failed, please check the error log.
Traceback (most recent call last):
File "/workspace/TensorRT-LLM/examples/llama/build.py", line 930, in
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/workspace/TensorRT-LLM/examples/llama/build.py", line 880, in build
    assert engine is not None, f'Failed to build engine for rank {cur_rank}'
AssertionError: Failed to build engine for rank 0
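For scale, here is a quick back-of-the-envelope conversion of the byte counts from the logs above (my own arithmetic, not TensorRT-LLM code). The scratch space the MoE plugin requests is in the terabyte range, so no workspace limit on a single 80 GB GPU can cover it:

GIB = 1 << 30

reports = {
    "MoE plugin scratch requested, TP8 build": 2547452477568,
    "MoE plugin scratch requested, TP2 build": 6876779511936,
    "workspace reported available (A100 80GB)": 84987740160,
}

# Convert each reported byte count to GiB for readability.
for name, nbytes in reports.items():
    print(f"{name}: {nbytes / GIB:,.1f} GiB")

# Approximate output:
#   MoE plugin scratch requested, TP8 build: 2,372.5 GiB
#   MoE plugin scratch requested, TP2 build: 6,404.5 GiB
#   workspace reported available (A100 80GB): 79.2 GiB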
I tried it: the v0.7.1 tag had no problem, but the main branch does.
Same question here: how do we solve this on the main branch?
We hit the same issue using V100 with v0.7.1 (no problem on A100):

[02/04/2024-10:22:28] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0 requires 224716128768 bytes of scratch space, but only 34079637504 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[02/04/2024-10:22:28] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0 requires 224716128768 bytes of scratch space, but only 34079637504 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
)
[02/04/2024-10:22:28] [TRT-LLM] [E] Engine building failed, please check the error log.
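Same back-of-the-envelope check for the V100 numbers above (again just arithmetic on the logged values): the attention plugin's request is several times the card's 32 GB, so raising the workspace limit cannot close the gap:

GIB = 1 << 30
requested = 224716128768   # scratch requested by the GPTAttention plugin (from the log)
available = 34079637504    # workspace reported available on the V100 32GB
print(f"requested: {requested / GIB:.1f} GiB, available: {available / GIB:.1f} GiB")
# requested: 209.3 GiB, available: 31.7 GiB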