
Failed to Build Mixtral-8x7b Engine Because of Insufficient Memory

Open · Jeevi10 opened this issue on Jan 8, 2024 · 4 comments

While building the engine inside the Docker container, I ran into an insufficient memory issue.

python ../llama/build.py --model_dir ./Mixtral-8x7B-instruct-v0.1 --use_inflight_batching --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --world_size 8 --tp_size 8 --output_dir ./trt_engines/mixtral/TP --parallel_build

Got the following errors:

[01/08/2024-17:57:31] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 2547452477568 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). )
[01/08/2024-17:57:31] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/08/2024-17:57:31] [TRT-LLM] [I] Config saved to ./trt_engines/mixtral/TP/config.json.
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/mixtral/../llama/build.py", line 934, in <module>
    build(0, args)
  File "/code/tensorrt_llm/examples/mixtral/../llama/build.py", line 880, in build
    assert engine is not None, f'Failed to build engine for rank {cur_rank}'
AssertionError: Failed to build engine for rank 0

I am using 4x NVIDIA A100 80GB.
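
For reference, the workspace limit the error points at is the TensorRT builder's WORKSPACE memory pool. Below is a minimal sketch of raising that limit through the TensorRT Python API; it assumes direct access to the IBuilderConfig used during the build, which build.py may not expose as a command-line option.

# Sketch only: raise the TensorRT builder workspace pool named in the error.
# Assumes you can reach the builder config object that the build script creates.
import tensorrt as trt

trt_logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(trt_logger)
config = builder.create_builder_config()

# Allow up to 80 GiB of workspace (roughly the capacity of one A100 80GB).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 80 * (1 << 30))

Even then, the scratch space requested above (2547452477568 bytes, roughly 2.3 TiB) is far beyond what a single 80 GB GPU can provide, so raising the pool limit alone may not be enough.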

Jeevi10 avatar Jan 08 '24 19:01 Jeevi10

I also hit this problem with the newest code. I used the suggested command:

python3 ../llama/build.py --model_dir /workspace/models/Mixtral-8x7B-v0.1/ --enable_context_fmha --use_gemm_plugin --world_size 2 --tp_size 2 --use_inflight_batching --output_dir /workspace/models/trt-engine/Mixtral_tp2/ --parallel_build

and the error is:

[01/09/2024-06:30:53] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

[01/09/2024-06:30:53] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). )
[01/09/2024-06:30:53] [TRT-LLM] [E] Engine building failed, please check the error log.
[01/09/2024-06:30:53] [TRT-LLM] [I] Config saved to /workspace/models/trt-engine/Mixtral_tp2/config.json.
[01/09/2024-06:30:54] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[01/09/2024-06:30:55] [TRT] [I] Detected 74 inputs and 35 output network tensors.
[01/09/2024-06:30:59] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().

[01/09/2024-06:30:59] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/mlp/PLUGIN_V2_MixtureOfExperts_0 requires 6876779511936 bytes of scratch space, but only 84987740160 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). )
[01/09/2024-06:30:59] [TRT-LLM] [E] Engine building failed, please check the error log.
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/llama/build.py", line 930, in <module>
    mp.spawn(build, nprocs=args.world_size, args=(args, ))
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/workspace/TensorRT-LLM/examples/llama/build.py", line 880, in build
    assert engine is not None, f'Failed to build engine for rank {cur_rank}'
AssertionError: Failed to build engine for rank 0

yoyopdc avatar Jan 09 '24 06:01 yoyopdc

I tried both: tag v0.7.1 has no problem, but the main branch does.

yoyopdc avatar Jan 10 '24 02:01 yoyopdc

Same question here: how can this be solved on the main branch?

littletomatodonkey avatar Jan 10 '24 09:01 littletomatodonkey

We hit the same issue on V100 with v0.7.1 (no problem on A100):

[02/04/2024-10:22:28] [TRT] [E] 4: Internal error: plugin node LLaMAForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0 requires 224716128768 bytes of scratch space, but only 34079637504 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[02/04/2024-10:22:28] [TRT] [E] 4: [pluginV2Builder.cpp::makeRunner::519] Error Code 4: Internal Error (Internal error: plugin node LLaMAForCausalLM/layers/0/attention/PLUGIN_V2_GPTAttention_0 requires 224716128768 bytes of scratch space, but only 34079637504 is available. Try increasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). )
[02/04/2024-10:22:28] [TRT-LLM] [E] Engine building failed, please check the error log.
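
Converting the reported byte counts (1 GiB = 2^30 bytes) shows how far apart the request and the budget are:

224716128768 bytes / 2^30 ≈ 209.3 GiB  (scratch requested by the GPTAttention plugin)
34079637504 bytes / 2^30 ≈ 31.7 GiB   (workspace available on the V100 32GB)
84987740160 bytes / 2^30 ≈ 79.2 GiB   (workspace available on the A100 80GB in the earlier reports)

So in every report the requested scratch space exceeds the card's memory by a large factor.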

PeterWang1986 avatar Feb 04 '24 11:02 PeterWang1986