fix: [nvbugs/5066257] serialization improvements
Description
Add an approved list to the pickle deserialization process, reducing the attack surface of pickle to a fixed subset of supported objects.
Test Coverage
If a new, unsupported object is serialized or deserialized, this PR raises a ValueError that clearly states which module is not supported. Fully exercising the zmq interfaces covers this code.
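The approved-list idea can be sketched with a restricted `pickle.Unpickler` whose `find_class` rejects anything not explicitly approved. This is a minimal illustration, not the PR's actual implementation; the `APPROVED` entries and the `restricted_loads` helper are hypothetical names chosen here for the example.

```python
import io
import pickle

# Hypothetical approved list -- the real PR maintains its own set of
# supported modules/classes; these entries are illustrative only.
APPROVED = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}

class ApprovedUnpickler(pickle.Unpickler):
    """Unpickler that only reconstructs objects on an approved list."""

    def find_class(self, module, name):
        # Reject anything not explicitly approved, naming the offending
        # module so a newly added type surfaces as a clear error instead
        # of silently widening the attack surface.
        if (module, name) not in APPROVED:
            raise ValueError(
                f"Unsupported object in pickle stream: {module}.{name}")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    """Deserialize bytes, allowing only approved classes."""
    return ApprovedUnpickler(io.BytesIO(data)).load()
```

Built-in containers deserialize normally, while an unapproved class (say, `datetime.date`) raises a ValueError naming the module, which is how a missing entry makes itself known.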
Thanks a lot for making this PR! Several general comments:
- Could you please follow https://github.com/NVIDIA/TensorRT-LLM/blob/main/CONTRIBUTING.md to sign your commits and run the pre-commit hooks.
- Please add the complicated dataclass serialization unit test we discussed offline to test_executor.py. It would also be good to have additional tests for serialization.py. I can trigger a pipeline test after you add the tests to see if there are any test failures.
@kaiyux This is the "approved list approach" I discussed with you offline. Let me know your thoughts.
cc @Superjomn @litaotju @juney-nvidia
@yibinl-nvidia I fixed the DCO, ran the pre-commit hooks, added the tests, and made some improvements based on the comments. Let me know if there is anything else I should address.
/bot run
PR_Github #3654 [ run ] triggered by Bot
PR_Github #3654 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #2585 completed with status: 'FAILURE'
I did a micro-benchmark comparing the custom serialization and deserialization against pickle. The first result is based on a simple GenerationResult object (the same one as in test_executor.py), approximately 2 KB in size. The second result benchmarks messages of various sizes. This shows that the new approach introduces only trivial, microsecond-level performance degradation.
Serialization and Deserialization Performance Results for GenerationRequest with 10000 iterations:
Pickle (avg/min/max): 199.038μs / 186.559μs / 1271.870μs
Custom (avg/min/max): 204.601μs / 192.718μs / 363.867μs
Average time difference: 5.56 μs
Size Comparison:
Pickle size: 1920 bytes
Custom size: 1920 bytes
Size ratio (custom/pickle): 1.00x
Serialization Performance Across Different Message Sizes:
| Size     | Pickle Avg (μs) | Custom Avg (μs) | Diff (μs) | Size Ratio |
|----------|-----------------|-----------------|-----------|------------|
| 1.0 KB   | 200.348         | 205.710         | 5         | 1.00x      |
| 10.0 KB  | 279.806         | 285.365         | 5         | 1.00x      |
| 100.0 KB | 1044.078        | 1051.731        | 7         | 1.00x      |
| 1.0 MB   | 8487.678        | 8505.488        | 17        | 1.00x      |
| 10.0 MB  | 92523.012       | 93069.062       | 546       | 1.00x      |
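For reference, a micro-benchmark along these lines can be sketched as below. The `custom_dumps`/`custom_loads` stand-ins and the payload are placeholders, not the PR's actual serializer or test objects; only the timing harness itself is shown.

```python
import pickle
import statistics
import time

def custom_dumps(obj):
    # Placeholder for the PR's custom serializer; pickle stand-in only.
    return pickle.dumps(obj)

def custom_loads(data):
    # Placeholder for the PR's custom deserializer.
    return pickle.loads(data)

def bench(dumps, loads, payload, iterations=1000):
    """Time serialize+deserialize round trips; returns (avg, min, max) in μs."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        loads(dumps(payload))
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.mean(samples), min(samples), max(samples)

payload = {"tokens": list(range(256))}  # roughly KB-scale message
p_avg, p_min, p_max = bench(pickle.dumps, pickle.loads, payload)
c_avg, c_min, c_max = bench(custom_dumps, custom_loads, payload)
print(f"Pickle (avg/min/max): {p_avg:.3f} / {p_min:.3f} / {p_max:.3f} μs")
print(f"Custom (avg/min/max): {c_avg:.3f} / {c_min:.3f} / {c_max:.3f} μs")
```

Timing the full round trip (rather than serialization alone) matches how the message path is exercised over the zmq interfaces.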
The last commit should resolve the issue reported by blossom.
/bot run
PR_Github #3744 [ run ] triggered by Bot
PR_Github #3744 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2650 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #3766 [ run ] triggered by Bot
The tests that failed in the basic bot run should be fixed by the new commits.
PR_Github #3766 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2667 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #3873 [ run ] triggered by Bot
PR_Github #3873 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2746 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #3941 [ run ] triggered by Bot
PR_Github #3941 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2793 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #4000 [ run ] triggered by Bot
PR_Github #4000 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2841 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #4012 [ run ] triggered by Bot
PR_Github #4012 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2850 completed with status: 'FAILURE'
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #4038 [ run ] triggered by Bot
PR_Github #4038 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #2868 completed with status: 'FAILURE'