fix(rpc): Improve input validation and error handling
Fixes #13067
The rpc-server was vulnerable to denial-of-service attacks via several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed messages could trigger failed assertions (e.g., an invalid `ggml_type`) or out-of-bounds reads/writes leading to `GGML_ABORT` calls, crashing the server process.
This PR introduces robust input validation and replaces `abort()` calls with graceful error handling:
- **Type Validation:** `deserialize_tensor` now checks that `tensor->type` is within the valid `GGML_TYPE_COUNT` range before calling `ggml_new_tensor_4d`, and returns `nullptr` on an invalid type (see the sketch after this list).
- **Bounds Checks:** Replaced `GGML_ABORT` in the `set_tensor`, `set_tensor_hash`, and `get_tensor` handlers with error logging and a `false` return when data/offset parameters are out of buffer bounds.
- **Error Propagation:**
  - `create_node` now checks for `nullptr` return values from `deserialize_tensor` and its recursive calls, propagating `nullptr` upwards on failure. Uses `find` instead of `at` for safer map access.
  - `copy_tensor` now checks for `nullptr` from `deserialize_tensor` and sets the response status to failure if deserialization or bounds checks fail.
  - `graph_compute` now checks for a `nullptr` return from `create_node` and returns the failure status correctly. The final return value now reflects the actual computation status.
  - `RPC_CMD_GET_ALLOC_SIZE` now checks the return value of `server.get_alloc_size` in the RPC server loop. If the call fails, the server returns early to close the connection.
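For illustration, here is a minimal, self-contained sketch of the first two checks and the error-propagation pattern. All names in it (`SKETCH_GGML_TYPE_COUNT`, `tensor_type_is_valid`, `range_in_buffer`, `handle_set_tensor`) are stand-ins invented for this sketch; the actual patch performs the type check inside `deserialize_tensor` and the bounds checks inside the `set_tensor`/`get_tensor` handlers in `ggml-rpc.cpp`:

```cpp
// Self-contained sketch of the checks described above, using stand-in types.
// The real code lives in ggml/src/ggml-rpc/ggml-rpc.cpp and operates on
// ggml's own structs (rpc_tensor, ggml_context) and logging macros.
#include <cstdint>
#include <cstdio>

// Stand-in for ggml's type enum; only the upper bound matters here.
enum : uint32_t { SKETCH_GGML_TYPE_COUNT = 39 };

// 1) Type validation: the wire value is attacker-controlled and must be
//    range-checked before being cast to ggml_type; otherwise
//    ggml_new_tensor_4d can hit a failed assertion and abort the server.
static bool tensor_type_is_valid(uint32_t wire_type) {
    if (wire_type >= SKETCH_GGML_TYPE_COUNT) {
        std::fprintf(stderr, "[%s] invalid tensor type received: %u\n", __func__, wire_type);
        return false;  // deserialize_tensor returns nullptr in this case
    }
    return true;
}

// 2) Bounds check, replacing GGML_ABORT with a recoverable error. [p0, p1)
//    is the target buffer's address range; addr stands for
//    tensor->data + offset as received from the client (a real implementation
//    must also guard that addition against overflow). The comparisons are
//    ordered to avoid unsigned wrap-around.
static bool range_in_buffer(uint64_t p0, uint64_t p1, uint64_t addr, uint64_t size) {
    return p0 <= addr && addr < p1 && size <= p1 - addr;
}

// 3) Error propagation: a handler that fails returns false, and the server
//    loop closes that client's connection instead of aborting the process.
static bool handle_set_tensor(uint64_t p0, uint64_t p1, uint32_t type,
                              uint64_t addr, uint64_t size) {
    if (!tensor_type_is_valid(type) || !range_in_buffer(p0, p1, addr, size)) {
        return false;  // connection is closed; other clients are unaffected
    }
    // ... copy the payload into [addr, addr + size) ...
    return true;
}

int main() {
    // A malformed SET_TENSOR-style request: bogus type and a write that
    // would run past the end of a 4 KiB buffer at [0x1000, 0x2000).
    if (!handle_set_tensor(0x1000, 0x2000, 0xFFFFFFFFu, 0x1FF0, 0x100)) {
        std::puts("request rejected gracefully (no GGML_ABORT)");
    }
    return 0;
}
```

The important design point is that a malformed message now fails the request (and at worst the offending client's connection) instead of taking down the server process and every other connected client with it.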
In my opinion it may affect performance; maybe use a feature flag (via CMake)?
> In my opinion it may affect performance; maybe use a feature flag (via CMake)?
I believe it would be interesting to see what the performance impact of this change is. I'm new to the project, so pointers are welcome if there's a test suite available that would show this.
Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server, from invalid tensor operations to crashes via deep recursion in `create_node`, which I would also like to fix. I'd like to work on those one change at a time, though.
I think putting multiple critical fixes behind a feature flag would be counterintuitive. I'd rather build bench tooling (if needed) and iterate on the fixes so there's minimal performance hit.
> I believe it would be interesting to see what the performance impact of this change is. I'm new to the project, so pointers are welcome if there's a test suite available that would show this.
We use `llama-bench` to test performance.
> Slightly off-topic but related: I think there are plenty of opportunities for similar improvements in the RPC server.
The best investment of effort in this direction would be creating a script/job for coverage-guided fuzzing. This way we can automatically test for security issues whenever we make RPC changes, and even integrate it into the CI.
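To make that concrete, a libFuzzer-style harness could look roughly like the sketch below. `rpc_handle_message` is a hypothetical entry point standing in for whatever message parsing/validation gets factored out of the server loop; no such function exists in `ggml-rpc.cpp` today, so the stub only marks where the real call would go:

```cpp
// Hypothetical libFuzzer harness for the RPC message path. Assumes a
// factored-out entry point that can parse and validate one message without
// a live socket.
#include <cstddef>
#include <cstdint>

// Stub standing in for the hypothetical factored-out handler; the real
// integration would call into the rpc-server's deserialization/validation.
static bool rpc_handle_message(uint8_t cmd, const uint8_t * payload, size_t len) {
    (void) cmd; (void) payload; (void) len;  // real code would parse here
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t * data, size_t size) {
    if (size < 1) {
        return 0;
    }
    // First byte selects the RPC command, the rest is the message body; any
    // crash or sanitizer report is a bug the new validation should prevent.
    rpc_handle_message(data[0], data + 1, size - 1);
    return 0;
}
```

Built with `clang++ -fsanitize=fuzzer,address`, a harness along these lines would exercise exactly the code paths this PR hardens and could run as a recurring CI job.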
`RPC_CMD_GET_ALLOC_SIZE` does not check for errors, and if the call to `get_alloc_size` fails it will leave the client connected in a bad state:
https://github.com/ggml-org/llama.cpp/blob/604f0a0045ecb1988a32dbdad7ba4984df9c7ecd/ggml/src/ggml-rpc/ggml-rpc.cpp#L1470
Thanks @slaren, added it to this same PR since it falls under the same scope. https://github.com/ggml-org/llama.cpp/pull/13069/commits/e6dd976acf26cc929431bbd93651702fa7e7956c
> The best investment of effort in this direction would be creating a script/job for coverage-guided fuzzing.
Sounds good, I can look into that after this PR 👍