ext_proc: Refactor the management of sidestream response in stream mode.
A huge response from the ext_proc server (either a single large response or a large number of smaller responses in a short period) can lead to OOM.
A simple and safe solution is proposed here: leverage the HCM buffering/watermark mechanism to handle the ext_proc server response. When the response triggers the high watermark, a local reply is sent to avoid OOM. Note: the request to the ext_proc server is still streamed out.
The potentially optimal solution, and the next step, is for upstream/downstream to apply back pressure to the sidestream (the connection to the ext_proc server). It is being actively explored and developed, but it is a complex solution that demands significant effort and testing.
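To illustrate the idea only, here is a minimal, self-contained sketch of a watermark-limited buffer for sidestream data. It is not Envoy's actual buffer or HCM API: `SidestreamBuffer`, `onSidestreamData`, `drain`, and `localReplyTriggered` are hypothetical names, and the real change relies on the existing HCM buffering rather than a separate class like this one.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for watermark-limited buffering of ext_proc responses.
// This is an illustration, not Envoy's real buffer implementation.
class SidestreamBuffer {
public:
  explicit SidestreamBuffer(std::size_t high_watermark) : high_watermark_(high_watermark) {}

  // Accounts for a chunk received from the ext_proc server.
  // Returns false once the buffered bytes exceed the high watermark; the caller
  // is then expected to send a local reply instead of buffering further.
  bool onSidestreamData(const std::string& chunk) {
    if (local_reply_triggered_) {
      return false;
    }
    buffered_bytes_ += chunk.size();
    if (buffered_bytes_ > high_watermark_) {
      local_reply_triggered_ = true;
      return false;
    }
    return true;
  }

  // Called when the upstream/downstream has consumed some buffered bytes.
  void drain(std::size_t bytes) { buffered_bytes_ -= std::min(bytes, buffered_bytes_); }

  std::size_t bufferedBytes() const { return buffered_bytes_; }
  bool localReplyTriggered() const { return local_reply_triggered_; }

private:
  const std::size_t high_watermark_;
  std::size_t buffered_bytes_{0};
  bool local_reply_triggered_{false};
};
```

In this sketch a slow consumer (few `drain` calls) combined with a fast ext_proc server pushes the buffer over the limit, which is the condition under which the local reply fires.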
/assign @htuch @yanavlasov
PTAL, Thanks!
As discussed last week, I'm a little worried about the lack of predictability of errors with this solution. I like the fact that this approach protects Envoy, in particular in a multi-tenant scenario. But basically arbitrary upstream slowness, which would usually trigger proper flow control, can now cause error codes.
There might be a way to salvage this though: if we can document some strong guarantees, e.g. "the ext_proc server never sends more than some fixed excess to upstream, e.g. 10% more bytes than the client has sent and the ext_proc server has observed", then we can allow ext_proc services to reason about safe mutations that will work within existing flow control expectations and not error out.
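As a rough sketch of how an ext_proc service might reason about such a guarantee, the check below tracks bytes observed from the client against bytes emitted toward upstream and refuses an emission that would exceed a fixed percentage of excess. `MutationBudget` and its methods are hypothetical, and the 10% figure is only the example from this comment, not a documented Envoy limit.

```cpp
#include <cstdint>

// Hypothetical helper for an ext_proc service: enforces that bytes emitted
// toward upstream never exceed observed client bytes plus a fixed percentage.
class MutationBudget {
public:
  explicit MutationBudget(std::uint64_t excess_percent) : excess_percent_(excess_percent) {}

  // Records bytes the ext_proc server has observed from the client.
  void onObservedFromClient(std::uint64_t bytes) { observed_ += bytes; }

  // Returns true and records the emission if sending `bytes` more to upstream
  // stays within the budget; returns false (and records nothing) otherwise.
  bool tryEmitToUpstream(std::uint64_t bytes) {
    const std::uint64_t limit = observed_ + observed_ * excess_percent_ / 100;
    if (emitted_ + bytes > limit) {
      return false;
    }
    emitted_ += bytes;
    return true;
  }

  std::uint64_t emitted() const { return emitted_; }

private:
  const std::uint64_t excess_percent_;
  std::uint64_t observed_{0};
  std::uint64_t emitted_{0};
};
```

With a 10% budget and 1000 observed bytes, the service could emit up to 1100 bytes before having to wait for more client data.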