httpx/httpcore ReadTimeouts in replicate.async_run
Hi! I'm suddenly seeing a lot of ReadTimeouts. I initially thought it might be a temporary issue on Replicate's side, but they persist. This may be similar to https://github.com/replicate/replicate-python/issues/345, but there's no further info there.
The failures don't seem to be deterministic: in my last batch of predictions, for example, 8 out of 10 images were downloaded correctly (all 10 were created). Example ReadTimeout exception:
Traceback (most recent call last):
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
yield
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
raise exc from None
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 196, in handle_async_request
response = await connection.handle_async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
return await self._connection.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 143, in handle_async_request
raise exc
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 113, in handle_async_request
) = await self._receive_response_headers(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 186, in _receive_response_headers
event = await self._receive_event(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_async/http11.py", line 224, in _receive_event
data = await self._network_stream.read(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 32, in read
with map_exceptions(exc_map):
File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "<REDACTED>/venv/lib/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ReadTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<REDACTED>/src/util/api/replicate_api.py", line 78, in generate_images
result = await task
^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 189, in async_run
return await async_run(
^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/run.py", line 96, in async_run
prediction = await client.predictions.async_create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/prediction.py", line 586, in async_create
resp = await self._client._async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 94, in _async_request
resp = await self._async_client.request(method, path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1585, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1674, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_client.py", line 1776, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/replicate/client.py", line 319, in handle_async_request
response = await self._wrapped_transport.handle_async_request(request) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 376, in handle_async_request
with map_httpcore_exceptions():
File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "<REDACTED>/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ReadTimeout
I'm now wrapping the calls in retries, which seems to help, but it's still rather unsatisfactory.
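A minimal sketch of such a retry wrapper, assuming the timeouts are transient: capped exponential backoff with jitter around any zero-argument async callable. The names retry_async and demo are hypothetical helpers, not part of the replicate library, and the demo uses the built-in TimeoutError as a stand-in for httpx.ReadTimeout so it runs without network access.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await make_call(), retrying on `retry_on` exceptions with capped
    exponential backoff and full jitter; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            # A coroutine can only be awaited once, so build a fresh one per attempt.
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Backoff window of 1x, 2x, 4x, ... base_delay, capped at max_delay;
            # sleeping a random time inside it keeps concurrent callers out of lockstep.
            window = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, window))
    raise AssertionError("unreachable")


async def demo() -> Tuple[str, int]:
    """Hypothetical flaky call that fails twice before succeeding."""
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise TimeoutError("simulated read timeout")  # stand-in for httpx.ReadTimeout
        return "ok"

    result = await retry_async(flaky, retry_on=(TimeoutError,), base_delay=0.01)
    return result, calls
```

With the client from this thread, make_call would wrap the real call, e.g. lambda: replicate.async_run(model, input=...), with retry_on=(httpx.ReadTimeout, httpx.ReadError). One caveat: in the traceback above the timeout is raised inside predictions.async_create, so the server may already have created the prediction, and a retry can start a duplicate.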
Hi @nicoluca, could you give me some additional information about how you're using the library? It looks like you're using replicate.run(); are you providing any additional arguments besides the model and inputs?
If possible, which model are you using?
Hi @aron,
Example call for a ComfyUI workflow:
replicate.async_run(
    model,
    input={
        "output_format": "png",
        "output_quality": 100,
        "randomise_seeds": True,
        "workflow_json": json.dumps(comfyui_json_dict)
    }
)
Example call for nightmareai/real-esrgan:f121d640bd286e1fdc67f9799164c1d5be36ff74576ee11c803ae5b665dd46aa:
replicate.async_run(
    model,
    input={
        "image": open(image_path, "rb"),
        "scale": 2,
        "face_enhance": False
    }
)
For the latter call I'm also seeing ReadErrors and 502s, but apparently only when I start too many predictions concurrently (e.g. just now 6 predictions went through fine, while a batch of ~100 did not go through at all).
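Since the errors seem to track the number of concurrent predictions, one common mitigation is to cap how many requests are in flight at once. A minimal sketch using asyncio.Semaphore, assuming a bounded batch is acceptable; gather_bounded, demo, and the limit of 8 are hypothetical choices, and fake_prediction stands in for a real replicate.async_run call.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


async def gather_bounded(
    calls: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 8,
) -> List[T]:
    """Run zero-argument async callables with at most `limit` in flight
    at once; results are returned in input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(make_call: Callable[[], Awaitable[T]]) -> T:
        # Callers beyond `limit` wait here instead of opening more requests.
        async with semaphore:
            return await make_call()

    return list(await asyncio.gather(*(run_one(c) for c in calls)))


async def demo(n: int = 20, limit: int = 4) -> Tuple[List[int], int]:
    """Hypothetical batch of n fake predictions; reports the peak concurrency."""
    in_flight = 0
    peak = 0

    async def fake_prediction(i: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for the network round trip
        in_flight -= 1
        return i * i

    results = await gather_bounded(
        [lambda i=i: fake_prediction(i) for i in range(n)], limit=limit
    )
    return results, peak
```

In practice each callable would wrap one replicate.async_run(...) call. The default limit is arbitrary; a value that avoids the 502s and timeouts would have to be found by experiment.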
Thanks, that's helpful. And just to be absolutely sure, you're using the latest version, v1.0.3?
I was on v1.0.2, but I just tried again with v1.0.3: no difference. My impression is that it happens more often the more predictions run concurrently, which sometimes also leads to 5xx errors on the server side. I believe replicate just uses the default timeout values?
I'm getting a ton of those as well. I was on 1.0.3 and switched to 1.0.4; the errors are still there.
The problem persists in 1.0.7. I believe the bug is around here: https://github.com/replicate/replicate-python/blob/d2956ff9c3e26ef434bc839cc5c87a50c49dfe20/replicate/prediction.py#L644
_create_prediction_timeout takes the same wait parameter as _create_prediction_headers, so the read timeout is capped at 60 seconds (because of the header's requirement). If the server takes longer than 60 s to respond (common during cold boots), we get an httpx.ReadTimeout error.
@aron Could you take some time to look at this?
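To make the reported coupling concrete, here is a simplified, hypothetical model of it; it is not the library's actual code. It assumes, as the comment says, that the header only accepts waits of up to 60 seconds (presumably Replicate's Prefer: wait header), and it adds a hypothetical 0.5 s margin. The names wait_seconds, coupled_read_timeout, and decoupled_read_timeout are illustrative only.

```python
from typing import Optional, Union

# Assumption from the comment above: the header accepts waits of at most 60 s.
MAX_HEADER_WAIT = 60
# Hypothetical safety margin added on top of the server-side wait.
MARGIN = 0.5

Wait = Union[bool, int, None]


def wait_seconds(wait: Wait) -> Optional[int]:
    """Seconds requested from the server, or None if no wait header is sent."""
    if not wait:  # None, False or 0: no header
        return None
    if isinstance(wait, bool):  # check bool before int: True is also an int
        return MAX_HEADER_WAIT
    return min(wait, MAX_HEADER_WAIT)


def coupled_read_timeout(wait: Wait) -> Optional[float]:
    """Reported behaviour: the client read timeout is derived from the same
    capped value, so it can never exceed MAX_HEADER_WAIT + MARGIN.
    None means the client's default timeout applies."""
    seconds = wait_seconds(wait)
    return None if seconds is None else seconds + MARGIN


def decoupled_read_timeout(wait: Wait, read_timeout: float) -> float:
    """One possible fix: keep the capped header value, but let the caller pick
    a longer client read timeout (e.g. to survive a cold boot)."""
    seconds = wait_seconds(wait) or 0
    # Never give up before the server's own wait window has elapsed.
    return max(read_timeout, seconds + MARGIN)
```

In this model, coupled_read_timeout(120) still yields 60.5, which matches the symptom described above; the decoupled variant is only one possible direction for a fix, not a confirmed one.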