runpod-python
Ability to consume `stream` endpoint using iterators
Is your feature request related to a problem? Please describe.
Looking at the current example for the Llama2 model (https://docs.runpod.io/reference/llama2-13b-chat#streaming-token-outputs):
...
response = requests.post(url, headers=headers, json=payload)
response_json = json.loads(response.text)
status_url = f"https://api.runpod.ai/v2/llama2-13b-chat/stream/{response_json['id']}"
for i in range(10):
    time.sleep(1)
    get_status = requests.get(status_url, headers=headers)
    print(get_status.text)
...
This suggests that the client must poll the stream endpoint repeatedly to collect a streaming response.
Describe the solution you'd like
The endpoint should support stream iterators. For example:
s = requests.Session()
with s.get(url, headers=None, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(line)
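A minimal sketch of what a helper wrapping this pattern could look like. `stream_job_output` is a hypothetical name, not an existing runpod-python API; the `session` parameter is injectable so the logic can be exercised without a live endpoint:

```python
def stream_job_output(url, headers=None, session=None):
    """Yield each non-empty line of a streaming HTTP response as it arrives.

    Hypothetical helper, not part of the current SDK. ``url``/``headers``
    mirror the requests example above.
    """
    if session is None:
        import requests  # only needed when no session is supplied
        session = requests.Session()
    with session.get(url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # requests yields b"" for keep-alive newlines; skip them
                yield line.decode("utf-8")
```

Each yielded line would then be parsed (e.g. with `json.loads`) by the caller, instead of repeatedly polling the status URL.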
Describe alternatives you've considered
N/A
Additional context
In the context of serverless executions, the response is currently being sent to JOB_STREAM_URL first (https://github.com/runpod/runpod-python/blob/main/runpod/serverless/modules/rp_http.py#L30-L37)
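In that flow, a handler that produces output incrementally would naturally be written as a Python generator, with the worker forwarding each yielded chunk to JOB_STREAM_URL. The sketch below is illustrative only; the job payload shape and the chunk format are assumptions, not taken from rp_http.py:

```python
def token_handler(job):
    """Illustrative generator handler that yields output chunks one at a time.

    Splitting the prompt stands in for real model token generation; the
    ``{"token": ...}`` chunk shape is an assumption for this sketch.
    """
    prompt = job["input"]["prompt"]
    for token in prompt.split():
        yield {"token": token}
```

A stream-iterator client API would let consumers receive these chunks as they are produced, rather than polling for partial aggregates.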