runpod-python
Ability to consume `stream` endpoint using iterators
Is your feature request related to a problem? Please describe.
Looking at the current example for the Llama2 model (https://docs.runpod.io/reference/llama2-13b-chat#streaming-token-outputs):
...
response = requests.post(url, headers=headers, json=payload)
response_json = json.loads(response.text)
status_url = f"https://api.runpod.ai/v2/llama2-13b-chat/stream/{response_json['id']}"
for i in range(10):
    time.sleep(1)
    get_status = requests.get(status_url, headers=headers)
    print(get_status.text)
...
This suggests that the client must poll the stream endpoint repeatedly to collect a streaming response.
Describe the solution you'd like
The endpoint should support stream iterators. For example:
s = requests.Session()
with s.get(url, headers=None, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(line)
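A minimal sketch of what a helper wrapping this pattern could look like. `stream_job_output` is a hypothetical name, not an existing runpod-python API; the `session` parameter is injectable so the logic can be exercised without a live endpoint:

```python
def stream_job_output(url, headers=None, session=None):
    """Yield each non-empty line of a streaming HTTP response as it arrives.

    Hypothetical helper, not part of the current SDK. ``url``/``headers``
    mirror the requests example above.
    """
    if session is None:
        import requests  # only needed when no session is supplied
        session = requests.Session()
    with session.get(url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # requests yields b"" for keep-alive newlines; skip them
                yield line.decode("utf-8")
```

Each yielded line would then be parsed (e.g. with `json.loads`) by the caller, instead of repeatedly polling the status URL.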
Describe alternatives you've considered
N/A
Additional context
In the context of serverless executions, the response is currently being sent to JOB_STREAM_URL first (https://github.com/runpod/runpod-python/blob/main/runpod/serverless/modules/rp_http.py#L30-L37)
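In that flow, a handler that produces output incrementally would naturally be written as a Python generator, with the worker forwarding each yielded chunk to JOB_STREAM_URL. The sketch below is illustrative only; the job payload shape and the chunk format are assumptions, not taken from rp_http.py:

```python
def token_handler(job):
    """Illustrative generator handler that yields output chunks one at a time.

    Splitting the prompt stands in for real model token generation; the
    ``{"token": ...}`` chunk shape is an assumption for this sketch.
    """
    prompt = job["input"]["prompt"]
    for token in prompt.split():
        yield {"token": token}
```

A stream-iterator client API would let consumers receive these chunks as they are produced, rather than polling for partial aggregates.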