numaflow-python

User Defined Async Source - "Readiness probe failed" when there are no more messages

Open • tolmanam opened this issue 2 years ago • 4 comments

Description

This is probably just me not understanding how things are supposed to work.

I have created a user-defined source based on the async source example. It sets up a REST API that accepts requests, executes database queries, and generates Numaflow messages for a pipeline to work on.

I am not sure what the read_handler function should return when there aren't any results to pass on (this could be just because we are waiting for another REST request).

I tried just breaking out of the iterator, but that resulted in a "Readiness probe" failure, so K8s restarts the pod.

To Reproduce

Steps to reproduce the behavior:

  1. Modify the async-source example.py so that the read_handler returns after some number of messages, rather than running forever.

     Quick and dirty, change:

         for x in range(datum.num_records):

     to:

         for x in range(self.read_idx, datum.num_records):

  2. Build the image
  3. Deploy the pipeline
  4. Monitor the deployment (k9s)
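
For anyone trying to reason about the change without deploying it, here is a minimal sketch of what the modified loop does. It assumes the example keeps a running counter in self.read_idx and increments it once per message; FakeReadRequest and FakeSource are hypothetical stand-ins, not the SDK's classes, and the payloads are simplified to bytes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeReadRequest:
    """Hypothetical stand-in for the SDK read request; only num_records is modeled."""
    num_records: int


class FakeSource:
    """Hypothetical stand-in for the example source, with the modified loop."""

    def __init__(self) -> None:
        self.read_idx = 0

    async def read_handler(self, datum: FakeReadRequest):
        # Modified loop: starts at read_idx instead of 0, so once
        # read_idx reaches num_records the range is empty.
        for _ in range(self.read_idx, datum.num_records):
            yield str(self.read_idx).encode()
            self.read_idx += 1


async def collect(source: FakeSource, datum: FakeReadRequest) -> list:
    """Drain one read_handler call into a list."""
    return [msg async for msg in source.read_handler(datum)]


async def demo():
    source = FakeSource()
    datum = FakeReadRequest(num_records=5)
    first = await collect(source, datum)   # five messages
    second = await collect(source, datum)  # nothing left to yield
    return first, second


first_batch, second_batch = asyncio.run(demo())
print(len(first_batch), len(second_batch))  # prints: 5 0
```

Under those assumptions, every read after the first returns immediately with no messages, which is the "no more data" condition this issue is about.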

Expected behavior

I thought that the source would stop producing messages so the pipeline would flush all the queues and then wait for more work (which will never come in this test case, but could in the REST API scenario described above).

Environment

  • Kubernetes: v1.27.6+k3s1
  • Numaflow: quay.io/numaproj/numaflow:v1.1.1
  • Numalogic: unknown (please advise where I might find this information)
  • Numaflow-python: 0.6.0

Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

tolmanam • Dec 27 '23 16:12

Is the expected behavior for the read_handler to run forever and just block while there is no data to pass along? I always worry about waiting for things indefinitely.

tolmanam • Dec 27 '23 16:12
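
One alternative to blocking indefinitely is to wait for data only up to a deadline and then end the batch. The following is a minimal sketch of that idea in plain asyncio, not the SDK's API: read_batch, the queue, and timeout_s are hypothetical names, and whether Numaflow accepts an empty batch from a user-defined source is exactly the open question in this issue.

```python
import asyncio


async def read_batch(queue: asyncio.Queue, num_records: int, timeout_s: float):
    """Yield up to num_records items, giving up after timeout_s seconds overall."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    for _ in range(num_records):
        remaining = deadline - loop.time()
        if remaining <= 0:
            return  # deadline passed: end the batch early
        try:
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            return  # nothing arrived in time: end the batch
        yield item


async def demo():
    queue: asyncio.Queue = asyncio.Queue()
    for payload in (b"a", b"b"):
        queue.put_nowait(payload)
    # Two items are available, then the wait times out.
    got = [m async for m in read_batch(queue, num_records=5, timeout_s=0.05)]
    # Nothing is queued, so this returns an empty batch after the timeout.
    idle = [m async for m in read_batch(queue, num_records=5, timeout_s=0.05)]
    return got, idle


got, idle = asyncio.run(demo())
print(got, idle)
```

The point of the deadline is that the handler always returns within a known bound, so it never sits in a wait that outlasts whatever timeout the caller applies.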

FWIW -

I also see this same "Readiness probe failed" if the read_handler takes too long to respond.

Rather than limiting the number of responses as described above, you can just add a long sleep (longer than the readiness probe) inside the loop.

tolmanam • Dec 27 '23 17:12

Hey @tolmanam, I was trying to replicate the issue with the steps you provided and had a quick question: were you seeing the pipeline deleted because the pods autoscaled down to 0 with no traffic, or did you see a crash on your end?

kohlisid • Jan 12 '24 19:01

I believe it was Kubernetes killing the pod because it failed the "Readiness probe".

Consider the use case that you want to run a database query that generates X number of messages every 10 minutes. You wouldn't want autoscaling to drop the vertex.

FWIW - I replaced the user-defined source with the built-in HTTP source, and it runs happily without adding any messages to the pipeline until it receives a POST. So the behavior I would like is compatible with Numaflow; I just don't appear to know how to build a User Defined Source.

tolmanam • Jan 14 '24 09:01
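
To make the "query every 10 minutes" use case concrete, here is a minimal sketch of one possible shape for it: a background task runs the query on a timer and pushes rows onto a queue, and the read side only drains whatever is already queued. This is plain asyncio rather than the SDK's API; run_query_periodically, drain, and fake_query are hypothetical names, and the interval is shortened from 600 seconds so the demo finishes quickly. It does not settle whether an empty read trips the readiness probe.

```python
import asyncio


async def run_query_periodically(queue, run_query, interval_s, rounds):
    """Background producer: run the query, enqueue each row, then sleep."""
    for _ in range(rounds):  # a real source would loop until shutdown
        for row in run_query():
            await queue.put(row)
        await asyncio.sleep(interval_s)


def drain(queue, num_records):
    """Return up to num_records queued items without waiting."""
    batch = []
    while len(batch) < num_records:
        try:
            batch.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            break
    return batch


def fake_query():
    """Hypothetical stand-in for the database query."""
    return [b"row-1", b"row-2", b"row-3"]


async def demo():
    queue = asyncio.Queue()
    producer = asyncio.create_task(
        run_query_periodically(queue, fake_query, interval_s=0.01, rounds=1)
    )
    await producer
    first = drain(queue, num_records=2)
    second = drain(queue, num_records=2)
    third = drain(queue, num_records=2)  # queue is empty by now
    return first, second, third


first, second, third = asyncio.run(demo())
print(first, second, third)
```

Keeping the slow query out of the read path means each read returns promptly whether or not a query has run recently.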