
Prediction is not thread-safe

knkski opened this issue 3 years ago · 4 comments

Describe the bug

I have a Flask app with an endpoint that looks something like this, where model is a FullBayesianForecaster:

from flask import Flask

app = Flask(__name__)  # `model` and `get_df` are defined elsewhere in the app

@app.route("/predict", methods=["POST"])
def predict():
    df = get_df()
    result = model.predict(df, seed=1234)
    return result.to_dict(orient="records")

I tested it locally with curl and it worked great: the Flask app returned the same predictions for the same input. I then deployed it to a test environment, and another service that talks to it immediately started getting non-deterministic results for requests with the same parameters. When I checked the prediction results myself with another curl request against the test environment, I got the same result back every time, as expected.

After scratching my head a bit, I tried using threading.Lock like this:

import threading

lock = threading.Lock()  # single module-level lock shared by all request threads

with lock:
    result = model.predict(df, seed=1234)

And then the non-determinism went away. This leads me to believe that model prediction is not thread-safe. I haven't dug down far enough yet to know for sure what's causing the issue, but this seems to be at least one likely culprit:

https://github.com/uber/orbit/blob/27371ec/orbit/forecaster/full_bayes.py#L96-L97

If multiple threads call model.predict concurrently and fight over the value of self._prediction_meta set here, one call can end up using another call's metadata, which seems likely to make predictions non-deterministic:

https://github.com/uber/orbit/blob/c232980/orbit/forecaster/forecaster.py#L389
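To illustrate the kind of failure I mean, here is a small self-contained toy (not orbit's actual code) that mimics the same write-then-read pattern on a shared instance attribute:

import threading
import time

class ToyForecaster:
    def predict(self, value):
        self._prediction_meta = value       # per-call state stored on the shared instance
        time.sleep(0.01)                    # widen the race window
        return self._prediction_meta * 2    # may read the value written by another thread

toy = ToyForecaster()
results = {}

def call(name, value):
    results[name] = toy.predict(value)

t1 = threading.Thread(target=call, args=("a", 1))
t2 = threading.Thread(target=call, args=("b", 100))
t1.start(); t2.start()
t1.join(); t2.join()

# Without the race this would always print {'a': 2, 'b': 200}; with it, 'a'
# usually comes back as 200 because the second thread overwrote
# _prediction_meta before the first thread read it.
print(results)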

To Reproduce

Run model.predict concurrently from multiple threads with different input parameters, setting seed=a_fixed_number on every call.
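Something along these lines should trigger it (a rough sketch only: model is assumed to be an already-fitted FullBayesianForecaster, and df_a / df_b are placeholders for two different prediction frames):

from concurrent.futures import ThreadPoolExecutor

def predict_fixed_seed(df):
    # Fixed seed, so each input frame should always map to the same output.
    return model.predict(df, seed=1234)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Interleave two different inputs so concurrent calls can clobber each
    # other's prediction metadata.
    futures = [pool.submit(predict_fixed_seed, df) for df in (df_a, df_b) * 10]
    outputs = [f.result() for f in futures]

# Every output produced from df_a should be identical; with the race they
# sometimes are not.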

Expected behavior

Calling model.predict with seed=a_fixed_number in the presence of multiple threads returns deterministic predictions.
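In other words, a check like this (same placeholder model and df as above) should always pass:

from concurrent.futures import ThreadPoolExecutor

# With the seed fixed, every concurrent call on the same input frame should
# return an identical prediction frame.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(model.predict, df, seed=1234) for _ in range(8)]
    preds = [f.result() for f in futures]

assert all(p.equals(preds[0]) for p in preds)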

Screenshots

N/A

Environment (please complete the following information):

  • OS: Ubuntu
  • Python Version: 3.9
  • Versions of Major Dependencies (pandas, scikit-learn, cython): pandas==1.3.5, scikit-learn==<not installed>, cython==0.29.30

Additional context

N/A

knkski · Aug 16 '22

@knkski are you saying you get the same result across threads even with the setting seed=None?

edwinnglabs · Aug 19 '22

@edwinnglabs I haven't tested that; I'd assume I'd get different results each time in that case. This issue arose after I set the random seed, assuming that would fix the non-deterministic predictions I was seeing. Since I started getting deterministic predictions with a thread lock, I didn't try the combination of a thread lock but no random seed.

knkski · Aug 19 '22

So you want to see the same behavior on each thread that calls model.predict? If so, what you described should work by using seed=a_fixed_number. Please let me know if I understand the problem correctly...

edwinnglabs · Aug 19 '22

Yeah, I'm looking to get deterministic predictions on every thread, given that I pass in seed=a_fixed_number. Right now I'm not seeing that; using multiple threads will return non-deterministic predictions, even if I set seed=a_fixed_number, and that's what I think the bug is.

knkski · Aug 22 '22