
AX seems to get stuck with Ray

Open: Balandat opened this issue 1 year ago • 1 comment

Discussed in https://github.com/facebook/Ax/discussions/2341

Originally posted by zhqrbitee, April 9, 2024:

Hi, I'm hitting a weird situation where AX sometimes seems to get stuck (e.g. pending for 2 days) while printing tons of messages like the ones below.

[INFO 04-08 23:04:18] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:19] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:20] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:21] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.
my_simvenv/lib/python3.9/site-packages/ax/core/data.py:284: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  return cls(df=pd.concat(dfs, axis=0, sort=True))
[INFO 04-08 23:04:22] ax.modelbridge.torch: The observations are identical to the last set of observations used to fit the model. Skipping model fitting.

Meanwhile, Ray complains that it is not getting new trials. I'm a bit confused here: even if AX skips model fitting, it should still be able to tell Ray the next point to sample.

2024-04-08 23:04:23,794	WARNING insufficient_resources_manager.py:163 -- Ignore this message if the cluster is autoscaling. No trial is running and no new trial has been started within the last 60 seconds. This could be due to the cluster not having enough resources available. You asked for 1.0 CPUs and 0 GPUs per trial, but the cluster only has 48.0 CPUs and 0 GPUs available. Stop the tuning and adjust the required resources (e.g. via the `ScalingConfig` or `resources_per_trial`, or `num_workers` for rllib), or add more resources to your cluster.

I'm using AX 3.7.0 with Ray 2.8.0. Sadly, I cannot reproduce this with a simple example, but any suggestions would be appreciated.

Balandat, Apr 13 '24 16:04
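
A side note on the log noise in the post above: the `FutureWarning` comes from pandas and only announces a future change in how `pd.concat` treats empty or all-NA frames, so it is probably unrelated to the hang. The sketch below illustrates what "exclude the relevant entries before the concat operation" can look like, and how the message could be filtered in user code while debugging. The helper name `concat_non_empty` and the sample frames are made up for illustration; this is not Ax's implementation, and the helper only skips fully empty frames, not all-NA columns.

```python
import warnings

import pandas as pd


def concat_non_empty(dfs):
    """Concatenate DataFrames row-wise, skipping empty ones (hypothetical helper)."""
    non_empty = [df for df in dfs if not df.empty]
    if not non_empty:
        # Nothing to concatenate: return an empty frame instead of raising.
        return pd.DataFrame()
    return pd.concat(non_empty, axis=0, sort=True)


# Illustrative data only; the column names are assumptions, not Ax's schema.
frames = [
    pd.DataFrame({"arm_name": ["0_0"], "mean": [1.5]}),
    pd.DataFrame(),  # an empty frame like the ones the warning is about
    pd.DataFrame({"arm_name": ["1_0"], "mean": [2.0]}),
]
combined = concat_non_empty(frames)
print(combined)

# When the concat happens inside a library, the warning can instead be
# filtered by the start of its message so the remaining log is readable.
warnings.filterwarnings(
    "ignore",
    message="The behavior of DataFrame concatenation",
    category=FutureWarning,
)
```

Filtering only hides the message; it does not change what the library does with the data.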

This should be resolved after https://github.com/facebook/Ax/pull/2318. Can you try 0.4.0 and see if the issue persists?

Cesar-Cardoso, May 23 '24 15:05
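
Before retrying, it may help to confirm which Ax version the environment actually has. The following is a minimal sketch that reads the installed version of the `ax-platform` distribution (Ax's package name on PyPI) and compares it with 0.4.0, the version suggested above. The helpers `parse_version`, `is_at_least`, and `installed_ax_version` are hypothetical names, and the parsing is deliberately simple: it looks only at the numeric major, minor, and patch parts.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string like '0.4.0' into (0, 4, 0), padding missing parts with 0."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'post1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def installed_ax_version():
    """Return the installed ax-platform version string, or None if absent."""
    try:
        return version("ax-platform")
    except PackageNotFoundError:
        return None


current = installed_ax_version()
if current is None:
    print("ax-platform is not installed in this environment")
elif is_at_least(current, "0.4.0"):
    print(f"Ax {current} is at or above 0.4.0")
else:
    print(f"Ax {current} is older than 0.4.0; consider upgrading before retesting")
```

Run it inside the same virtual environment that launches the Ray job, since a different interpreter may see a different installation.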

@Cesar-Cardoso sorry for the delay. It seems fixed, so you can close this issue.

zhqrbitee, May 31 '24 00:05