
Modin fails on Core (13th Gen i9)

Open • weiseng-yeap opened this issue • 0 comments

Summary

I installed the oneAPI Base Toolkit and tried to run the Modin sample app: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

The run then failed: the Ray raylet exited immediately because the Ray agent failed (agent_manager.cc:135). The full log is under Observed behavior below.

Version

oneAPI toolkit version: 2023.2.0

Environment

OS: Linux, Ubuntu 22.04.2 LTS
CPU: 13th Gen Intel(R) Core(TM) i9-13900
RAM: 32 GB
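To compare this machine with the Xeon system where the sample reportedly works, the same details can be collected programmatically on both. The sketch below uses only the Python standard library; the helper names (collect_environment, cpu_model, total_ram_gib) are hypothetical, and reading /proc/cpuinfo and the sysconf page counts assumes Linux.

```python
import os
import platform
from typing import Dict, Optional, Union


def cpu_model() -> Optional[str]:
    """Return the CPU model name from /proc/cpuinfo (Linux only), else None."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return None


def total_ram_gib() -> Optional[float]:
    """Total physical RAM in GiB via os.sysconf (the names used exist on Linux)."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist
    return round(pages * page_size / 2**30, 1)


def collect_environment() -> Dict[str, Union[str, int, float, None]]:
    """Gather the OS, CPU and RAM details listed in the Environment section."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_model": cpu_model(),
        "logical_cpus": os.cpu_count(),
        "ram_gib": total_ram_gib(),
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Running it on both machines and pasting the output into the issue makes any hardware or OS difference explicit.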

Steps to reproduce

In a conda environment, run the Modin Getting Started sample released with oneAPI: https://github.com/oneapi-src/oneAPI-samples/tree/master/AI-and-Analytics/Getting-Started-Samples/IntelModin_GettingStarted

Observed behavior

The raylet fails with the following log:

    (raylet) [2023-10-10 22:04:54,885 E 21639 21688] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
    (raylet) - The version of grpcio doesn't follow Ray's requirement. Agent can segfault with the incorrect grpcio version. Check the grpcio version pip freeze | grep grpcio.
    (raylet) - The agent failed to start because of unexpected error or port conflict. Read the log cat /tmp/ray/session_latest/dashboard_agent.log. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
    (raylet) - The agent is killed by the OS (e.g., out of memory).
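The log names three possible causes: a grpcio version that does not match Ray's requirement, an agent startup error or port conflict recorded in /tmp/ray/session_latest/dashboard_agent.log, and the OS killing the agent (for example, out of memory). A minimal standard-library sketch that gathers the evidence for all three in one place is shown below. It reports the installed grpcio version (the same information as pip freeze | grep grpcio), prints the tail of the agent log if it exists, and reads MemAvailable from /proc/meminfo (Linux only). The helper names are hypothetical, and the script does not know which grpcio versions Ray accepts; the reported version still has to be checked against the requirement of the installed Ray release.

```python
import importlib.metadata
from pathlib import Path
from typing import List, Optional

# Log path taken from the raylet error message.
AGENT_LOG = Path("/tmp/ray/session_latest/dashboard_agent.log")


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def tail(path: Path, lines: int = 20) -> List[str]:
    """Return the last `lines` lines of a text file, or [] if it cannot be read."""
    try:
        return path.read_text(errors="replace").splitlines()[-lines:]
    except OSError:
        return []


def available_memory_mib() -> Optional[int]:
    """Read MemAvailable from /proc/meminfo (Linux only); the file reports kB."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) // 1024
    except (OSError, ValueError, IndexError):
        pass
    return None


if __name__ == "__main__":
    # Cause 1: package versions (grpcio must satisfy Ray's requirement).
    for pkg in ("ray", "grpcio", "modin"):
        print(f"{pkg}: {package_version(pkg)}")
    # Cause 3: memory pressure at the time the script runs.
    print(f"MemAvailable: {available_memory_mib()} MiB")
    # Cause 2: startup errors or port conflicts logged by the dashboard agent.
    print(f"--- last lines of {AGENT_LOG} ---")
    print("\n".join(tail(AGENT_LOG)) or "(log not found or empty)")
```

The memory figure is a snapshot taken when the script runs, so it is most informative when collected right after a failed run.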

Expected behavior

The sample should run on this Core machine as it does on Xeon. With the same setup, it works on a Xeon system but fails on this Core (i9-13900) system.
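Because the report states that the same setup works on Xeon, one way to confirm that the two Python environments really match is to record installed package versions on each machine and compare them. A minimal standard-library sketch follows; the script and helper names are hypothetical, and importlib.metadata only sees packages that carry Python distribution metadata, so anything installed without it will not appear.

```python
import importlib.metadata
import json
import sys
from typing import Any, Dict, Tuple


def installed_packages() -> Dict[str, str]:
    """Map distribution name -> version for this environment (like pip freeze)."""
    packages = {}
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            packages[name] = dist.version
    return packages


def diff_reports(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one report shows None on that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Compare two previously saved reports and print only the differences.
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            for name, (va, vb) in diff_reports(json.load(fa), json.load(fb)).items():
                print(f"{name}: {va} != {vb}")
    else:
        # No arguments: dump this machine's report as JSON to stdout.
        json.dump(installed_packages(), sys.stdout, indent=2, sort_keys=True)
```

Usage, assuming the script is saved as compare_env.py: run python compare_env.py > xeon.json on the Xeon machine and python compare_env.py > core.json on the Core machine, then python compare_env.py xeon.json core.json to list only the packages (for example ray or grpcio) whose versions differ.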

weiseng-yeap, Oct 10 '23 11:10