Dominik Rabij
Started internal discussion about this topic.
kjob is being removed altogether from XPK.
I don't think we have the capacity to implement this, and we have other priorities at the moment :( @FIoannides @kzmyslona
Done in https://github.com/AI-Hypercomputer/xpk/pull/916.
I think these bugs are related: http://b/444380558 and http://b/441035007
@pajakd is the PR ready? If not, could you close it and re-open it once it's ready?
http://b/456419254
1. What's the justification for this change? Why are we adding this for every cluster?
2. What's diagon?
3. Could you cover the changes with unit tests, please? https://github.com/AI-Hypercomputer/xpk/blob/main/docs/testing.md#unit-test
Also, please merge the current main branch, because the PR changes are mixed with the sub-slicing changes, making it harder to review.
> Hi @jamOne- Moving the installation logic to `managed_ml_diagnostics.py` is causing the unit test to fail. > > ``` > =================================== FAILURES =================================== > __________ test_install_mldiagnostics_prerequisites_commands_executed __________ > > mocks...