Roshani Narasimhan
The standalone data loader sets up the model and data iterator, similar to the train_loop of train.py. It then iterates through batches of data to log the step time of...
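As a rough illustration of timing a data iterator in isolation, here is a minimal sketch. It is not the code from the PR: `time_data_loading` and `synthetic_batches` are hypothetical names, and the synthetic generator stands in for whatever iterator train.py actually builds.

```python
import time
from typing import Iterable, Iterator, List


def time_data_loading(batches: Iterable, num_steps: int) -> List[float]:
    """Pull up to num_steps batches and return the load time of each, in seconds."""
    iterator = iter(batches)
    step_times: List[float] = []
    for _ in range(num_steps):
        start = time.perf_counter()
        try:
            next(iterator)  # Only the fetch is timed; no model step runs here.
        except StopIteration:
            break  # The iterator ran out before num_steps was reached.
        step_times.append(time.perf_counter() - start)
    return step_times


def synthetic_batches(batch_size: int, num_batches: int) -> Iterator[list]:
    """Hypothetical stand-in for a real data iterator."""
    for i in range(num_batches):
        yield [i] * batch_size


if __name__ == "__main__":
    for step, seconds in enumerate(time_data_loading(synthetic_batches(8, 5), 5)):
        print(f"step {step}: {seconds * 1e3:.3f} ms")
```

Timing only the `next()` call separates input-pipeline cost from compute cost, which is the point of running the loader on its own.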
## Fixes / Features

- Enable Cloud DNS on Pathways clusters.
- Add user job conditionally based on whether headless mode is defined or not.
- Print out proxy address...
## Fixes / Features

- Remove default values for proxy-server and server images.
- Ensure user provides both proxy-server-image and server-image when --use-pathways is set and vice-versa.
- Validate that...
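The both-or-neither rule for the image flags can be sketched as follows. This is an assumption-laden illustration, not XPK's actual implementation: the flag names come from the description above, while `build_parser` and `validate_pathways_args` are hypothetical helpers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the image flags have no defaults, as described above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use-pathways", action="store_true")
    parser.add_argument("--proxy-server-image", default=None)
    parser.add_argument("--server-image", default=None)
    return parser


def validate_pathways_args(args: argparse.Namespace) -> None:
    """Raise ValueError unless the Pathways flags are set together or not at all."""
    images = (args.proxy_server_image, args.server_image)
    if args.use_pathways and not all(images):
        raise ValueError(
            "--use-pathways requires both --proxy-server-image and --server-image."
        )
    if not args.use_pathways and any(images):
        raise ValueError(
            "--proxy-server-image and --server-image require --use-pathways."
        )


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    validate_pathways_args(parsed)
    print("Arguments are consistent.")
```

Checking the combination after parsing, rather than through argparse defaults, keeps the error message specific to the flag that is missing.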
1. When local checkpoints are available for restore, alter mesh setup as follows:
   - Ignore the JAX coordinator provided by XPK and override the JAX coordinator to be the pod...
## Fixes / Features

- Makes Cloud DNS optional for Pathways XPK clusters. (Regular XPK clusters do not use Cloud DNS at all.)
- (Found some issues with the Cloud DNS upgrade; still testing.)...
# Description

This PR enables TPU unit tests to also run with the Pathways backend. Essentially, we will have two sets of tests: one with McJAX and one with Pathways....
## Fixes / Features

This feature only affects Pathways-enabled clusters.

- Workloads are bottlenecked by the number of CPU nodes on clusters with Pathways enabled.
- Increasing the default...