Shaowei Su
**Is your feature request related to a problem? Please describe.** Based on the [examples](https://github.com/bytedance/byteps/tree/master/example/tensorflow)...
## Feature Request If this is a feature request, please fill out the following form in full: ### Describe the problem the feature is intended to solve Currently TF Serving...
**What this PR does / why we need it**: Report a bug when non-readOnly volume mounts exist in the Trial Job template, and add two unit tests to reproduce it....
Follow-up on issue: https://github.com/mopemope/meinheld/issues/92 Can we conditionally disable the asyncIO feature in Meinheld, e.g., by simply removing `patch.patch_all()`?
Hi Komiya, This is a very interesting project you've built! I have one question: is it possible to load a model built with the "ml.dmlc.xgboost4j.scala.{XGBoost, DMatrix}" libraries using the predictor-spark package? I...
/kind bug **What steps did you take and what happened:** Katib experiment detail page can be accessed after a given...
**Describe the bug** DeepSpeed training with stage 3 led to the job hanging randomly, with no GPU usage on certain workers:...
### System Info ```shell optimum==1.14.1 peft==0.6.1 pytorch-lightning==2.1.0 pytorch-pretrained-bert==0.6.2 torch==2.0.0+cu118 torchaudio==2.0.0+cu118 torchmetrics==1.2.0 torchvision==0.15.1+cu118 ``` ### Who can help? _No response_ ### Information - [ ] The official example scripts - [X]...
### Environment: ``` tensorflow==2.8.0 tensorflow-io==0.25.0 ``` S3 loading client: [tf.data.TFRecordDataset](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset). ### Issue By default, S3 limits the number of GET/HEAD operations to 5,500 per second per partitioned...