Pengcheng Young
It seems like `if not channel.id == 'C-2'`, [code here](https://github.com/khundman/telemanom/blob/master/telemanom/errors.py#L62), should be `if channel.id == 'C-2'`. Is this a mistake? Thanks.
I have tried MLBlocks, MLPrimitives, and Orion to train neural network models, but there seems to be no way to add custom operations such as TensorBoard. BTW, I'm curious...
I used your Python code https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py#L256

My code is
- https://github.com/zhangpzh/maskrcnn-benchmark/blob/Falcon/tools/train_net.py
- replace this [for-each-loop](https://github.com/zhangpzh/maskrcnn-benchmark/blob/Falcon/maskrcnn_benchmark/engine/trainer.py#L57) by your `while True`
- pytorch: torch.nn.parallel.DistributedDataParallel
- 8 GPUs

I think this `data_prefetcher` could...
## Summary
* OS: Ubuntu 18.04
* Architecture: ARM64
* Python version: python 3.9
* Type: installation

## Description
I have tried both 5.9.4 and 5.9.3. gcc is 7.5.0, need...
/kind bug **What steps did you take and what happened:** I got an error when updating the experiment status in the experiment controller. ``` {"level":"info","ts":"2024-03-04T01:39:38Z","logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":{"name":"a10702550312415232282375","namespace":"heros-user"},"err":"Operation cannot be...
## Background In the Volcano Scheduler, the `binpack` plugin can be configured to maximize the resource usage of individual nodes (i.e., assigning jobs to fully utilize a node before allocating...
### What happened? I am experiencing a recurring issue where the hyperparameter tuning process generates duplicate sets of parameters, leading to inefficient use of GPU resources. For instance, with the...