Pengcheng Young issues

Results 7 issues of



Anomaly occurs early in window only C-2

It seems that `if not channel.id == 'C-2'` ([code here](https://github.com/khundman/telemanom/blob/master/telemanom/errors.py#L62)) should be `if channel.id == 'C-2'`. Is this a mistake? Thanks.
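For readers weighing the question: in Python, `not channel.id == 'C-2'` parses as `not (channel.id == 'C-2')`, which is the same test as `channel.id != 'C-2'`. The guarded code therefore runs for every channel except C-2. The sketch below assumes, purely for illustration, that the guarded step overwrites the first `l_s` smoothed errors with a warm-up mean; `fill_warmup`, `e_s`, and `l_s` are hypothetical names. Under that assumption, the original condition leaves an early spike in C-2 untouched while averaging it away on other channels.

```python
from statistics import mean
from typing import List


def fill_warmup(e_s: List[float], channel_id: str, l_s: int) -> List[float]:
    """Hypothetical warm-up step: replace the first l_s smoothed errors with
    the mean of the first 2 * l_s values, for every channel except C-2."""
    out = list(e_s)
    n = min(l_s, len(out))
    # `not channel_id == 'C-2'` is evaluated as `not (channel_id == 'C-2')`,
    # i.e. exactly `channel_id != 'C-2'`.
    if not channel_id == 'C-2':
        warmup_mean = mean(out[: l_s * 2])
        out[:n] = [warmup_mean] * n
    return out


# An early spike survives only on the exempted channel.
errors = [10.0, 0.0, 0.0, 0.0]
print(fill_warmup(errors, 'A-1', 2))  # [2.5, 2.5, 0.0, 0.0]
print(fill_warmup(errors, 'C-2', 2))  # [10.0, 0.0, 0.0, 0.0]
```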

How to integrate TensorBoard into a pipeline during training

I have tried MLBlocks, MLPrimitives, and Orion to train neural network models, but there seems to be no way to add custom operations such as TensorBoard. BTW, I'm curious...

Does `data_prefetcher()` really speed up training?

I used your Python code at https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py#L256. My setup:

- My code: https://github.com/zhangpzh/maskrcnn-benchmark/blob/Falcon/tools/train_net.py
- I replaced this [for-each loop](https://github.com/zhangpzh/maskrcnn-benchmark/blob/Falcon/maskrcnn_benchmark/engine/trainer.py#L57) with your `while True` loop
- PyTorch: `torch.nn.parallel.DistributedDataParallel`
- 8 GPUs

I think this `data_prefetcher` could...
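The loop change described in this issue swaps a `for` loop over the loader for a prefetcher that is queried inside a `while` loop. The idea is to fetch batch i + 1 while batch i is being processed, so that loading and compute overlap; if data loading is not the bottleneck, there is little to overlap and the gain may be small. The sketch below is a CPU-only, standard-library analogue under that assumption: it uses a background thread where a GPU prefetcher would use a separate CUDA stream and non-blocking host-to-device copies. `Prefetcher` and `consume` are hypothetical names, not the apex code.

```python
import threading
from typing import Generic, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


class Prefetcher(Generic[T]):
    """Hypothetical stdlib analogue of a data prefetcher: while the caller
    works on item i, a background thread is already fetching item i + 1.
    Returns None once the iterable is exhausted, so the loader itself
    must not yield None."""

    def __init__(self, loader: Iterable[T]) -> None:
        self._it: Iterator[T] = iter(loader)
        self._next: Optional[T] = None
        self._preload()

    def _load(self) -> None:
        try:
            self._next = next(self._it)
        except StopIteration:
            self._next = None

    def _preload(self) -> None:
        # Start fetching the next item without blocking the caller.
        self._thread = threading.Thread(target=self._load)
        self._thread.start()

    def next(self) -> Optional[T]:
        # Block until the in-flight fetch finishes; a GPU prefetcher
        # would synchronize CUDA streams at this point instead.
        self._thread.join()
        item = self._next
        if item is not None:
            self._preload()  # overlap the next fetch with the caller's work
        return item


def consume(loader: Iterable[T]) -> List[T]:
    """The `while` loop that replaces `for batch in loader:`."""
    prefetcher = Prefetcher(loader)
    seen: List[T] = []
    batch = prefetcher.next()
    while batch is not None:
        seen.append(batch)  # a training step would go here
        batch = prefetcher.next()
    return seen
```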

[Ubuntu18.04] failed to install, gcc error

Summary:
* OS: Ubuntu 18.04
* Architecture: ARM64
* Python version: python 3.9
* Type: installation

Description: I have tried both 5.9.4 and 5.9.3. gcc is 7.5.0, need...

bug

linux

Update experiment instance status failed: the object has been modified

/kind bug

**What steps did you take and what happened:** I got an error when updating the experiment status in the experiment controller:

```
{"level":"info","ts":"2024-03-04T01:39:38Z","logger":"experiment-controller","msg":"Update experiment instance status failed, reconciler requeued","Experiment":{"name":"a10702550312415232282375","namespace":"heros-user"},"err":"Operation cannot be...
```

kind/bug
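The error in the experiment-status issue title is the message Kubernetes returns when optimistic concurrency rejects an update. A write carries the `resourceVersion` of the object it read, and the API server refuses it if the stored object has changed since then. The usual remedies are to requeue (which the log reports the reconciler did) or to re-read the latest object and reapply the change; in Go, client-go provides `retry.RetryOnConflict` for the latter. The sketch below only simulates that mechanism with a hypothetical in-memory `Store`; it is not the experiment controller's code.

```python
from dataclasses import dataclass, replace
from typing import Dict


class Conflict(Exception):
    """Raised when an update is based on a stale resource version."""


@dataclass(frozen=True)
class Obj:
    name: str
    resource_version: int
    status: str


class Store:
    """Hypothetical in-memory stand-in for an API server that uses
    optimistic concurrency on a per-object resource version."""

    def __init__(self) -> None:
        self._objs: Dict[str, Obj] = {}

    def create(self, obj: Obj) -> None:
        self._objs[obj.name] = obj

    def get(self, name: str) -> Obj:
        return self._objs[name]

    def update(self, obj: Obj) -> Obj:
        current = self._objs[obj.name]
        if obj.resource_version != current.resource_version:
            raise Conflict("the object has been modified")
        stored = replace(obj, resource_version=current.resource_version + 1)
        self._objs[obj.name] = stored
        return stored


def update_status_with_retry(store: Store, name: str, status: str,
                             attempts: int = 5) -> Obj:
    """Re-read the latest version before every attempt; retry on conflict."""
    last_error = Conflict("no attempts made")
    for _ in range(attempts):
        latest = store.get(name)
        try:
            return store.update(replace(latest, status=status))
        except Conflict as err:
            last_error = err  # another writer won the race; loop and re-read
    raise last_error
```

Updating from a stale copy fails, while the retry helper succeeds because it always starts from the latest stored version.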

GPU fragmentation across nodes and Job/Pod rescheduling strategy request

Background: In the Volcano Scheduler, the `binpack` plugin can be configured to maximize the resource usage of individual nodes (i.e., assigning jobs to fully utilize a node before allocating...

kind/feature

area/scheduling

kind/question
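To make the GPU fragmentation trade-off concrete, the sketch below scores nodes binpack-style. Each candidate node gets the weighted utilization it would have after placing the task, and the highest feasible score wins, so partially used GPU nodes are filled before emptier ones. This is a simplified, hypothetical scorer written for illustration; it is not Volcano's `binpack` plugin, whose exact formula and configuration weights may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    allocatable: Dict[str, float]
    used: Dict[str, float]


def binpack_score(node: Node, request: Dict[str, float],
                  weights: Dict[str, float]) -> Optional[float]:
    """Weighted post-placement utilization of the requested resources,
    or None if the request does not fit. Resources without a weight are
    neither scored nor checked in this simplified version."""
    total = 0.0
    weight_sum = 0.0
    for resource, weight in weights.items():
        req = request.get(resource, 0.0)
        if req == 0:
            continue  # only resources the task asks for contribute
        cap = node.allocatable.get(resource, 0.0)
        used = node.used.get(resource, 0.0)
        if used + req > cap:
            return None  # does not fit on this node
        total += weight * (used + req) / cap
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def pick_node(nodes: List[Node], request: Dict[str, float],
              weights: Dict[str, float]) -> Optional[str]:
    """Return the name of the feasible node with the highest score."""
    best_name: Optional[str] = None
    best_score = -1.0
    for node in nodes:
        score = binpack_score(node, request, weights)
        if score is not None and score > best_score:
            best_name, best_score = node.name, score
    return best_name
```

With two 8-GPU nodes that have 6 and 2 GPUs in use, a 2-GPU task goes to the fuller node, which keeps 6 GPUs free together on the other node for a larger job. A spreading strategy would instead leave 2 and 4 free GPUs, and a 6-GPU job would fit nowhere.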

Duplicate hyperparameters waste compute and time

What happened? I am experiencing a recurring issue where the hyperparameter tuning process generates duplicate sets of parameters, leading to inefficient use of GPU resources. For instance, with the...

kind/bug

lifecycle/needs-triage
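A common mitigation for duplicate hyperparameter suggestions is to remember every assignment already tried and re-sample on a repeat. The sketch below assumes the search algorithm can be wrapped as a zero-argument sampling function; `DedupSuggester` and `canonical` are hypothetical names, not an API of the tuning tool in the issue. Keys are built from sorted items so dict order does not matter. Floating-point values that differ only slightly are treated as distinct unless they are rounded first.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

Params = Dict[str, Hashable]
Key = Tuple[Tuple[str, Hashable], ...]


def canonical(params: Params) -> Key:
    """Order-independent key for one parameter assignment."""
    return tuple(sorted(params.items()))


class DedupSuggester:
    """Hypothetical wrapper that re-samples until it finds an assignment
    that has not been suggested before."""

    def __init__(self, sample: Callable[[], Params], max_retries: int = 100) -> None:
        self._sample = sample
        self._max_retries = max_retries
        self._seen: Set[Key] = set()

    def suggest(self) -> Optional[Params]:
        for _ in range(self._max_retries):
            params = self._sample()
            key = canonical(params)
            if key not in self._seen:
                self._seen.add(key)
                return params
        # Every draw was a repeat: the space is likely exhausted, so stop
        # rather than spend GPU time re-running an existing trial.
        return None
```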