Justin Goodwin

Results 33 comments of Justin Goodwin

Looks like we are all hitting this issue at the same time: https://github.com/facebookresearch/hydra/issues/1621

We are adding logging scripts to monitor load, memory, GPU stats, etc. via `dstat` and `dcgmi`. I need to start the loggers before `srun` and then shut them down afterwards.

FYI, I wrote a wrapper around my `cmd` to run my commands in a `subprocess` before and after the actual command on the rank zero node. This seems to be...
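A minimal sketch of that kind of wrapper, not the author's actual code: it assumes the loggers are ordinary long-running commands that can be stopped with `terminate()`. The names `background_loggers` and `run_with_loggers` are hypothetical, and rank-zero detection is left out.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def background_loggers(commands, timeout=5.0):
    """Start each command in the background; stop all of them on exit."""
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        yield procs
    finally:
        # Ask every logger that is still running to stop.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        # Wait for each one to exit, and kill any that ignore the request.
        for proc in procs:
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()


def run_with_loggers(logger_cmds, main_cmd):
    """Run main_cmd while the loggers are active; return its exit code."""
    with background_loggers(logger_cmds):
        return subprocess.run(main_cmd).returncode
```

Here `logger_cmds` would hold the real `dstat` and `dcgmi` command lines and `main_cmd` the `srun` invocation. The `finally` block means the loggers are shut down even if the main command raises.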

So these logging scripts are not slurm specific, but I only have access to slurm right now. I like the decorator idea. Here is the class I made to wrap...

I'm not too familiar with the commands, as I was asked to include these logs for someone else who is interested in them. I think `dstat` can be installed on any linux...

This error does not appear to be related to multiple GPUs itself, but rather to the use of the `nn.parallel` functions. If I use the following: ```python def forward(self, input):...

I'll take a look, but just realized I didn't make clear that the function that crashed wasn't using a GPU array. I had already extracted the data from the GPU....

I figure my issue falls into this category. I was hoping I could read `MatlabHDF5File` files incrementally. We have very large MAT files and I was hoping I could just do:...

Shoot, after delving into the issue, this is reading a matrix of complex numbers

This doesn't work with the code above:

```python
>>> instantiate(builds(torch.Generator, zen_wrappers=call_method("manual_seed", 42)))
...
TypeError: Error instantiating 'hydra_zen.funcs.zen_processing' : outer() got multiple values for argument 'mthd_name'
```