Justin Goodwin
Looks like we are all hitting this issue at the same time: https://github.com/facebookresearch/hydra/issues/1621
We are adding logging scripts to monitor load, memory, GPU stats, etc. via `dstat` and `dcgmi`. I need to start the logging before `srun` and then shut it down afterwards.
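For illustration, here is a minimal sketch of the "start loggers, run the job, stop loggers" pattern described above, using only the Python standard library. The names `background_monitors` and `run_with_monitors` are hypothetical and not taken from any existing wrapper; the actual `dstat`, `dcgmi`, and `srun` argument lists are not shown here and would be supplied by the caller.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def background_monitors(commands):
    """Start each command in the background, yield, then stop them all.

    `commands` is a list of argument lists (hypothetical input), one per
    logging process that should run alongside the main job.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        yield procs
    finally:
        # Ask every logger to stop, even if the main command raised.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                # The logger ignored SIGTERM; force it to exit.
                proc.kill()
                proc.wait()


def run_with_monitors(monitor_cmds, main_cmd):
    """Run `main_cmd` while the monitor commands log in the background."""
    with background_monitors(monitor_cmds):
        return subprocess.run(main_cmd).returncode
```

In a setup like the one above, `monitor_cmds` would hold the `dstat` and `dcgmi` invocations and `main_cmd` the `srun` line. The `try`/`finally` is the main design choice: the loggers are shut down whether the job succeeds or fails.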
FYI, I wrote a wrapper around my `cmd` to run my commands in a `subprocess` before and after the actual command on the rank zero node. This seems to be...
So these logging scripts are not Slurm-specific, but I only have access to Slurm right now. I like the decorator idea. Here is the class I made to wrap...
I'm not too familiar with the commands, as I was asked to include these logs for someone else who is interested in them. I think `dstat` can be installed on any Linux...
This error does not appear to be related to multiple GPUs itself, but rather to the use of the `nn.parallel` functions. If I use the following: ```python def forward(self, input):...
I'll take a look, but just realized I didn't make clear that the function that crashed wasn't using a GPU array. I had already extracted the data from the GPU....
I figure my issue falls into this category. I was hoping I could read `MatlabHDF5File` files incrementally. We have very large MAT files and I was hoping I could just do:...
Shoot, after delving into the issue: this is reading a matrix of complex numbers.
This doesn't work with the code above:
```python
>>> instantiate(builds(torch.Generator, zen_wrappers=call_method("manual_seed", 42)))
...
TypeError: Error instantiating 'hydra_zen.funcs.zen_processing' : outer() got multiple values for argument 'mthd_name'
```