Wouter Zwerink
> Hello, (1) I tried reducing the number of threads the OMP uses and nothing changed regarding the presence of the error. On the other hand I did update the...
Hi @Blaizzy ! Is this feature still on the radar? We train on cloud instances that somewhat frequently get interrupted. This prevents us from using offline mode, as we can...
@Blaizzy I seem to have missed your question, sorry! The training interruptions are not due to neptune at all! The interruptions are from using spot instances. We train with fault...
@crypdick I also needed this and ended up writing a custom traversal order: ``` from typing import Sequence import numpy as np from ffcv.loader import Loader from ffcv.traversal_order.base import TraversalOrder...
https://github.com/libffcv/ffcv/issues/301#issuecomment-1521627945 Here's my workaround for using WeightedRandomSampler in an ffcv Loader
Thanks for looking into this. I can try to create a minimal example later. Off the top of my head, there are a couple of things we do that may be...
Oh strange! How are you syncing the offline run? We call `run.stop()` followed by a subprocess call to `neptune sync --path {path} --project {project} --offline-only` `path` points to the changed...
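For reference, the stop-then-sync sequence described above can be sketched roughly as follows. This is a minimal sketch, not our actual training code: the helper names `build_sync_command` and `sync_offline_run` are hypothetical, `run` is assumed to be an offline-mode neptune run object, and `path` and `project` are placeholders supplied by the caller. The only CLI flags used are the ones quoted in the comment.

```python
import subprocess
from typing import List


def build_sync_command(path: str, project: str) -> List[str]:
    """Build the `neptune sync` argument list quoted in the comment above.

    Hypothetical helper; `path` and `project` are caller-supplied placeholders.
    """
    return ["neptune", "sync", "--path", path, "--project", project, "--offline-only"]


def sync_offline_run(run, path: str, project: str) -> int:
    """Stop the run, then sync its offline data through the neptune CLI.

    Assumes `run` exposes a `stop()` method and that the `neptune`
    executable is available on PATH. Returns the CLI exit code.
    """
    # Stop the run first so that buffered data is written to disk before syncing.
    run.stop()
    # Pass the command as a list so that no shell quoting is involved.
    completed = subprocess.run(build_sync_command(path, project), check=False)
    return completed.returncode
```

Returning the exit code rather than raising leaves it to the caller to decide whether a failed sync should abort the job or just be retried later.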
I'll take some time tomorrow to try to isolate the issue, thanks again for looking into this
Hi @SiddhantSadangi ! I have a script for you that reproduces the bug on my end: ```python import os import neptune import torch import torch.nn.functional as F from pytorch_lightning import...
@SiddhantSadangi interesting, I can't seem to find what's causing this! What Python version are you using? I'm on 3.9.18 with the latest neptune and lightning. I don't think anyone else is trying...