Lucas Nestler

Results: 42 comments by Lucas Nestler

#5 - Mixture of Experts 1) DeepSpeed is fundamentally broken, and we shouldn't get it near our code unless we manually verify their code first. Thus, their claims are not...

#7 - Parameter Offload 1) DeepSpeed is fundamentally broken, and we shouldn't get it near our code unless we manually verify their code first. Thus, their claims are not necessarily...

#12 - Sharpness Aware Minimization SAM significantly outperforms its non-SAM counterparts and allows you to find bugs in your code trivially. For example, below is a run that shifts the...
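Below is a minimal sketch of the two-pass SAM update the comment refers to, not the exact code from the issue. It assumes a hypothetical `loss_fn(model, batch)` helper that returns a scalar loss; all names are illustrative.

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    # First pass: gradient at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()

    # Perturb each parameter along its gradient: eps = rho * g / ||g||.
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()
                                        if p.grad is not None]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed ("sharpness-aware") weights.
    loss_fn(model, batch).backward()

    # Undo the perturbation and step with the second-pass gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

Each SAM step therefore costs roughly two forward/backward passes, which is the usual trade-off against the robustness it buys.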

#13 - Custom Kernel Ignoring that the gradients are slightly wrong, the model is 20% faster and uses 10% less memory when both the forward and backward pass are written by hand...
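For context, hand-writing both passes in PyTorch means subclassing `torch.autograd.Function`. The sketch below is only an illustrative example (a tanh-approximated GELU), not the kernel discussed in the issue.

```python
import torch

class HandwrittenGELU(torch.autograd.Function):
    """Hand-written forward and backward for the tanh approximation of GELU."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        c = 0.7978845608028654  # sqrt(2 / pi)
        return 0.5 * x * (1.0 + torch.tanh(c * (x + 0.044715 * x ** 3)))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        c = 0.7978845608028654
        inner = c * (x + 0.044715 * x ** 3)
        tanh_inner = torch.tanh(inner)
        sech2 = 1.0 - tanh_inner ** 2
        # d/dx [0.5 * x * (1 + tanh(inner))]
        grad = (0.5 * (1.0 + tanh_inner)
                + 0.5 * x * sech2 * c * (1.0 + 3 * 0.044715 * x ** 2))
        return grad_out * grad

x = torch.randn(8, requires_grad=True)
y = HandwrittenGELU.apply(x)
y.sum().backward()
```

Writing the backward yourself is exactly where "slightly wrong gradients" creep in, so checking it against autograd (e.g. with `torch.autograd.gradcheck`) is worth the extra line.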

#16 - TorchScript/Trace A compiled language is likely significantly faster. Thus, switching to JAX, which compiles both the forward and backward pass in one go, would most likely be reasonable....
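For illustration, here is a minimal JAX sketch of what "compiling both passes in one go" looks like: the whole training step, including the gradient computation, is wrapped in a single `jax.jit`. The toy linear model stands in for the real network.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model; stands in for the real forward pass.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA compiles forward, backward, and the update as one program.
def train_step(params, x, y, lr=1e-3):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))
params, loss = train_step(params, x, y)
```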

#20 - TorchScript v2 You can script the forward pass, but not the backward or optimiser step. With torch.jit.trace, a step takes about 10% less time. Experiments are linked in...
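A minimal sketch of that split, with a stand-in model rather than the one from the issue: only the forward pass is traced, while backward and the optimiser step stay in eager mode.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 32),
                            torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
example = torch.randn(4, 16)

# Tracing records the forward graph only; it shares parameters with `model`.
traced = torch.jit.trace(model, example)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
out = traced(example)        # compiled forward
loss = out.pow(2).mean()
loss.backward()              # eager backward through the traced graph
opt.step()
opt.zero_grad()
```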

Would [memory-profiler](https://pypi.org/project/memory-profiler/) be a suitable library for this task? It allows thorough benchmarking of memory consumption on CPU by checkpointing the RAM usage every couple of milliseconds.
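For example, `memory_usage` from that package can sample the process RSS at a fixed interval while a function runs. The `train_one_epoch` body below is just a placeholder for whatever is being benchmarked.

```python
from memory_profiler import memory_usage

def train_one_epoch():
    # Stand-in for the real workload being profiled.
    data = [bytearray(10 ** 6) for _ in range(100)]
    return len(data)

# Sample RAM usage every 10 ms while the function runs; returns readings in MiB.
samples = memory_usage((train_one_epoch, (), {}), interval=0.01)
print(f"peak: {max(samples):.1f} MiB, mean: {sum(samples) / len(samples):.1f} MiB")
```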

In your particular case, the model is prone to overfitting, so it might not be a good idea to add new cases on the fly (like you'd do in online...

Those are two very interesting scenarios. The first one you mentioned, adding new training data, should be possible. If @alessiosavi can't add the feature right now, I'll release a pull...

If you tell the neural network to classify between ten people, that works fine. Even if you only give it two people on the first day and over time add new...