Lucas Nestler

Results: 42 comments by Lucas Nestler

#5 - Mixture of Experts 1) DeepSpeed is fundamentally broken, and we shouldn't get it near our code unless we manually verify their code first. Thus, their claims are not...

#7 - Parameter Offload 1) DeepSpeed is fundamentally broken, and we shouldn't get it near our code unless we manually verify their code first. Thus, their claims are not necessarily...

#12 - Sharpness Aware Minimization SAM significantly outperforms its non-SAM counterparts and allows you to find bugs in your code trivially. For example, below is a run that shifts the...
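Below is a minimal sketch of the two-pass SAM update the comment refers to, not the exact code from the issue. It assumes a hypothetical `loss_fn(model, batch)` helper that returns a scalar loss; all names are illustrative.

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    # First pass: gradient at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()

    # Perturb each parameter along its gradient: eps = rho * g / ||g||.
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()
                                        if p.grad is not None]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed ("sharpness-aware") weights.
    loss_fn(model, batch).backward()

    # Undo the perturbation and step with the second-pass gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

Each SAM step therefore costs roughly two forward/backward passes, which is the usual trade-off against the robustness it buys.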

#13 - Custom Kernel Ignoring that the gradients are slightly wrong, the model is 20% faster and uses 10% less memory when both the forward and backward pass are written by hand...
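For context, hand-writing both passes in PyTorch means subclassing `torch.autograd.Function`. The sketch below is only an illustrative example (a tanh-approximated GELU), not the kernel discussed in the issue.

```python
import torch

class HandwrittenGELU(torch.autograd.Function):
    """Hand-written forward and backward for the tanh approximation of GELU."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        c = 0.7978845608028654  # sqrt(2 / pi)
        return 0.5 * x * (1.0 + torch.tanh(c * (x + 0.044715 * x ** 3)))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        c = 0.7978845608028654
        inner = c * (x + 0.044715 * x ** 3)
        tanh_inner = torch.tanh(inner)
        sech2 = 1.0 - tanh_inner ** 2
        # d/dx [0.5 * x * (1 + tanh(inner))]
        grad = (0.5 * (1.0 + tanh_inner)
                + 0.5 * x * sech2 * c * (1.0 + 3 * 0.044715 * x ** 2))
        return grad_out * grad

x = torch.randn(8, requires_grad=True)
y = HandwrittenGELU.apply(x)
y.sum().backward()
```

Writing the backward yourself is exactly where "slightly wrong gradients" creep in, so checking it against autograd (e.g. with `torch.autograd.gradcheck`) is worth the extra line.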

#16 - TorchScript/Trace A compiled language is likely significantly faster. Thus, switching to JAX, which compiles both the forward and backward pass in one go, would most likely be reasonable....
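For illustration, here is a minimal JAX sketch of what "compiling both passes in one go" looks like: the whole training step, including the gradient computation, is wrapped in a single `jax.jit`. The toy linear model stands in for the real network.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model; stands in for the real forward pass.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # XLA compiles forward, backward, and the update as one program.
def train_step(params, x, y, lr=1e-3):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))
params, loss = train_step(params, x, y)
```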

#20 - TorchScript v2 You can script the forward pass, but not the backward or optimiser step. With torch.jit.trace, a step takes about 10% less time. Experiments are linked in...
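A minimal sketch of that split, with a stand-in model rather than the one from the issue: only the forward pass is traced, while backward and the optimiser step stay in eager mode.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 32),
                            torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
example = torch.randn(4, 16)

# Tracing records the forward graph only; it shares parameters with `model`.
traced = torch.jit.trace(model, example)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
out = traced(example)        # compiled forward
loss = out.pow(2).mean()
loss.backward()              # eager backward through the traced graph
opt.step()
opt.zero_grad()
```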

Would [memory-profiler](https://pypi.org/project/memory-profiler/) be a suitable library for this task? It allows thorough benchmarking of memory consumption on CPU by checkpointing the RAM usage every couple of milliseconds.
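For example, `memory_usage` from that package can sample the process RSS at a fixed interval while a function runs. The `train_one_epoch` body below is just a placeholder for whatever is being benchmarked.

```python
from memory_profiler import memory_usage

def train_one_epoch():
    # Stand-in for the real workload being profiled.
    data = [bytearray(10 ** 6) for _ in range(100)]
    return len(data)

# Sample RAM usage every 10 ms while the function runs; returns readings in MiB.
samples = memory_usage((train_one_epoch, (), {}), interval=0.01)
print(f"peak: {max(samples):.1f} MiB, mean: {sum(samples) / len(samples):.1f} MiB")
```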

In your particular case, the model is prone to overfitting, so it might not be a good idea to add new cases on the fly (like you'd do in online...

Those are two very interesting scenarios. The first one you mentioned, adding new training data, should be possible. If @alessiosavi can't add the feature right now, I'll release a pull...

If you tell the neural network to classify between ten people, that works fine. Even if you only give it two people on the first day and over time add new...