ml-engineering
Machine Learning Engineering Open Book
`ERROR: Could not find a version that satisfies the requirement github_md_utils (from versions: none)` `ERROR: No matching distribution found for github_md_utils` Any ideas? Googling `github_md_utils` gives 0 hits: https://www.google.com/search?client=safari&rls=en&q=github_md_utils&ie=UTF-8&oe=UTF-8...
Hi, I had a good read of this book! Wondering if we can convert the markdown files to PDF so that we can print it out to read. I would...
Would it be possible to improve the folder structure so it somewhat matches your table of contents? E.g.:
├── Part 1
│   ├── Topic 1
│   ├── Topic 2
├──...
Have you read https://arxiv.org/pdf/2402.15627 already? There are a lot of details in the later sections that deal with ML training in practice: garbage collection, auto-restarting, IB over Ethernet issues, etc.
As discussed on Slack, since we are trying to find the max FLOPS for each accelerator, I changed warmup to `0`. Without any magic flags on NVIDIA drivers...
Thanks for introducing the new performance metric! I'd like to contribute results for the GH200 chip. I ran the [quick run](https://github.com/stas00/ml-engineering/tree/master/compute/accelerator/benchmarks#examples-of-usage) in a docker container. If this looks good to...
@stas00 Wondering if you have any tips & tricks for working with performance profiling tools such as `nsys`? Or recommendations for systematically optimizing model architecture and single / multi-node training...
WIP: seems to work, but not sure if it is optimal: `./mamf-finder.py --m_range 0 20480 256 --n_range 0 20480 256 --k_range 0 20480 256 --output_file=$(date +"%Y-%m-%d-%H:%M:%S").txt --dtype float8_e8m0fnu`
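
For readers following the benchmark threads above, here is a minimal sketch of how a timed matmul is commonly converted into achieved TFLOPS. It relies only on the standard count of `2 * m * n * k` floating-point operations for an `(m x k) @ (k x n)` product. The function names `matmul_flops` and `achieved_tflops` are hypothetical, chosen for illustration, and are not taken from `mamf-finder.py`. The timing value in the usage note is made up, not a measured result.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) matmul.

    Each of the m * n output elements needs k multiplies and k adds,
    which gives the usual 2 * m * n * k count.
    """
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert one timed matmul into TFLOPS (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return matmul_flops(m, n, k) / seconds / 1e12


if __name__ == "__main__":
    # Illustrative numbers only: a 4096^3 matmul assumed to take 1 ms.
    print(f"{achieved_tflops(4096, 4096, 4096, 1e-3):.1f} TFLOPS")
```

With the illustrative inputs above, the helper reports roughly 137.4 TFLOPS; a real sweep would feed in the measured time for each `(m, n, k)` shape and keep the maximum.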