Remo Dentato
I believe, for example, that `int64_t` is clearer than `long`. However, consider that I come from an age when `int` was usually 16 bits, and I still have to switch...
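To make it concrete (just a toy program, not code from the repo): `long` is 4 bytes on LLP64 Windows but 8 bytes on LP64 Linux, while `int64_t` is 8 bytes everywhere, so the intent is explicit.
```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* long varies by platform data model (LLP64 vs LP64);
       int64_t is always exactly 64 bits. */
    printf("sizeof(long)    = %zu\n", sizeof(long));
    printf("sizeof(int64_t) = %zu\n", sizeof(int64_t));
    return 0;
}
```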
FWIW, I believe it is key to have a CUDA implementation in the repo (and later an OpenCL one, and so on). This will allow focusing the efforts on...
I know I'm annoying, but this is exactly why I believe it's beneficial to have this version in the repo.
To keep them aligned, I would push the differences into specific functions like `load_weights`, etc. If you are not opposed to the idea of creating an llm "object" (like I...
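A minimal sketch of what I have in mind (all names here, like `llm` and the `cpu_*` functions, are placeholders, not code from the repo): each backend plugs its own functions into the same slots, so the rest of the code stays identical across CPU and CUDA versions.
```c
#include <stdio.h>

/* Hypothetical llm "object": the backends differ only in the
   functions they wire up, not in the surrounding driver code. */
typedef struct llm llm;
struct llm {
    void *weights;                                   /* backend-specific */
    int  (*load_weights)(llm *m, const char *path);
    void (*forward)(llm *m, int token, int pos, float *logits);
};

/* Toy CPU backend, just to show the wiring. */
static int cpu_load_weights(llm *m, const char *path) {
    printf("cpu: loading %s\n", path);
    m->weights = NULL;  /* real code would mmap the checkpoint here */
    return 0;
}
static void cpu_forward(llm *m, int token, int pos, float *logits) {
    (void)m; (void)token; (void)pos; (void)logits;  /* real matmuls here */
}

llm make_cpu_llm(void) {
    llm m = { NULL, cpu_load_weights, cpu_forward };
    return m;
}

int main(void) {
    llm m = make_cpu_llm();
    m.load_weights(&m, "model.bin");
    return 0;
}
```
A `make_cuda_llm()` would then only have to supply its own `load_weights`/`forward` pair.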
@karpathy, I see your point; that's why I submitted those minimal PRs in the hope that they can help you move faster toward your desired state. However, not having this...
I got the same issue, but I only have 16GB of RAM at the moment. I told myself I would try with a bigger machine but never did. How...
Ok. I see you went for a much deeper change. Did you manage to test it?
The point is that they can be loaded directly into the GPU. Not needing on-the-fly conversion (and having a smaller file to load) significantly reduces the load time (which, for...
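Roughly what I mean (a hedged sketch, assuming the checkpoint is a flat blob of `__half` values; the function name and layout are made up, not from the repo): the bytes on disk go to the device bit-for-bit, with no per-weight conversion pass.
```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

/* Load a flat fp16 weight blob straight onto the GPU:
   read the raw bytes and copy them as-is. */
__half *load_fp16_weights(const char *path, size_t n_params) {
    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return NULL; }

    size_t bytes = n_params * sizeof(__half);  /* half the fp32 size */
    void *host = malloc(bytes);
    if (fread(host, 1, bytes, f) != bytes) {
        fprintf(stderr, "short read from %s\n", path);
        free(host); fclose(f); return NULL;
    }
    fclose(f);

    __half *dev = NULL;
    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  /* bit-for-bit */
    free(host);
    return dev;
}
```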
It will be the "legacy version" but with fp16 weights. Because of how `export.py` works, you need to give it a version number:
```
usage: export.py [-h] [--version VERSION]...
```
Just thought of another way, but I'm not sure I like it: use the extension of the output file to determine the fp32/fp16 size. For example:
```
python export.py --hf...
```