Remo Dentato (16 comments)

I believe, for example, that `int64_t` is clearer than `long`. However, consider that I come from an age when `int` was usually 16 bits, and I still have to switch...

FWIW, I believe it is key to have a CUDA implementation in the repo (and later an OpenCL one, and so on). This will allow focusing the efforts on...

I know I'm annoying, but this is exactly why I believe it's beneficial to have this version in the repo.

To keep them aligned, I would push the differences into specific functions like `load_weights`, etc. If you are not opposed to the idea of creating an LLM "object" (like I...

@karpathy, I see your point; that's why I submitted those minimal PRs in the hope they can help you move faster toward your desired state. However, not having this...

I got the same issue, but I only have 16 GB of RAM at the moment. I told myself I would try with a bigger machine but never did. How...

Ok. I see you went for a much deeper change. Did you manage to test it?

The point is that they can be loaded directly into the GPU. Not needing on-the-fly conversion (and having a smaller file to load) significantly reduces the load time (which, for...

It will be the "legacy version" but with fp16 weights. Because of how `export.py` works, you need to give it a version number:

```
usage: export.py [-h] [--version VERSION]...
```

Just thought of another way, but I'm not sure I like it: use the extension of the output file to determine the fp32/fp16 size. For example:

```
python export.py --hf ...
```