Alex Nguyen
Mine is an iPhone 4S running iOS 9.3.5. The original Bootstrap theme at http://creative.ondrejsvestka.cz/ worked!
``` _ _ _ _(_)_ | Documentation: https://docs.julialang.org (_) | (_) (_) | _ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help. | |...
I built Codon from source with GPU support and tried to run some simple GPU code, but got the following error. How should I fix it? Thanks. ``` $ ./build/codon run...
https://github.com/karpathy/nanoGPT/blob/7f74652843d8cbea31e2a9c986caf4a0ad452a6c/model.py#L136 I'd like to ask why nanoGPT doesn't try other kinds of positional embeddings. What is the advantage of using a learnable position embedding? Thanks.
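For context, nanoGPT mirrors GPT-2, which uses a learned absolute position table, and it can load GPT-2 checkpoints that contain such a table. That compatibility is likely part of the reason, though the maintainer would have to confirm it. The minimal pure-Python sketch below is not nanoGPT's code; the function names are hypothetical. It contrasts the two options. A learned table is a trainable `block_size x n_embd` matrix indexed by position, so it can fit whatever position pattern the data favours but has no row for positions past `block_size`. The fixed sinusoidal encoding from the original Transformer paper has no parameters and is defined for any position.

```python
import math
import random


def learned_pos_table(block_size, n_embd, seed=0):
    """Learned absolute positions: a trainable (block_size x n_embd) table.

    Small random values stand in for parameters that training would update.
    Only positions 0..block_size-1 have a row, so positions beyond the
    context length the model was built with cannot be looked up.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(n_embd)] for _ in range(block_size)]


def sinusoidal_positions(t, n_embd):
    """Fixed sinusoidal positions (Vaswani et al., 2017): no parameters.

    Even dimensions use sin and odd dimensions use cos; each sin/cos pair
    shares a frequency, and frequencies shrink geometrically across pairs.
    """
    rows = []
    for pos in range(t):
        row = []
        for i in range(n_embd):
            freq = 1.0 / (10000 ** ((i // 2) * 2 / n_embd))
            angle = pos * freq
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        rows.append(row)
    return rows


def add_positions(tok_emb, pos_emb):
    """Element-wise sum of token and position embeddings, row by row."""
    return [
        [a + b for a, b in zip(tok_row, pos_row)]
        for tok_row, pos_row in zip(tok_emb, pos_emb)
    ]


if __name__ == "__main__":
    block_size, n_embd, t = 8, 4, 3
    tok_emb = [[0.0] * n_embd for _ in range(t)]  # dummy token embeddings
    table = learned_pos_table(block_size, n_embd)
    learned = add_positions(tok_emb, table[:t])  # look up rows 0..t-1
    fixed = add_positions(tok_emb, sinusoidal_positions(t, n_embd))
    print(len(table) * n_embd, "extra parameters for the learned table")
    print(fixed[0])  # position 0 -> [0.0, 1.0, 0.0, 1.0]
```

The trade-off shows up directly: the learned variant costs `block_size * n_embd` parameters and is capped at `block_size` positions, while the sinusoidal variant is free and unbounded but cannot adapt to the data.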
I followed the example at https://github.com/togethercomputer/RedPajama-Data/tree/main/data_prep/cc/cc_net#pipeline-overview but got the following error. Did I forget a preparation step? ``` (racoon) t@medu:~/repos/NAM/red-pajama/data_prep/cc/cc_net$ python -m cc_net -l my -l gu usage: __main__.py [-h] [-c CONFIG_NAME]...
I would like to ask the question above about the new embedding vectors corresponding to the newly added tokens. I believe good initialization is important for continual pre-training.
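As a point of reference, and not necessarily what this project does, a commonly used heuristic is to initialize each new embedding row from existing ones instead of from fresh random values. The aim is for new tokens to start inside the distribution the model already knows. The pure-Python sketch below shows two variants under that assumption: the mean of all existing rows, and the mean of the rows of the old-vocabulary pieces that a new token splits into. The `pieces` input and both function names are hypothetical, and producing `pieces` would require running the old tokenizer on each new token's string.

```python
def mean_init(embeddings, n_new):
    """Append n_new rows, each equal to the mean of all existing rows."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(row[j] for row in embeddings) / n for j in range(dim)]
    return [list(row) for row in embeddings] + [list(mean) for _ in range(n_new)]


def subword_mean_init(embeddings, pieces):
    """Append one row per new token, averaging the embeddings of the
    old-vocabulary ids that the token decomposes into.

    pieces[k] is a (hypothetical) list of old-tokenizer ids for new token k.
    """
    dim = len(embeddings[0])
    new_rows = []
    for ids in pieces:
        if not ids:
            raise ValueError("each new token needs at least one old-vocabulary piece")
        new_rows.append([sum(embeddings[i][j] for i in ids) / len(ids) for j in range(dim)])
    return [list(row) for row in embeddings] + new_rows


if __name__ == "__main__":
    old = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_init(old, 1)[-1])                 # [3.0, 4.0]
    print(subword_mean_init(old, [[0, 1]])[-1])  # [2.0, 3.0]
```

If the model's output projection is not tied to the input embeddings, the same question applies to the new rows of the output matrix as well.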
I couldn't find any pointer to where the datasets can be obtained, so I'm asking here. This is not an issue with the implementation itself.
Sparsity should reduce model size and increase inference speed without hurting performance too much. The repo https://github.com/IST-DASLab/sparsegpt is Apache-licensed and may be useful (I hope).
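To make the size/speed argument concrete, here is a minimal pure-Python sketch of plain magnitude pruning. This is a simpler baseline and not SparseGPT's algorithm, which prunes layer by layer while compensating the remaining weights. Unstructured zeros shrink storage only with a sparse format and rarely speed up dense matrix multiplies by themselves. The 2:4 pattern, which keeps 2 of every 4 consecutive weights, is the layout NVIDIA Ampere sparse tensor cores can accelerate. Both function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of
    entries with the smallest absolute value (ties broken by position)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    rows, cols = len(weights), len(weights[0])
    k = int(rows * cols * sparsity)
    order = sorted((abs(weights[r][c]), r, c) for r in range(rows) for c in range(cols))
    pruned = [list(row) for row in weights]
    for _, r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned


def prune_2_4(weights):
    """2:4 semi-structured pruning: in every group of 4 consecutive
    weights along a row, keep the 2 with the largest |w|."""
    pruned = [list(row) for row in weights]
    for row in pruned:
        if len(row) % 4 != 0:
            raise ValueError("row length must be a multiple of 4")
        for start in range(0, len(row), 4):
            group = range(start, start + 4)
            # Zero the two smallest-magnitude entries in this group.
            for idx in sorted(group, key=lambda j: abs(row[j]))[:2]:
                row[idx] = 0.0
    return pruned


if __name__ == "__main__":
    w = [[0.1, -0.5, 0.3, 0.05], [2.0, -0.01, 0.4, -1.0]]
    print(magnitude_prune(w, 0.5))  # [[0.0, -0.5, 0.0, 0.0], [2.0, 0.0, 0.4, -1.0]]
    print(prune_2_4(w))             # [[0.0, -0.5, 0.3, 0.0], [2.0, 0.0, 0.0, -1.0]]
```

Both examples reach 50% sparsity, but only the 2:4 version guarantees the fixed per-group pattern that hardware can exploit. The unstructured version may instead concentrate zeros in some rows.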