logicchains

2 issues by logicchains

I'm running the 65B model on a machine with 256 GB of (CPU) RAM, with the context size set to 2048. The same thing happens with both llama65b and alpaca65b, every...

bug
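
A minimal sketch of a comparable configuration, assuming the llama-cpp-python bindings and a hypothetical quantized model path; the issue itself presumably invoked llama.cpp directly, but the relevant parameter here is the 2048-token context size:

```python
from llama_cpp import Llama

# Hypothetical model path; the key setting from the issue is n_ctx=2048.
llm = Llama(
    model_path="./models/65B/ggml-model-q4_0.bin",  # hypothetical 65B quantized model
    n_ctx=2048,  # context size set to 2048, as in the issue
)

# Run a short completion; a 65B model at this context fits in CPU RAM
# on a 256 GB machine.
out = llm("The quick brown fox", max_tokens=64)
print(out["choices"][0]["text"])
```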

I was training a LLaMA model on GPU with a custom embedding. It worked fine with 12 layers, dim 1024, seq length 256, but the loss would become NaN after the...

bug
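
A minimal sketch of the kind of training setup described, using the issue's dimensions (12 layers, dim 1024, seq length 256) with hypothetical vocabulary size, batch size, and learning rate; the NaN guard and gradient clipping shown are the usual first diagnostics when loss turns NaN partway through a run:

```python
import torch
import torch.nn as nn

# Dimensions from the issue; VOCAB, batch size, and lr are hypothetical.
VOCAB, DIM, LAYERS, SEQ = 32000, 1024, 12, 256
device = "cuda" if torch.cuda.is_available() else "cpu"

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)  # slot for the "custom embedding"
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=16, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=LAYERS)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        return self.head(self.blocks(self.embed(x)))

model = TinyLM().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(1000):
    # Random tokens stand in for real training data.
    tokens = torch.randint(0, VOCAB, (8, SEQ + 1), device=device)
    x, y = tokens[:, :-1], tokens[:, 1:]
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, VOCAB), y.reshape(-1))
    if torch.isnan(loss):
        # Stop at the first NaN so the offending step and batch can be inspected.
        raise RuntimeError(f"loss became NaN at step {step}")
    opt.zero_grad()
    loss.backward()
    # Clipping often prevents the gradient blow-ups that surface as NaN loss.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
```

If the NaN only appears at larger depths or sequence lengths, lowering the learning rate or enabling torch.autograd.set_detect_anomaly(True) to locate the first NaN-producing op are common next steps.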