RedmiS22018

Results 2 issues of RedmiS22018

Instead of needing to load the weights into memory, compare every byte in the LLaMA and delta files and add the delta to the bytes, loading 4 KB at a time,...

Instead of using parameter deltas, this implementation compares each byte in the delta and in the LLaMA model and outputs the Vicuna model. This offers significantly less RAM usage compared...
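
The chunked approach both summaries describe could look roughly like the sketch below. It is only an illustration, not the code from either issue: the function name `apply_delta_streaming`, the file paths, and the choice to add bytes modulo 256 are all assumptions, since the summaries are truncated and do not say how the bytes are combined. It shows why memory stays low: only one 4 KB chunk of each file is held at a time.

```python
import os

CHUNK_SIZE = 4096  # 4 KB, the chunk size mentioned in the issue summary


def apply_delta_streaming(base_path, delta_path, out_path, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: combine two equal-sized files chunk by chunk.

    Assumption: each output byte is (base byte + delta byte) modulo 256.
    The original issues do not specify the exact arithmetic.
    """
    if os.path.getsize(base_path) != os.path.getsize(delta_path):
        raise ValueError("base and delta files must be the same size")

    with open(base_path, "rb") as base, \
         open(delta_path, "rb") as delta, \
         open(out_path, "wb") as out:
        while True:
            base_chunk = base.read(chunk_size)
            if not base_chunk:
                break  # end of file reached
            delta_chunk = delta.read(len(base_chunk))
            # Add byte pairs, wrapping around at 256 so each result fits in one byte.
            out.write(bytes((b + d) & 0xFF for b, d in zip(base_chunk, delta_chunk)))
```

Peak memory here is bounded by the chunk size rather than the model size, which is the trade-off the summaries point to. Whether plain byte-wise addition reproduces the intended weights depends on how the delta file was produced, which the truncated text does not state.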