Chan Chi Kit

4 comments by Chan Chi Kit

Maybe you need to reinstall llama-cpp-python with the following command:

> CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir

Answer from: https://github.com/abetlen/llama-cpp-python/issues/1285#issuecomment-2007778703
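After the forced reinstall, one way to sanity-check that the wheel was really built with Metal support is to query the low-level bindings. This is a minimal sketch, assuming your installed llama-cpp-python version exposes the `llama_supports_gpu_offload` binding:

```python
# Sanity check after the reinstall: confirm the wheel was built with
# GPU (Metal) offload support. Assumes the installed version exposes
# the low-level llama_supports_gpu_offload binding.
import llama_cpp

print(llama_cpp.__version__)
print(llama_cpp.llama_supports_gpu_offload())  # expect True on a Metal build
```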

@yukiarimo I don't know much about the M1. But in general, you can offload more layers to the GPU and lower the context size when initializing the `Llama` class by setting `n_gpu_layers`...
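For reference, here is a minimal sketch of what that initialization could look like; the model path is a placeholder, and the right values for `n_gpu_layers` and `n_ctx` depend on your hardware and model:

```python
from llama_cpp import Llama

# A sketch, not a tuned configuration: model_path is hypothetical,
# and suitable n_gpu_layers / n_ctx values depend on available memory.
llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU (Metal on Apple Silicon)
    n_ctx=2048,       # a smaller context window lowers memory use
)
```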

@yukiarimo If you find a speed-up solution, please let me know. XD

> Maybe you need to reinstall llama-cpp-python with the following command:
>
> > CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip3 install -U --force-reinstall llama-cpp-python --no-cache-dir

...