Rahul D Shetty

Results: 9 comments by Rahul D Shetty

The LangChain team has already built this integration: https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_textgen_inference.html
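For reference, a minimal sketch of using that wrapper against a text-generation-inference server (the endpoint URL and generation parameters below are illustrative assumptions, not from the linked docs):

```python
# Sketch: point the (older) LangChain TGI wrapper at a running
# text-generation-inference server. URL and parameters are illustrative.
from langchain.llms import HuggingFaceTextGenInference

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",  # assumed TGI endpoint
    max_new_tokens=256,
    temperature=0.7,
)

print(llm("Write a haiku about GPUs."))
```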

This could be an issue related to bitsandbytes quantization. There is a whole thread with similar linked issues here: https://github.com/TimDettmers/bitsandbytes/issues/538

I've tried the approach suggested by @lukestanley and @loretoparisi and got starcoder.cpp to run in the browser. I've published a demo project here: https://github.com/rahuldshetty/starcoder.js I tried with the [tiny_starcoder_py](https://huggingface.co/bigcode/tiny_starcoder_py) model as...

I was getting a similar issue, then I rolled back the Docker image to an older version and the model started working. Image where it's working: ghcr.io/huggingface/text-generation-inference@sha256:f4e09f01c1dd38bc2e9c9a66e9de1c2e3dc9912c2781440f7ac1eb70f6b1479e Model: tiiuae/falcon-7b-instruct NUM_SHARD: 1 No...
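As a quick sanity check once the container is up, something like this works against TGI's REST API (a sketch; it assumes the container port is mapped to localhost:8080):

```python
# Sketch: verify a running text-generation-inference container responds.
# Assumes the container's port 80 is mapped to localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is the capital of France?",
        "parameters": {"max_new_tokens": 32},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```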

Hello @gitknu, you can find the source code for the playground and other examples here: https://github.com/rahuldshetty/ggml.js-examples You just need to provide the relative path to the model file in...

Unfortunately, it's not possible to run the original phi-2 model in the browser with llm.js, mainly due to the memory limitation of the WASM engine.

I haven't tested it, but it might be too buggy (and slow) to run models larger than 2GB with llm.js. It directly leverages the CPU via the WASM engine in the browser without...

Could you share more context on what you mean by distributed? At the moment, what LLM.js does is use the WebAssembly VM running in the browser...

@Joinhack, if you're using Emscripten, then try adding these flags and values during compilation: `-s INITIAL_MEMORY=1000MB -s MAXIMUM_MEMORY=4GB -s STACK_SIZE=11524288 -s ALLOW_MEMORY_GROWTH` This should get around the memory limitations....