Tai Duc Nguyen

9 comments by Tai Duc Nguyen

I am seeing an extreme slowdown with MatthewsCorrCoef too. What used to take less than a second for me now takes 10 minutes! Reverting to 0.9.0 or 0.8.2 works just...

Hey guys, here's a PR I made to do this: https://github.com/ggerganov/llama.cpp/pull/403. Please check it out. If you have any questions, don't hesitate to ask here.

> I converted a 30b 4bit ggml model https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main back to pytorch (hf), but the resulting file was 65gb instead of about 20gb
>
> Is it possible for 4bit...

Well, I suppose they quantize the weights to 4bit and then save them as 4bit, which you can do easily with a bit of modification to my code. However, at inference,...
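To make the "quantize to 4bit, then save as 4bit" idea concrete, here is a minimal sketch in plain Python. It is an illustration only: it is not the code from the PR and not the ggml file format (which uses its own block layout), and the function names `quantize_4bit`, `pack_nibbles`, `unpack_nibbles`, and `dequantize` are hypothetical. It assumes a single scale and offset for the whole list of weights.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [0, 15] with one scale/offset (assumed scheme)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 when all weights are equal
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def pack_nibbles(codes):
    """Store two 4-bit codes per byte, low nibble first."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, n):
    """Recover the first n 4-bit codes from the packed bytes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:n]


def dequantize(codes, scale, lo):
    """Approximate the original floats from the 4-bit codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, lo = quantize_4bit(weights)
packed = pack_nibbles(codes)                      # 5 weights fit in 3 bytes
restored = dequantize(unpack_nibbles(packed, len(weights)), scale, lo)
```

Packed this way, each weight takes half a byte, compared with two bytes for a 16-bit float. Saving dequantized weights at 16-bit precision would therefore give a file several times larger than the packed one, which is at least consistent with the size gap described in the quoted question.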

@anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights...

@anzz1 @ggerganov Any idea how I can get this PR reviewed/accepted? I am willing to put in more work to make it run correctly and smoothly.

> @ggerganov any reason why this was removed from main?

I think it's because some time ago there were lots and lots of breaking changes to the implementation that the...

I was able to modify the mcp.py file and it's working in my tests. I also added schema validation and error handling for invalid input parameters. Let me know if...

> Also, btw since in LitServe the decode_request argument is bound to be called `request` - the MCP properties must be request.

I think you are a bit confused here...