Tai Duc Nguyen
I am seeing an extreme slowdown with MatthewsCorrCoef too. What used to take less than a second for me now takes 10 minutes! Reverting to 0.9.0 or 0.8.2 works just...
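For reference, the metric itself is cheap to compute once you have the binary confusion matrix, so a 10-minute runtime points at overhead elsewhere. A plain-Python sketch of the formula (not the torchmetrics implementation):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Defined as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Perfect predictions give 1.0, perfectly inverted predictions give -1.0, which is a quick sanity check against whatever a given torchmetrics version returns.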
Hey guys, here's a PR I made to do this: https://github.com/ggerganov/llama.cpp/pull/403. Please check it out. If you have any questions, don't hesitate to ask here.
> I converted a 30b 4bit ggml model https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main back to pytorch (hf), but the resulting file was 65gb instead of about 20gb
>
> Is it possible for 4bit...
Well, I suppose they quantize the weights to 4bit and then save them as 4bit, which you can do easily with a bit of modification to my code. However, at inference,...
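To illustrate the idea, a minimal symmetric 4-bit round-trip might look like the sketch below. This is only an illustration, not llama.cpp's actual quantization scheme (which works on blocks of weights with per-block scales):

```python
# Hypothetical sketch: symmetric 4-bit quantization of a weight list.
# Signed 4-bit integers cover -8..7, so we scale the max magnitude to 7.

def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid div-by-zero on all-zero input
    quants = [max(-8, min(7, round(w / scale))) for w in weights]
    return quants, scale

def dequantize_4bit(quants, scale):
    # Reconstruction: each weight is recovered only up to +/- scale/2.
    return [q * scale for q in quants]
```

The round-trip error is bounded by half the scale, which is exactly the information lost when saving in 4bit: you can dequantize back to float16/float32 for a pytorch checkpoint, but the file then balloons because every 4-bit value becomes a full float again.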
@anzz1 Thank you for your comment. However, what if you want to study the effect of finetuning on quantized models? Or simply want to look at the distribution of weights...
@anzz1 @ggerganov Any idea how I can get this PR reviewed/accepted? I am willing to put in more work to make it run correctly and smoothly.
> @ggerganov any reason why this was removed from main?

I think it's because some time ago there were lots and lots of breaking changes to the implementation that the...
I was able to modify the mcp.py file and it's working in my tests. I also added schema validation and error handling for invalid input parameters. Let me know if...
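The validation I added might look something like the following minimal sketch (the function name and schema shape are illustrative, not the actual mcp.py code):

```python
# Hypothetical sketch: reject invalid tool parameters before dispatch.
# `schema` maps a parameter name to {"type": <python type>, "required": <bool>}.

def validate_params(params: dict, schema: dict) -> list[str]:
    """Return a list of human-readable errors; empty list means valid."""
    errors = []
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                errors.append(f"missing required parameter: {name}")
            continue
        expected = spec["type"]
        if not isinstance(params[name], expected):
            errors.append(f"parameter {name!r} must be {expected.__name__}")
    return errors
```

Returning an error list (rather than raising on the first problem) lets the handler report every invalid field to the client in one response.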
> Also, btw since in LitServe the decode_request argument is bound to be called `request` - the MCP properties must be request.

I think you are a bit confused here...