Matt Wronkiewicz

6 comments by Matt Wronkiewicz

If it drops `kwargs["n"]`, it should call llama.cpp multiple times to generate the requested number of completions. Some of the optimizers expect multiple completions. You could potentially speed this up...
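A minimal sketch of that fallback, assuming a hypothetical `llama_generate` wrapper around a single llama.cpp completion call (the name and signature here are illustrative, not the actual integration):

```python
def generate_n(prompt, **kwargs):
    # llama.cpp handles one completion per request, so pop the
    # OpenAI-style "n" parameter and loop instead of passing it through.
    n = kwargs.pop("n", 1)
    completions = []
    for _ in range(n):
        # llama_generate is a hypothetical single-completion wrapper.
        completions.append(llama_generate(prompt, **kwargs))
    return completions
```

Batching or parallelizing those requests is where the potential speed-up would come from.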

MLX generation works with the OpenAI model class. Start the MLX-LM server: `python -m mlx_lm.server --model berkeley-nest/Starling-LM-7B-alpha --port 11434`. Then you can call it from DSPy: `llm = dspy.OpenAI(model_type="chat", api_base="http://localhost:11434/v1/")`
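A minimal end-to-end sketch, assuming a DSPy release where `dspy.OpenAI` accepts these arguments (exact parameters vary across versions) and the MLX-LM server from the command above is running:

```python
import dspy

# Point DSPy's OpenAI client at the local MLX-LM server.
# api_key is unused by the local server, but the client may require one.
llm = dspy.OpenAI(
    model_type="chat",
    api_base="http://localhost:11434/v1/",
    api_key="not-needed",
)
dspy.settings.configure(lm=llm)

# Smoke test: a one-field prediction routed through the local model.
qa = dspy.Predict("question -> answer")
print(qa(question="What is the capital of France?").answer)
```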

Yes, thanks for pointing that out. Good to know. I can catch the KeyError in `ColBERTv2.__call__` and raise an exception with a more helpful error message, though the server error...
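Roughly what that catch could look like, as a simplified sketch rather than the exact `dsp/modules/colbertv2.py` code:

```python
import requests

class ColBERTv2:
    def __init__(self, url):
        self.url = url

    def __call__(self, query, k=10):
        response = requests.get(self.url, params={"query": query, "k": k})
        try:
            # The server normally returns {"topk": [...]}; an error payload
            # is missing that key, which previously surfaced as a bare KeyError.
            topk = response.json()["topk"][:k]
        except KeyError as e:
            raise KeyError(f"Key not found in ColBERTv2 server response: {e}") from e
        return topk
```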

New error message:
```
  File "dsp/modules/colbertv2.py", line 34, in __call__
    raise KeyError(f"Key not found in ColBERTv2 server response: {e}") from e
KeyError: "Key not found in ColBERTv2 server response: 'topk'"
```
...

@fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.

> > @fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.
>
> @wronkiew What...