Matt Wronkiewicz

6 comments by Matt Wronkiewicz

If it drops `kwargs["n"]`, it should call llama.cpp multiple times to generate the requested number of completions. Some of the optimizers expect multiple completions. You could potentially speed this up...
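A minimal sketch of that fallback, assuming a hypothetical `llama_generate` wrapper around a single llama.cpp completion call (the name and signature here are illustrative, not the actual integration):

```python
def generate_n(prompt, **kwargs):
    # llama.cpp handles one completion per request, so pop the
    # OpenAI-style "n" parameter and loop instead of passing it through.
    n = kwargs.pop("n", 1)
    completions = []
    for _ in range(n):
        # llama_generate is a hypothetical single-completion wrapper.
        completions.append(llama_generate(prompt, **kwargs))
    return completions
```

Batching or parallelizing those requests is where the potential speed-up would come from.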

MLX generation works with the OpenAI model class. Start the MLX-LM server: `python -m mlx_lm.server --model berkeley-nest/Starling-LM-7B-alpha --port 11434`. Then you can call it from DSPy: `llm = dspy.OpenAI(model_type="chat", api_base="http://localhost:11434/v1/")`
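A minimal end-to-end sketch, assuming a DSPy release where `dspy.OpenAI` accepts these arguments (exact parameters vary across versions) and the MLX-LM server from the command above is running:

```python
import dspy

# Point DSPy's OpenAI client at the local MLX-LM server.
# api_key is unused by the local server, but the client may require one.
llm = dspy.OpenAI(
    model_type="chat",
    api_base="http://localhost:11434/v1/",
    api_key="not-needed",
)
dspy.settings.configure(lm=llm)

# Smoke test: a one-field prediction routed through the local model.
qa = dspy.Predict("question -> answer")
print(qa(question="What is the capital of France?").answer)
```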

Yes, thanks for pointing that out. Good to know. I can catch the KeyError in `ColBERTv2.__call__` and raise an exception with a more helpful error message, though the server error...
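Roughly what that catch could look like, as a simplified sketch rather than the exact `dsp/modules/colbertv2.py` code:

```python
import requests

class ColBERTv2:
    def __init__(self, url):
        self.url = url

    def __call__(self, query, k=10):
        response = requests.get(self.url, params={"query": query, "k": k})
        try:
            # The server normally returns {"topk": [...]}; an error payload
            # is missing that key, which previously surfaced as a bare KeyError.
            topk = response.json()["topk"][:k]
        except KeyError as e:
            raise KeyError(f"Key not found in ColBERTv2 server response: {e}") from e
        return topk
```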

New error message:
```
  File "dsp/modules/colbertv2.py", line 34, in __call__
    raise KeyError(f"Key not found in ColBERTv2 server response: {e}") from e
KeyError: "Key not found in ColBERTv2 server response: 'topk'"
```
...

@fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.

> > @fairydreaming do you have a converted model available or instructions for replicating your setup? I would like to run some benchmarks on these changes.
>
> @wronkiew What...