Mitchell Gordon
The "using" keyword for specifying a specific branch for a summary was never implemented, correct?
Also random, but did you work on Bliss at Uber? I think I might have been an intern on your team lol.
Sure! I'm planning on testing out the indexing as soon as the embedding is finished, so I can run some benchmarks on auto-faiss in parallel. It's actually been so many...
Yeah I'm at Latitude. It's not a priority project, but I've used my last 3 hackathons to work on it lol
Thanks Tim! Looking forward to future releases. Feel free to close or leave open, whichever seems more appropriate.
Hi Younes! That did decrease the latency, but it's still around 6.1s, which is almost double the latency without int8.
Thanks for the update, Tim! I'm now seeing around 3.1s without quantization, 9.3s with `load_in_8bit=True`, and 5.7s with `load_in_8bit=True,int8_threshold=0`. So definitely better, but still room for improvement. (Compare with 12s...
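For anyone trying to reproduce these numbers, a small timing harness like the sketch below (hypothetical helper; the callable passed in would wrap whatever `generate` call and quantization config is being benchmarked) helps keep the comparisons apples-to-apples by discarding warmup runs and averaging:

```python
import time

def time_call(fn, *args, warmup=1, runs=5, **kwargs):
    """Time a callable: discard `warmup` runs, then return mean seconds over `runs`."""
    for _ in range(warmup):
        fn(*args, **kwargs)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / runs
```

One could then compare, say, `time_call(lambda: model.generate(**inputs))` for the fp16, `load_in_8bit=True`, and threshold-0 configurations under identical inputs.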
Another reason to add a release is that GitHub Pages is still pinned to v2.5.1, which was released over a year ago.
+1 to the problem @thomasahle is describing. I am also seeing it on gemini-1.0-pro. And +1 to @meditans, the root of the problem is that special tokens for conversational...
Regardless of whether we do meta prompting or not, we will need to update the [LM interface](https://github.com/stanfordnlp/dspy/blob/0e7eb34e205be18b49c17dcfe5837ca46953417f/dsp/modules/lm.py#L96) and [template class](https://github.com/stanfordnlp/dspy/blob/0e7eb34e205be18b49c17dcfe5837ca46953417f/dsp/primitives/predict.py#L76) to support chat formatting as a special case, since most...