Results 9 comments of O.T

> If using vLLM for inference (PyTorch model, FP16), I believe...

@wilson1yan Can you share the shell/bash script for setting up the inference server via vLLM for the PyTorch model in FP16?
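
For anyone else following this, here is a minimal sketch of the kind of launch script being asked about, assuming the standard vLLM OpenAI-compatible server entrypoint; the model path and port are placeholders, not the setup @wilson1yan actually used:

```bash
#!/usr/bin/env bash
# Sketch: serve a PyTorch (Hugging Face format) checkpoint with vLLM in FP16.
# MODEL_PATH is hypothetical; point it at the actual checkpoint directory or HF repo id.
MODEL_PATH="/path/to/hf_model"

python -m vllm.entrypoints.openai.api_server \
    --model "$MODEL_PATH" \
    --dtype float16 \
    --tensor-parallel-size 1 \
    --port 8000
```

Once it is up, requests go to the OpenAI-style endpoints (e.g. `/v1/completions`) on the chosen port.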

Also just commenting to prevent closure of the issue since it is one that I am also tracking!

Restarted, still don't see anything ... (on Windows)

I don't think I've ever gone over the 120k token limit, and it also works once I restart my VS Code for some reason...

So does vLLM support it now or not?