SwiftInfer

Efficient AI Inference & Serving

3 SwiftInfer issues

Hello, I'd like to ask: qwen1.5-32k natively supports a 32k context length. If I train it, will its ability to handle long inputs degrade? Can I use your approach with it? Does it...

PagedAttention is a widely used technique for LLM serving. It splits a request's KV cache into multiple blocks, and each block contains multiple slots (one per token). I think that...
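The block/slot layout described in this issue can be illustrated with a minimal sketch of the bookkeeping side only: a pool of fixed-size physical blocks, plus a per-request block table that maps each logical token position to a (physical block, slot offset) pair. The class and method names (`PagedKVCache`, `append_token`, `lookup`, `free`) and the `block_size` parameter are hypothetical and chosen for illustration. This is not SwiftInfer's or vLLM's code, and it stores no actual key/value tensors and implements no attention kernel.

```python
class PagedKVCache:
    """Hypothetical toy model of paged KV-cache bookkeeping.

    Physical memory is a pool of `num_blocks` blocks, each holding
    `block_size` token slots. Each request owns a block table: the list of
    physical block ids backing its logical token positions, in order.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # all blocks start free
        self.block_tables = {}  # request id -> list of physical block ids
        self.num_tokens = {}    # request id -> number of tokens stored

    def append_token(self, req_id):
        """Reserve a slot for one new token; return (physical_block, slot)."""
        table = self.block_tables.setdefault(req_id, [])
        n = self.num_tokens.get(req_id, 0)
        if n % self.block_size == 0:
            # Last block is full (or none allocated yet): take a fresh block.
            if not self.free_blocks:
                raise MemoryError("KV cache out of free blocks")
            table.append(self.free_blocks.pop(0))
        self.num_tokens[req_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def lookup(self, req_id, pos):
        """Map a logical token position to (physical_block, slot)."""
        table = self.block_tables[req_id]
        return table[pos // self.block_size], pos % self.block_size

    def free(self, req_id):
        """Return all of a finished request's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(req_id, []))
        self.num_tokens.pop(req_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    print([cache.append_token("a") for _ in range(3)])  # [(0, 0), (0, 1), (1, 0)]
    print(cache.append_token("b"))                      # (2, 0)
    cache.free("a")
    print(cache.free_blocks)                            # [3, 0, 1]
```

Because a request only ever holds whole blocks, at most `block_size - 1` slots per request sit unused, and blocks freed by one request can be reused by any other regardless of where they lie in the pool.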