SwiftInfer
Efficient AI Inference & Serving
SwiftInfer issues (3 results)
Hello, I would like to ask: Qwen1.5 originally supports a 32k context length. If I fine-tune it, will the supported input length degrade? Is it okay to use yours? Does it...
PagedAttention is a widely used method for LLM serving. It splits the KV cache of a request into multiple blocks, and each block contains multiple slots (tokens). I think that...
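The block/slot layout described above can be sketched as a per-request block table that maps logical token positions to physical cache locations. This is a minimal illustrative sketch, not SwiftInfer's actual implementation; the names (`BlockTable`, `append_token`, `locate`) and the block size are assumptions:

```python
BLOCK_SIZE = 4  # slots (tokens) per block; real systems often use 16 or more

class BlockTable:
    """Per-request mapping from logical token positions to (block, slot)."""

    def __init__(self):
        self.blocks = []      # physical block ids allocated to this request
        self.num_tokens = 0
        self._next_block = 0  # stand-in for a global block allocator

    def append_token(self):
        """Reserve a cache slot for one new token, allocating a block if needed."""
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self._next_block)
            self._next_block += 1
        self.num_tokens += 1

    def locate(self, pos):
        """Map a logical token position to its (block_id, slot)."""
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

table = BlockTable()
for _ in range(6):        # cache 6 tokens -> two blocks of size 4
    table.append_token()
print(table.locate(5))    # token 5 sits in the second block, slot 1
```

Because blocks are fixed-size and allocated on demand, the KV cache grows without reserving a contiguous region up front, which is the main memory-efficiency win of this scheme.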