2 issues by DY Yao

Thanks for your great work! Could you kindly advise how to support models in the LLaMA series?

Hello, have you tried longer inputs, e.g., 128k? When I try a 128k context, it stays stuck in the k-means step of the prefill stage. Could this be exceeding the CPU's capacity? I would greatly appreciate any solution you could provide!
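For future readers hitting the same stall: one common way to bound the CPU cost of clustering very long prefill sequences is mini-batch k-means, which updates centroids from small random batches instead of all 128k vectors at once. The sketch below is a minimal NumPy illustration only; the repository's actual k-means implementation, the `keys` tensor shape, and the cluster count are assumptions, not the project's code.

```python
# Hypothetical sketch, NOT the repo's actual prefill code: mini-batch
# k-means over ~128k "key" vectors. Each step touches only a small
# batch, so per-iteration CPU cost stays bounded regardless of context
# length, which may avoid the full-batch stall described above.
import numpy as np

def minibatch_kmeans(x, k, batch=4096, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centroids from random data points
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    counts = np.zeros(k)  # per-centroid sample counts for the learning rate
    for _ in range(iters):
        xb = x[rng.choice(len(x), size=batch, replace=False)]
        # assign each batch vector to its nearest centroid
        d = ((xb[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                counts[j] += mask.sum()
                lr = mask.sum() / counts[j]  # shrinking per-centroid step size
                centroids[j] += lr * (xb[mask].mean(0) - centroids[j])
    return centroids

# 128k random vectors standing in for attention keys (assumed shape)
keys = np.random.default_rng(1).standard_normal((131072, 64)).astype(np.float32)
cents = minibatch_kmeans(keys, k=32)
print(cents.shape)  # (32, 64)
```

Whether this matches the project's clustering step is for the maintainers to confirm; it is offered only as a direction to investigate.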