Leon Lu
Hi, thanks for the work. Is there a rough schedule for the development of these features?
I hit a similar issue with llama3.1, which reports the error below: TVMError: Function flashinfer.attention_kernel_prefill_with_ragged_kv_cache_begin_forward(0: DLTensor*, 1: DLTensor*, 2: int64_t, 3: int64_t, 4: int64_t, 5: int64_t, 6: void*) -> void expects 7...
@mengshyu, I do not use the APK; I just use the command line on Ubuntu 20.04. Before I upgraded the mlc-llm code base, the same model worked.