sorasoras

Results 23 issues of sorasoras

I saw that there is a modified SS-android here that can be built with the NDK for ARMv8: https://github.com/wongsyrone/shadowsocks-android/releases I'm wondering whether SSR-android could also support ARMv8 in the same way?

@wangyu- It seems the new fix GRO option in UDP2RAW conflicts with UDPSpeeder, so the two cannot be used at the same time.

### Feature request Recently, https://github.com/ggerganov/llama.cpp added support for both QWEN and Baichuan2. QWEN was added at 1610. https://github.com/ggerganov/llama.cpp/pull/4281 I have looked at the Nomic Vulkan fork of llama.cpp,...

backend
models

Is there any plan to support Qwen models in the future? https://huggingface.co/Qwen It would be great to be able to merge multilingual models like Qwen that come in sizes from...

The latest llama.cpp produces incoherent output compared to Transformers. transformers/vllm work fine, but the llama.cpp GGUF does not.

bug-unconfirmed

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that...

bug-unconfirmed

# Feature Description with KV cache quantized in 2 bits. This brings 2.6× less peak memory on the Llama/Mistral/Falcon models we evaluated while enabling 4× larger batch size, resulting in 2.35×...

enhancement
stale

> The experimental results under different lengths demonstrate that BurstAttention offers significant advantages for processing long sequences compared with these competitive baselines, especially tensor parallelism (Megatron-V3) with FlashAttention, reducing 40%...

feature request

https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html That looks awesome!

ThunderKittens is an embedded domain-specific language (DSL) within CUDA designed to simplify the development of high-performance AI kernels on GPUs. It provides abstractions for working with small tiles (e.g., 16x16)...