sorasoras
I saw that there is a modified SS-android fork that can be built with the NDK for armv8: https://github.com/wongsyrone/shadowsocks-android/releases I was wondering whether SSR-android could also support ARMv8, like the fork above?
@wangyu- It seems the new UDP2RAW fix GRO option conflicts with UDPSpeeder; the two cannot be used at the same time.
### Feature request Recently, https://github.com/ggerganov/llama.cpp has added support for both QWEN and Baichuan2. QWEN support was added at 1610. https://github.com/ggerganov/llama.cpp/pull/4281 I have looked at the Nomic Vulkan Fork of LLaMa.cpp,...
Are there any plans to support the Qwen models in the future? https://huggingface.co/Qwen It would be great to be able to merge multilingual models like Qwen that come in sizes from...
The latest llama.cpp produces incoherent output compared to Transformers. Transformers and vLLM work fine, but the llama.cpp GGUF does not.
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that...
# Feature Description with KV cache quantized in 2 bits. This brings 2.6× less peak memory on the Llama/Mistral/Falcon models we evaluated while enabling 4× larger batch size, resulting in 2.35×...
> The experimental results under different lengths demonstrate that BurstAttention offers significant advantages for processing long sequences compared with these competitive baselines, especially tensor parallelism (Megatron-V3) with FlashAttention, reducing 40%...
https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html That looks awesome!
ThunderKittens is an embedded domain-specific language (DSL) within CUDA designed to simplify the development of high-performance AI kernels on GPUs. It provides abstractions for working with small tiles (e.g., 16x16)...