
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.

21 GPULlama3.java issues

The main optimization uses all the threads, instead of only the first global thread, to calculate the scaling factor. This avoids thread divergence and the need to...
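A minimal sketch of the idea, assuming a TornadoVM `KernelContext`-style kernel: the method and array names (`reduceMaxAbs`, `values`, `scale`, `groupSize`) are illustrative and not taken from this PR. All work-items cooperate in a tree reduction over local memory to find the maximum absolute value used for the scale, instead of a single thread scanning the block serially.

```java
import uk.ac.manchester.tornado.api.KernelContext;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public class ScaleReductionSketch {

    // Every work-item loads one element and joins a tree reduction in local
    // memory; only the final write is guarded by a thread-id check, so the
    // group no longer serializes on one thread computing the scale.
    public static void reduceMaxAbs(KernelContext context, FloatArray values,
                                    FloatArray scale, int groupSize) {
        int gid = context.globalIdx;
        int lid = context.localIdx;
        float[] local = context.allocateFloatLocalArray(groupSize);

        local[lid] = Math.abs(values.get(gid));
        context.localBarrier();

        // Tree reduction: all active threads follow the same branch pattern.
        for (int stride = groupSize / 2; stride > 0; stride /= 2) {
            if (lid < stride) {
                local[lid] = Math.max(local[lid], local[lid + stride]);
            }
            context.localBarrier();
        }

        if (lid == 0) {
            // One scale per work-group, e.g. maxAbs / 127 for Q8_0-style blocks.
            scale.set(gid / groupSize, local[0] / 127.0f);
        }
    }
}
```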

Correct links should be:

- Qwen3 (1.7B) - FP16: https://huggingface.co/ggml-org/Qwen3-1.7B-GGUF/resolve/main/Qwen3-1.7B-f16.gguf
- Qwen3 (4B) - FP16: https://huggingface.co/ggml-org/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-f16.gguf
- Qwen3 (8B) - FP16: https://huggingface.co/ggml-org/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-f16.gguf

- Introduced `gpullama3-architecture.svg` and `gpullama3-architecture-light.svg` diagrams.
- Improved README with a simplified model collection reference.
- Moved detailed GPU requirements and CLI options to `RUN_DEBUG.md` for clarity.

```bash
./llama-tornado --model gemma-3-1b-it-f16.gguf --prompt "who are you" --max-tokens 30 --top-p 0.9
```

Implement complete Q4_0 quantization support following the same pattern as Q8_0.

Core Q4_0 Infrastructure:
- Add `Q4_0TornadoTensor` for GPU tensor representation with 4-bit quantization
- Implement `Q4_0LayerPlanner` base class for...
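For reference, a standalone sketch of how a Q4_0 block is laid out and dequantized, assuming the standard GGML Q4_0 format (per 32-weight block: a 2-byte FP16 scale followed by 16 bytes of packed 4-bit values). The class and method names are illustrative, not the `Q4_0TornadoTensor` API proposed in this issue.

```java
public final class Q4_0BlockSketch {
    public static final int BLOCK_SIZE = 32;      // weights per block
    public static final int BYTES_PER_BLOCK = 18; // 2-byte FP16 scale + 16 packed bytes

    // Dequantize one Q4_0 block: nibble values 0..15 are recentred to -8..7
    // and multiplied by the per-block FP16 scale.
    public static void dequantize(byte[] raw, int blockOffset, float[] out, int outOffset) {
        short scaleBits = (short) ((raw[blockOffset] & 0xFF) | ((raw[blockOffset + 1] & 0xFF) << 8));
        float d = Float.float16ToFloat(scaleBits); // Java 20+: FP16 -> FP32
        for (int j = 0; j < BLOCK_SIZE / 2; j++) {
            int b = raw[blockOffset + 2 + j] & 0xFF;
            out[outOffset + j]      = ((b & 0x0F) - 8) * d; // low nibble -> first half
            out[outOffset + j + 16] = ((b >>> 4) - 8) * d;  // high nibble -> second half
        }
    }
}
```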

This PR adds a JavaFX GUI for running inference with `GPULlama3` (for issue #24). It introduces a new package, `com.example.gui`, containing the classes for the chatbox GUI,...
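A minimal sketch of the kind of chat window such a GUI might provide, using plain JavaFX controls; the class name and wiring are hypothetical and not taken from the PR's `com.example.gui` package.

```java
import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.TextArea;
import javafx.scene.control.TextField;
import javafx.scene.layout.BorderPane;
import javafx.scene.layout.HBox;
import javafx.stage.Stage;

// Hypothetical chat window: a transcript area, a prompt field, and a send
// button. Handing the prompt to the inference engine is left out.
public class ChatWindowSketch extends Application {

    @Override
    public void start(Stage stage) {
        TextArea transcript = new TextArea();
        transcript.setEditable(false);

        TextField prompt = new TextField();
        prompt.setPromptText("Type a prompt...");

        Button send = new Button("Send");
        send.setOnAction(e -> {
            transcript.appendText("You: " + prompt.getText() + "\n");
            // The real GUI would run inference here and append generated
            // tokens to the transcript as they arrive.
            prompt.clear();
        });

        HBox inputRow = new HBox(8, prompt, send);
        BorderPane root = new BorderPane(transcript, null, null, inputRow, null);

        stage.setScene(new Scene(root, 640, 480));
        stage.setTitle("GPULlama3 chat (sketch)");
        stage.show();
    }

    public static void main(String[] args) {
        launch(args);
    }
}
```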

Tooling

**Describe the bug**
Tokenizer: Phi3Tokenizer
Loading model weights in TornadoVM format (loading F16)
Starting TornadoVM initialization...
TornadoVM GPU execution plan creation: 619.22 ms
Java to GPU JIT compiler warmup: 6147.02...

bug