Hello author, could you provide a binary for the ARM architecture?
Why does the prompt token count reported in the output differ from the actual number of tokens produced by the tokenizer? I used the LLaMA 2 tokenizer, and the prompt...
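A common source of such mismatches is special tokens: LLaMA-style tokenizers prepend a BOS token by default, and chat templates add further control tokens, so a runtime's reported prompt length can legitimately differ from a plain encode of the raw text. A minimal sketch with the Hugging Face tokenizer; the model id and prompt below are placeholders:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; any LLaMA 2 tokenizer behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "What is the capital of France?"

# encode() prepends the BOS token <s> by default, so the count is higher
# than the raw subword count of the prompt text.
with_special = tokenizer.encode(prompt)
without_special = tokenizer.encode(prompt, add_special_tokens=False)

print(len(with_special), len(without_special))  # the two counts differ by the BOS token
```

If the runtime also applies a chat template before tokenizing, comparing against `tokenizer.apply_chat_template(messages, tokenize=True)` rather than the raw prompt usually reconciles the remaining difference.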
I tried to compare a specific model (such as Llama 3B) between the Web-LLM and local (MLC-LLM) environments, and found that under the same parameters, i.e., without making any changes, the...
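When comparing outputs across the two runtimes, it helps to first rule out sampling noise: with a nonzero temperature, identical configurations can still produce different text. A sketch of a deterministic local baseline, assuming the MLCEngine class from recent mlc_llm releases; the model id and message are placeholders:

```python
from mlc_llm import MLCEngine

# Placeholder model; substitute the same weights you load in Web-LLM.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

response = engine.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Explain KV caching in one sentence."}],
    temperature=0.0,  # greedy decoding removes sampling randomness from the comparison
    stream=False,
)
print(response.choices[0].message.content)
engine.terminate()
```

Even at temperature 0, small numerical differences between the WebGPU kernels and the native ones (e.g., different accumulation precision) can break logit ties differently, so some divergence after many tokens is still expected.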
When I use the web-llm example (path: /web-llm/examples/simple-chat) and read the source file (@mlc-ai/web-llm/lib/index.js), I notice that there is a lot of interaction with wasm files, which makes reading the...
When using the perf tool to analyze JavaScript application performance, the source functions in the generated flame graph may have the following prefix identifiers: JS:*, JS:+, and JS:^. What do...
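These prefixes come from V8's perf map output, where the punctuation marker reflects the compilation tier of the function; in recent V8 sources the convention is roughly `~` for interpreted (Ignition), `^` for baseline (Sparkplug), `+` for Maglev, and `*` for TurboFan-optimized code, but the exact mapping depends on your V8 version, so check its `CodeKindToMarker`. Regardless of the semantics, it is easy to measure how many samples land in each tier first. A small hypothetical filter over `perf script` output:

```python
import re
import sys
from collections import Counter

# Count flame-graph frames by their V8 tier marker (JS:*, JS:+, JS:^, JS:~).
# Pipe in the output of `perf script`, e.g.:
#   perf script | python3 count_js_tiers.py
MARKER = re.compile(r"\bJS:([*+^~])")

counts = Counter()
for line in sys.stdin:
    for marker in MARKER.findall(line):
        counts[marker] += 1

for marker, n in counts.most_common():
    print(f"JS:{marker}  {n} samples")
```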