ZeroYuJie

12 comments by ZeroYuJie

> Glad you've found it useful!
>
> In principle `tokenizer_source: union` should be doing what you want here. It is a pretty experimental feature and I wouldn't be surprised...

> I was doing the exact same merge, ended up using stabilityai/japanese-stablelm-base-gamma-7b
>
> I wanted shisa for the strong Japanese language ability, and OpenHermes for the natural language of...

@sugarme I am using this in my multi-goroutine testing. First I use this function to init the model tokenizer, then I keep the initialized tokenizer in a global variable. The code looks like...
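For reference, the init-once-then-share pattern I mean is roughly the sketch below. The `Tokenizer` type and `loadTokenizer` body are stand-ins (the real code loads a tokenizer file via the sugarme/tokenizer package); the point is that `sync.Once` makes the global initialization safe even when many goroutines race to use it.

```go
package main

import (
	"fmt"
	"sync"
)

// Tokenizer is a stand-in for the real tokenizer type; assume Encode
// is safe for concurrent reads once the model has been loaded.
type Tokenizer struct{ vocab map[string]int }

func (t *Tokenizer) Encode(s string) int {
	id, ok := t.vocab[s]
	if !ok {
		return -1 // hypothetical unknown-token id
	}
	return id
}

var (
	tok     *Tokenizer
	tokOnce sync.Once
)

// getTokenizer initializes the global tokenizer exactly once, even
// when called from many goroutines at the same time.
func getTokenizer() *Tokenizer {
	tokOnce.Do(func() {
		// In the real code this would load the tokenizer file from disk.
		tok = &Tokenizer{vocab: map[string]int{"hello": 1, "world": 2}}
	})
	return tok
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getTokenizer().Encode("hello")
		}()
	}
	wg.Wait()
	fmt.Println(getTokenizer().Encode("hello")) // prints 1
}
```

If the underlying tokenizer is not documented as safe for concurrent use, a `sync.Pool` of tokenizers (one per goroutine) is the safer variant of the same idea.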

I've been using the Orion branch from https://github.com/dachengai/vllm and it runs, but there might be issues with the outputs in different languages.

I encountered a similar issue while using the `NousResearch/Redmond-Puffin-13B` model on version v1.0.1. During testing with actual concurrent generations, GPU memory usage gradually increases until it reaches a point where...
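To confirm the growth I polled GPU memory between batches of concurrent requests. A rough sketch of the polling side (the `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` invocation is the standard query interface, which prints one MiB integer per GPU per line; everything else here is an assumption about your setup):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemUsed parses the output of
//   nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits
// which is one integer (MiB of used memory) per GPU, one per line.
func parseMemUsed(out string) ([]int, error) {
	var used []int
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		mib, err := strconv.Atoi(strings.TrimSpace(line))
		if err != nil {
			return nil, fmt.Errorf("unexpected nvidia-smi line %q: %w", line, err)
		}
		used = append(used, mib)
	}
	return used, nil
}

func main() {
	// In a real check this string comes from exec.Command("nvidia-smi", ...).
	sample := "10240\n10376\n"
	mem, err := parseMemUsed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(mem) // prints [10240 10376]
}
```

Logging these numbers after each batch makes the leak-vs-plateau question easy to answer: a fragmentation plateau flattens out, a leak keeps climbing.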

@Narsil CUDA 12.2 on CentOS 7, running in Docker.

I have also encountered this problem. I used [efficiency-nodes-comfyui](https://github.com/jags111/efficiency-nodes-comfyui) to load the model; I don't know whether the two are related.

```
[2024-11-04T07:22:51Z] =================================== FAILURES ===================================
[2024-11-04T07:22:51Z] __________________________ test_gpu_memory_profiling ___________________________
[2024-11-04T07:22:51Z]
[2024-11-04T07:22:51Z] def test_gpu_memory_profiling():
[2024-11-04T07:22:51Z]     # Tests the gpu profiling that happens in order to determine...
```

I also believe monitoring should be added at the target level, for example the number of requests sent to each target. I don't quite understand why, in...