ZeroYuJie

12 comments by ZeroYuJie

> Glad you've found it useful!
>
> In principle `tokenizer_source: union` should be doing what you want here. It is a pretty experimental feature and I wouldn't be surprised...

> I was doing the exact same merge, ended up using stabilityai/japanese-stablelm-base-gamma-7b
>
> I wanted shisa for the strong Japanese language ability, and OpenHermes for the natural language of...

@sugarme I am using this in my multi-goroutine testing. First I use this function to init the model tokenizer, then I keep the initialized tokenizer in a global variable. The code looks like...
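For reference, the init-once-then-share pattern I mean is roughly the sketch below. The `Tokenizer` type and `loadTokenizer` body are stand-ins (the real code loads a tokenizer file via the sugarme/tokenizer package); the point is that `sync.Once` makes the global initialization safe even when many goroutines race to use it.

```go
package main

import (
	"fmt"
	"sync"
)

// Tokenizer is a stand-in for the real tokenizer type; assume Encode
// is safe for concurrent reads once the model has been loaded.
type Tokenizer struct{ vocab map[string]int }

func (t *Tokenizer) Encode(s string) int {
	id, ok := t.vocab[s]
	if !ok {
		return -1 // hypothetical unknown-token id
	}
	return id
}

var (
	tok     *Tokenizer
	tokOnce sync.Once
)

// getTokenizer initializes the global tokenizer exactly once, even
// when called from many goroutines at the same time.
func getTokenizer() *Tokenizer {
	tokOnce.Do(func() {
		// In the real code this would load the tokenizer file from disk.
		tok = &Tokenizer{vocab: map[string]int{"hello": 1, "world": 2}}
	})
	return tok
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getTokenizer().Encode("hello")
		}()
	}
	wg.Wait()
	fmt.Println(getTokenizer().Encode("hello")) // prints 1
}
```

If the underlying tokenizer is not documented as safe for concurrent use, a `sync.Pool` of tokenizers (one per goroutine) is the safer variant of the same idea.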

I've been using the Orion branch from https://github.com/dachengai/vllm and it runs, but there might be issues with the outputs in different languages.

I encountered a similar issue while using the `NousResearch/Redmond-Puffin-13B` model on version v1.0.1. During testing with actual concurrent generations, GPU memory usage gradually increases until it reaches a point where...
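To confirm the growth I polled GPU memory between batches of concurrent requests. A rough sketch of the polling side (the `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` invocation is the standard query interface, which prints one MiB integer per GPU per line; everything else here is an assumption about your setup):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemUsed parses the output of
//   nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits
// which is one integer (MiB of used memory) per GPU, one per line.
func parseMemUsed(out string) ([]int, error) {
	var used []int
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		mib, err := strconv.Atoi(strings.TrimSpace(line))
		if err != nil {
			return nil, fmt.Errorf("unexpected nvidia-smi line %q: %w", line, err)
		}
		used = append(used, mib)
	}
	return used, nil
}

func main() {
	// In a real check this string comes from exec.Command("nvidia-smi", ...).
	sample := "10240\n10376\n"
	mem, err := parseMemUsed(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(mem) // prints [10240 10376]
}
```

Logging these numbers after each batch makes the leak-vs-plateau question easy to answer: a fragmentation plateau flattens out, a leak keeps climbing.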

@Narsil CUDA 12.2 on CentOS 7, running in Docker.

I have also encountered this problem. I used [efficiency-nodes-comfyui](https://github.com/jags111/efficiency-nodes-comfyui) to load the model; I don't know whether the two are related.

```
[2024-11-04T07:22:51Z] =================================== FAILURES ===================================
[2024-11-04T07:22:51Z] __________________________ test_gpu_memory_profiling ___________________________
[2024-11-04T07:22:51Z]
[2024-11-04T07:22:51Z] def test_gpu_memory_profiling():
[2024-11-04T07:22:51Z]     # Tests the gpu profiling that happens in order to determine...
```

I also believe monitoring should be added at the target level, for example the number of requests sent to each target. I don't quite understand why, in...