Whisht
The original bucket sort implementation simply divides the buckets with `size=1`, so all the elements are put into the first bucket. That makes it no different from calling the `sorted` function. I added a variable `bucket_size`...
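To illustrate the idea, here is a minimal sketch of a bucket sort with a configurable `bucket_size`. It is not the code from the change discussed above (that code is not shown here); the function name `bucket_sort`, the default value of `bucket_size`, and the choice to index buckets by `(value - minimum) // bucket_size` are all assumptions made for illustration.

```python
from typing import List


def bucket_sort(values: List[float], bucket_size: int = 5) -> List[float]:
    """Sort numbers by spreading them over value-range buckets.

    Hypothetical sketch: `bucket_size` is the width of the value range
    covered by each bucket, not the number of buckets.
    """
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if not values:
        return []

    lo, hi = min(values), max(values)
    # Number of buckets needed to cover the range [lo, hi].
    bucket_count = int((hi - lo) // bucket_size) + 1
    buckets: List[List[float]] = [[] for _ in range(bucket_count)]

    # Each value goes to the bucket whose range contains it.
    for v in values:
        buckets[int((v - lo) // bucket_size)].append(v)

    # Sort each bucket on its own, then concatenate in bucket order.
    result: List[float] = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

With a `bucket_size` larger than the full value range, every element lands in a single bucket and the function degenerates to one `sorted` call, which is the behavior the comment above objects to. A smaller `bucket_size` spreads the elements over several buckets.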
Not yet. I have checked the `non_max_suppression` function and the other functions called by `output_to_target()`. There seems to be no problem with the model part or the use of `inf_out`. But I found...
Hi, @Hellebore. 1. The chat assistant does not always work normally. Sometimes I have to restart PyCharm before the chat assistant starts to respond. And Sourcery is stuck at the scanning...
Here is a screen recording in which I perform only two operations: closing and opening the tabs. You will see that the entries suddenly disappear. ![CleanShot 2024-04-22 at...
Sorry, I am not familiar with Wassermann's way. I thought $dF(x)$ is the differential of $F(x)$, i.e. $dF(x)=f(x)dx$. So the notation you used here makes me think $\mathbb{E}[XY] = \int_{X,Y} xy...
Add the **Negative Entropy** of `sim_i2t_m` to the total loss, multiplied by the coefficient $\alpha$. You will get the KL divergence of $q$ and $p$. Some equations for reference: $$...
I have the same question for Chinese evaluation too (#465). But after scanning the documentation of [`rouge-score`](https://github.com/google-research/google-research/tree/master/rouge), I am afraid that I have to turn to another implementation for...
Thanks for your reply. Are there other ways that I can run Qwen2.5 with trtllm on a V100 GPU?
Hi, the recent path that @mjang2000 updated no longer works. Could you provide a new solution? Thanks.