CUDA memory keeps growing until it overflows with an out-of-memory error
Hello, I am using the officially provided method of loading the reranker to perform similarity calculations. During the calculation, I found that after GPU memory usage stabilizes for a while, it gradually increases until there is no memory left. How can I solve this? I tried torch.cuda.empty_cache(), but it did not help much, and I still ran out of GPU memory.
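When GPU memory creeps up during pure inference, two common culprits are autograd graphs being retained and Python references keeping result tensors alive on the GPU. A minimal sketch (assuming PyTorch; the tiny nn.Linear is a stand-in for a reranker, shown on CPU for simplicity) of the two habits that usually stop the growth:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for a reranker model
model.eval()

all_scores = []
with torch.no_grad():                 # (1) no autograd graph is built or kept
    for _ in range(4):                # pretend these are incoming batches
        batch = torch.randn(2, 8)
        out = model(batch)
        # (2) keep plain Python floats, not live tensor references
        all_scores.extend(out.squeeze(-1).tolist())

print(len(all_scores))  # → 8
```

If memory still grows with these habits in place, reducing the batch size (below) is the next lever.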
@sevenandseven , which reranker do you use?
bge-reranker-large, bge-reranker-base, bge-reranker-v2-m3, bge-reranker-v2-gemma, and bge-reranker-v2-minicpm-layerwise. All of the above models exhibit this behavior.
You can reduce the batch size and max_length to reduce memory cost.
I encountered this situation during inference, without setting the above parameters.
@sevenandseven , you can pass batch_size and max_length to the compute_score(batch_size=?, max_length=?) function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194
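To see why a smaller batch size caps peak memory, here is a minimal pure-Python sketch (score_batch is a hypothetical stand-in for the model's forward pass, not FlagEmbedding's API): only one batch of pairs is materialized at a time, so peak usage is bounded by the batch size rather than by the total number of pairs.

```python
def score_batch(pairs):
    # placeholder for the real model call: one float per (query, passage) pair
    return [float(len(q) + len(p)) for q, p in pairs]

def compute_scores(pairs, batch_size=32):
    scores = []
    for i in range(0, len(pairs), batch_size):
        batch = pairs[i:i + batch_size]   # at most batch_size pairs live at once
        scores.extend(score_batch(batch))
    return scores

pairs = [("q", "doc" * n) for n in range(5)]
print(compute_scores(pairs, batch_size=2))  # → [1.0, 4.0, 7.0, 10.0, 13.0]
```

Lowering max_length works the same way: shorter sequences mean smaller activation tensors per batch, at the cost of truncating long passages.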
OK, thanks.
Hi, I converted the reranker to ONNX and ran stress tests after deployment, and I found the same problem: GPU memory is not released after the calls finish. Do you have a good way to solve this?
Solved.
Hello, how did you solve it?
Set a memory cleanup policy as described in https://github.com/microsoft/onnxruntime/issues/19445