FlagEmbedding

CUDA memory keeps growing until an out-of-memory error

Open sevenandseven opened this issue 1 year ago • 10 comments

Hello, I am using the officially provided method of loading the reranker to compute similarity scores. During computation, GPU memory usage stays stable for a while, then gradually increases until there is not enough video memory left. How can I solve this problem? I tried calling torch.cuda.empty_cache(), but it did not help much, and I still ran out of video memory.

sevenandseven avatar May 15 '24 07:05 sevenandseven

@sevenandseven , which reranker do you use?

staoxiao avatar May 15 '24 17:05 staoxiao

> @sevenandseven , which reranker do you use?

bge-reranker-large, bge-reranker-base, bge-reranker-v2-m3, bge-reranker-v2-gemma, bge-reranker-v2-minicpm-layerwise. All of the above models exhibit this behavior.

sevenandseven avatar May 16 '24 01:05 sevenandseven

You can reduce the batch size and max_length to reduce memory cost.

staoxiao avatar May 16 '24 10:05 staoxiao

> You can reduce the batch size and max_length to reduce memory cost.

I ran into this during inference, without setting either of those parameters.

sevenandseven avatar May 16 '24 10:05 sevenandseven

@sevenandseven , you can pass batch_size and max_length to the compute_score(batch_size=?, max_length=?) function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194

staoxiao avatar May 16 '24 10:05 staoxiao
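
As a minimal sketch of the suggestion above, the helper below forwards explicit `batch_size` and `max_length` values to `compute_score`. The helper name `score_pairs`, the default values 16 and 256, and the model id in the comment are assumptions chosen for illustration; they do not come from the thread and would need tuning for a given GPU and model.

```python
from typing import Any, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, passage)


def score_pairs(
    reranker: Any,
    pairs: Sequence[Pair],
    batch_size: int = 16,   # assumed value; smaller batches lower peak memory
    max_length: int = 256,  # assumed value; shorter inputs lower peak memory
) -> List[float]:
    """Score (query, passage) pairs while capping batch size and input length.

    `reranker` is any object exposing compute_score(pairs, batch_size=..., max_length=...),
    as the thread describes for the FlagEmbedding rerankers.
    """
    scores = reranker.compute_score(
        list(pairs), batch_size=batch_size, max_length=max_length
    )
    # A single pair may come back as a bare number; normalise to a list.
    if isinstance(scores, (int, float)):
        return [float(scores)]
    return [float(s) for s in scores]


# Hypothetical usage, assuming FlagEmbedding is installed and the model id is valid:
#   from FlagEmbedding import FlagReranker
#   reranker = FlagReranker("BAAI/bge-reranker-base", use_fp16=True)
#   print(score_pairs(reranker, [("what is a panda?", "The giant panda is a bear.")]))
```

Keeping both limits as explicit arguments makes it easy to lower them step by step while watching GPU memory, instead of relying on the library defaults.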

> @sevenandseven , you can pass batch_size and max_length to the compute_score(batch_size=?, max_length=?) function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194

OK, thanks.

sevenandseven avatar May 16 '24 10:05 sevenandseven

> @sevenandseven , you can pass batch_size and max_length to the compute_score(batch_size=?, max_length=?) function: https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/flag_reranker.py#L194

Hello, I converted the reranker to ONNX and stress-tested the deployment, and I ran into the same problem: GPU memory is not released after a call finishes. Do you have a good way to solve this?

EvanSong77 avatar Sep 19 '24 08:09 EvanSong77

> Hello, I converted the reranker to ONNX and stress-tested the deployment, and I ran into the same problem: GPU memory is not released after a call finishes. Do you have a good way to solve this?

Resolved.

EvanSong77 avatar Sep 19 '24 11:09 EvanSong77

> Resolved.

Hello, may I ask how you solved it?

jazzisfuture avatar Sep 24 '24 02:09 jazzisfuture

> Hello, may I ask how you solved it?

See https://github.com/microsoft/onnxruntime/issues/19445 and set a memory cleanup policy.

EvanSong77 avatar Sep 25 '24 06:09 EvanSong77
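
The reply above does not say which setting was used, so the following is only a sketch of one plausible reading, assuming the ONNX Runtime CUDA execution provider. It builds a provider configuration that stops the memory arena from over-allocating (`arena_extend_strategy`), caps the arena size (`gpu_mem_limit`), and prepares the run-level entry that asks the arena to shrink after each run. The helper names, the 4 GiB limit, and the file name `reranker.onnx` are hypothetical.

```python
from typing import Any, Dict, List


def cuda_provider_config(
    device_id: int = 0,
    gpu_mem_limit_bytes: int = 4 * 1024**3,  # assumed 4 GiB cap; adjust per GPU
) -> List[Any]:
    """Provider list for ort.InferenceSession with conservative arena settings."""
    cuda_options = {
        "device_id": device_id,
        # Grow the arena only by what is requested instead of powers of two.
        "arena_extend_strategy": "kSameAsRequested",
        # Upper bound on the CUDA memory arena, in bytes.
        "gpu_mem_limit": gpu_mem_limit_bytes,
    }
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def shrinkage_run_config(device_id: int = 0) -> Dict[str, str]:
    """Run-config entry asking ONNX Runtime to shrink the GPU arena after a run."""
    return {"memory.enable_memory_arena_shrinkage": f"gpu:{device_id}"}


# Hypothetical usage, assuming onnxruntime-gpu is installed and the model file exists:
#   import onnxruntime as ort
#   session = ort.InferenceSession("reranker.onnx", providers=cuda_provider_config())
#   run_options = ort.RunOptions()
#   for key, value in shrinkage_run_config().items():
#       run_options.add_run_config_entry(key, value)
#   outputs = session.run(None, inputs, run_options=run_options)
```

Whether this matches the fix discussed in the linked issue should be checked against that issue directly.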