
Excessive memory consumption in hyperopt scan

Open scarlehoff opened this issue 1 year ago • 3 comments

When running a hyperparameter scan, the memory usage grows proportionally to the number of trials.

This smells like a memory leak, since from one trial to the next virtually all information can safely be dropped. In principle there is a call to `clear_backend` between hyperopt trials, but clearly not all memory is released. This should be fixed.
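For reference, a minimal sketch of what a more aggressive cleanup between trials could look like, assuming the TensorFlow/Keras backend (`cleanup_after_trial` is a hypothetical helper, not the existing `clear_backend`):

```python
# Sketch only: explicitly drop the trial's model and reset Keras global state
# between hyperopt trials, then force garbage collection.
import gc

import tensorflow as tf


def cleanup_after_trial(model=None):
    """Release references from the finished trial before starting the next one."""
    if model is not None:
        del model  # drop the Python reference to the weights/gradients
    tf.keras.backend.clear_session()  # reset Keras' global graph/state
    gc.collect()  # collect objects that only became unreachable now
```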

@Cmurilochem @goord I understand you don't have more time in the project to debug this issue, so I'll do it at some point, but have you by any chance already looked into this (and maybe know where the memory leak is coming from)? It would save me some time.

edit: some more info. I can get away with a parallel run of ~60-100 replicas with ~20 GB of RAM. Running a hyperparameter scan should, in the worst-case scenario, take nfolds × ~20 GB. Even that should be reducible, since there is no reason to keep the NN weights and gradients around after the fold has finished.
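One way to narrow down where the growth comes from could be to snapshot host allocations around each trial, e.g. with `tracemalloc`. This is a diagnostic sketch only: `report_growth` is a hypothetical helper, and it tracks Python-side memory, not GPU memory.

```python
# Diagnostic sketch: compare allocation snapshots between consecutive trials
# to see which allocation sites keep accumulating memory.
import tracemalloc

tracemalloc.start()


def report_growth(previous_snapshot):
    snapshot = tracemalloc.take_snapshot()
    if previous_snapshot is not None:
        # Top allocation sites that grew since the previous trial
        for stat in snapshot.compare_to(previous_snapshot, "lineno")[:10]:
            print(stat)
    return snapshot


# Intended placement (pseudo-code) inside the hyperopt loop:
# snap = None
# for trial in trials:
#     run_trial(trial)
#     snap = report_growth(snap)
```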

scarlehoff — Sep 09 '24 06:09

I remember we discussed memory leaks in connection with TensorFlow versions a while ago. But I guess this is a different issue, and honestly I have never looked into it.

Cmurilochem — Sep 09 '24 11:09

Are you running on GPU or CPU, @scarlehoff?

goord — Sep 12 '24 08:09

GPU.

scarlehoff — Sep 12 '24 08:09