Problem with preprocessm6A

Open MarioRinBarr opened this issue 1 year ago • 1 comments

How are you doing? I am interested in detecting m6A in viral RNA and am testing your program for this. However, I am having problems with preprocessm6A. My input is 32 GB, and when the script starts running it creates a folder of temporary files. These files, including the main one, grow to more than 1 TB, and eventually the computer runs out of memory and restarts. I have tried splitting the data, starting with 6 GB and then joining the results with combine_binary_file.py, but the outcome is the same: more than 1 TB of data and a computer crash. I don't know whether this is because I am using f5c instead of nanopolish. I used the new chips, which are not compatible with nanopolish, although the result should be the same, since f5c is derived from it.

Thank you very much

MarioRinBarr, May 17 '24 09:05

Hello,

I have been testing preprocessm6A.py further. When I take a 6 GB subset of my pod5 files, preprocessm6A is able to process the data, and I can then run predict_model1 and 2 on the result. The problem is that the file it generates is 2.4 TB, which makes it impossible to analyze all the data from a single individual, not to mention several of them. Again, I don't know whether I'm doing something wrong that causes it to generate such a huge file.

Thank you

MarioRinBarr, May 22 '24 13:05

Hi,

We apologise for the issue. CHEUI currently generates a large volume of intermediate files. To limit disk usage, you can process the POD5 files one at a time, running each through to the model 1 prediction, e.g.:

POD5 1 -> eventalign -> preprocess_m6a -> CHEUI model 1

You can then delete the eventalign and preprocess files for that POD5 file and keep only the model 1 output.
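
The per-file workflow above can be sketched as a small Python driver. This is only an illustration under stated assumptions: the three step functions are hypothetical callables that you would supply yourself (for example, wrappers that call f5c eventalign, the CHEUI preprocessing script, and the model 1 prediction script with whatever arguments your setup needs), and the file names `eventalign.txt`, `preprocess_out`, and `*.model1.txt` are placeholders rather than names CHEUI requires.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical step signature: (input_path, output_path) -> None.
# Each step is supplied by the caller; nothing here calls CHEUI or f5c directly.
Step = Callable[[Path, Path], None]


def process_batches(
    batches: Iterable[Path],
    eventalign: Step,
    preprocess: Step,
    predict_model1: Step,
    out_dir: Path,
) -> List[Path]:
    """Run every batch through the three steps, keeping only model 1 output."""
    out_dir.mkdir(parents=True, exist_ok=True)
    kept: List[Path] = []
    for batch in batches:
        # Intermediates go into a scratch directory that is always removed,
        # so at most one batch's temporary files exist on disk at a time.
        scratch = Path(tempfile.mkdtemp(prefix=f"{batch.stem}_"))
        try:
            events = scratch / "eventalign.txt"    # placeholder name
            signals = scratch / "preprocess_out"   # placeholder name
            model1 = out_dir / f"{batch.stem}.model1.txt"
            eventalign(batch, events)
            preprocess(events, signals)
            predict_model1(signals, model1)
            kept.append(model1)
        finally:
            shutil.rmtree(scratch, ignore_errors=True)
    return kept
```

Because the scratch directory is removed in a `finally` block, the intermediates are cleaned up even if one of the steps fails partway through a batch.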

Before predicting model 2, you will need to merge and sort all the model 1 files.
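
A minimal sketch of the merge-and-sort step, assuming each model 1 file is tab-separated text without a header, that each one fits in memory on its own, and that sorting on the first column is what model 2 needs (please check the CHEUI README for the exact sort it expects). The function names are hypothetical, and the ordering is Python's code-point ordering, which can differ from the locale-dependent ordering of GNU `sort`.

```python
import heapq
from contextlib import ExitStack
from pathlib import Path
from typing import List


def first_column(line: str) -> str:
    """Sort key: the first tab-separated field (assumed to identify the site)."""
    return line.rstrip("\n").split("\t", 1)[0]


def sort_file_in_place(path: Path) -> None:
    """Sort one per-batch file by its first column, rewriting it in place."""
    with path.open() as fh:
        lines = [line.rstrip("\n") for line in fh if line.strip()]
    lines.sort(key=first_column)
    with path.open("w") as fh:
        fh.writelines(line + "\n" for line in lines)


def merge_model1_files(paths: List[Path], merged: Path) -> None:
    """Sort each file, then stream a k-way merge into one sorted file."""
    for path in paths:
        sort_file_in_place(path)
    with ExitStack() as stack:
        streams = [stack.enter_context(p.open()) for p in paths]
        with merged.open("w") as out:
            # heapq.merge reads the inputs lazily, so the combined file
            # never has to be held in memory all at once.
            out.writelines(heapq.merge(*streams, key=first_column))
```

Only the individual per-batch files are loaded into memory; the merged output is written as a stream.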

Hope this helps.

pre-mRNA, Jul 11 '24 02:07