Problem with preprocessm6A
How are you doing? I am interested in detecting m6A in viral RNA and have been testing your program, but I am running into problems with preprocessm6A. My input is 32 GB, and when the script starts running it generates a folder of temporary files. The problem is that these files, including the main one, grow to more than 1 TB, and eventually the computer runs out of memory and restarts. I have tried splitting the data, starting with 6 GB and then joining the results with combine_binary_file.py, but the outcome is the same: over 1 TB of data and a crash. I don't know whether this could be because I am using f5c instead of nanopolish (I used the new chips, which are not compatible with nanopolish), although the results should be equivalent, since f5c is based on nanopolish.
Thank you very much
Hello,
I have been testing preprocessm6A.py further. When I take a 6 GB subset of my pod5 files, preprocessm6A is able to process the data, and I can then run predict_model1 and predict_model2 on the output. The problem is that the file it generates is 2.4 TB, which makes it impossible to analyse all the data from a single individual, let alone several individuals. Again, I don't know whether I am doing something wrong that makes it generate such a huge file.
Thank you
Hi,
We apologise for the issue. CHEUI currently generates a large number of intermediate files. As a workaround, you can process your POD5 files one at a time up to the model 1 prediction, e.g.:
POD5 1 -> eventalign -> preprocess_m6a -> CHEUI model 1
You can then delete the eventalign and preprocess files for each POD5 file, keeping only the model 1 output.
Before predicting model 2, you will need to merge and sort all the model 1 files.
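The merge-and-sort step could look something like the following Python sketch. The model 1 output format assumed here (tab-separated lines keyed by a site identifier in the first column) and the function name are illustrative, not CHEUI's actual API, so adapt the key function to the real output columns:

```python
import heapq

def merge_sorted_model1(paths, out_path, key=lambda line: line.split("\t")[0]):
    # Sort each per-POD5 model 1 file individually, then stream a
    # heap-merge of the sorted lists, so the combined output comes out
    # sorted without concatenating and re-sorting everything at once.
    # Assumption: lines are tab-separated and keyed by the first field.
    sorted_lists = []
    for path in paths:
        with open(path) as fh:
            sorted_lists.append(sorted(fh, key=key))
    with open(out_path, "w") as out:
        for line in heapq.merge(*sorted_lists, key=key):
            out.write(line)
```

On the command line, the same streaming merge can be done with GNU `sort -m` over files that are already individually sorted.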
Hope this helps.