
Getting `died with <Signals.SIGKILL: 9>` when trying to run Lighter with the human genome size (3.2 Gb)

(Open) PeterSu92 opened this issue 2 years ago • 10 comments

I'm trying to run Lighter on my computer with a large FASTQ input file, and I'm running it as a subprocess from Python:

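```python
import subprocess

# Paths as they appear in the error message below
lighter_executable_path = '../../Lighter/lighter'
input_reads_path = '100000_NG1D7PJA9F_1.fq'

# Set the desired parameters
kmer_size = 31
genome_size = 2000000000  # 2 Gb
error_rate = 0.1  # Lighter reads this third -k value as alpha, the k-mer sampling rate
num_threads = 10

# Construct the Lighter command: -k takes k-mer length, genome size, alpha
lighter_command = [
    lighter_executable_path,
    '-r', input_reads_path,
    '-k', str(kmer_size), str(genome_size), str(error_rate),
    # Additional arguments
    '-t', str(num_threads),
]

# check=True raises CalledProcessError if Lighter exits abnormally
subprocess.run(lighter_command, check=True)
```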

However, if I set the genome size any larger than this, Lighter fails with the following error:

```
line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['../../Lighter/lighter', '-r', '100000_NG1D7PJA9F_1.fq', '-k', '31', '3200000000', '0.1', '-t', '10']' died with <Signals.SIGKILL: 9>.
```

The README says to put in at least the size of the genome of the organism in question, which in this case is the human genome. Am I doing something wrong that's a simple fix? Thank you!

PeterSu92, Aug 15 '23 04:08

I think signal 9 means the process was killed by the system. Did you run Lighter on a server? How much memory did you request?

mourisl, Aug 15 '23 05:08
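A minimal sketch of how to confirm this from the Python side: on POSIX systems (including WSL), `subprocess` reports a child terminated by signal N with a `returncode` of `-N`, so SIGKILL shows up as `-9`. The helper names `describe_exit` and `run_checked` are hypothetical, and the memory hint is a heuristic: an unexpected SIGKILL from the system on Linux is often, though not always, the out-of-memory killer.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description (POSIX)."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    # A negative return code -N means the child was terminated by signal N.
    try:
        return f"killed by {signal.Signals(-returncode).name}"
    except ValueError:
        return f"killed by signal {-returncode}"


def run_checked(cmd: list[str]) -> None:
    """Run cmd and explain how it ended if it did not exit cleanly."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        print(f"{cmd[0]} {describe_exit(err.returncode)}")
        if err.returncode == -signal.SIGKILL:
            print("A SIGKILL from the system often means memory ran out.")
        raise


if __name__ == "__main__":
    print(describe_exit(-9))  # killed by SIGKILL
    print(describe_exit(0))   # exited with status 0
```

Wrapping the existing `subprocess.run(lighter_command, check=True)` call in `run_checked` keeps the original exception while printing a clearer reason.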

No, just on my local machine, which runs Windows; I'm running Lighter under WSL. Do I need to change the memory allocation? I remember reading about that somewhere.

PeterSu92, Aug 16 '23 04:08

I have never tested Lighter in that environment. For the human genome, I think you may need about 15 GB of memory.

mourisl, Aug 16 '23 05:08
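A minimal pre-flight check, assuming Linux or WSL where `/proc/meminfo` exists: it compares `MemAvailable` against the roughly 15 GB estimate above before launching Lighter. The function names and the threshold are illustrative, and "15G" is treated as GiB to err on the safe side. Under WSL 2 these figures describe the memory given to the Linux VM, which is only part of the host's RAM by default; that limit is set with a `memory=` entry under `[wsl2]` in `%UserProfile%\.wslconfig`.

```python
from pathlib import Path

GIB = 1024 ** 3


def available_memory_bytes(meminfo_text: str) -> int:
    """Return MemAvailable from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   16314396 kB" (kB here means KiB)
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def enough_memory(required_gib: float, meminfo_text: str) -> bool:
    """True if MemAvailable is at least required_gib GiB."""
    return available_memory_bytes(meminfo_text) >= required_gib * GIB


if __name__ == "__main__":
    text = Path("/proc/meminfo").read_text()
    avail = available_memory_bytes(text) / GIB
    # ~15 GB is the rough estimate for a 3.2 Gb genome size in this thread.
    print(f"MemAvailable: {avail:.1f} GiB, enough for ~15 GB: {enough_memory(15, text)}")
```

Running this before Lighter gives an early warning instead of a SIGKILL partway through the job.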

Hmm, what happens if I set the genome size to 2 Gb instead of 3.2 Gb, in terms of how the algorithm works?

PeterSu92, Aug 16 '23 05:08

It's worth a try; the genome size is not a very strict parameter. With a 2 Gb genome size, memory use would probably be around 10 GB.

mourisl, Aug 16 '23 14:08
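A back-of-the-envelope sketch, assuming memory grows roughly linearly with the genome-size argument, which the two figures in this thread (about 15 GB at 3.2 Gb, about 10 GB at 2 Gb) roughly agree with. This is an extrapolation from those estimates, not Lighter's actual memory formula; `estimate_memory_gb` is a hypothetical name.

```python
# Rough figures quoted in this thread: genome size (Gb) -> memory (GB).
THREAD_ESTIMATES = {3.2: 15.0, 2.0: 10.0}


def estimate_memory_gb(genome_size_bases: int) -> float:
    """Extrapolate memory linearly from the thread's estimates.

    Uses the largest GB-per-Gb ratio among the quoted figures so the
    estimate errs on the high side. This is an assumption, not
    Lighter's internal memory formula.
    """
    gb_per_gbase = max(mem / size for size, mem in THREAD_ESTIMATES.items())
    return genome_size_bases / 1e9 * gb_per_gbase


for size in (2_000_000_000, 3_200_000_000):
    print(f"genome_size={size:,}: ~{estimate_memory_gb(size):.1f} GB")
```

Taking the larger ratio (5 GB per Gb) means the 3.2 Gb case comes out at about 16 GB, slightly above the 15 GB quoted, which is the safer direction when sizing a WSL allocation.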

Looks like it was able to run the job. Is there any way to tell whether anything went super awry?

```
[2023-08-15 22:34:19] =============Start====================
[2023-08-15 22:34:31] Bad quality threshold is "D"
[2023-08-15 22:34:53] Finish sampling kmers
[2023-08-15 22:34:53] Bloom filter A's false positive rate: 0.000000
[2023-08-15 22:34:59] Finish storing trusted kmers
[2023-08-15 22:36:39] Finish error correction
Processed 768132 reads:
    36267 are error-free
    Corrected 295242 bases(0.403410 corrections for reads with errors)
    Trimmed 0 reads with average trimmed bases 0.000000
    Discard 0 reads
Error correction with Lighter is complete.
```

PeterSu92, Aug 17 '23 06:08
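One way to sanity-check a run is to pull the summary counts out of Lighter's log and compute a few ratios. This minimal sketch assumes the summary wording matches the log pasted above; the regular expressions and the `summarize` name are illustrative, not part of Lighter.

```python
import re

# Summary text from the log in this thread.
LOG = (
    "Processed 768132 reads: 36267 are error-free "
    "Corrected 295242 bases(0.403410 corrections for reads with errors) "
    "Trimmed 0 reads with average trimmed bases 0.000000 Discard 0 reads"
)


def _count(pattern: str, log: str) -> int:
    """Return the first integer captured by pattern, or raise ValueError."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def summarize(log: str) -> dict:
    """Extract Lighter's summary counts and derive a few ratios."""
    processed = _count(r"Processed (\d+) reads", log)
    error_free = _count(r"(\d+) are error-free", log)
    corrected = _count(r"Corrected (\d+) bases", log)
    return {
        "processed": processed,
        "error_free_fraction": error_free / processed,
        # Corrected bases per read that was not error-free.
        "corrections_per_erroneous_read": corrected / (processed - error_free),
    }


stats = summarize(LOG)
print(f"{stats['error_free_fraction']:.1%} of reads were error-free")
print(f"{stats['corrections_per_erroneous_read']:.6f} corrections per read with errors")
```

For this log, the second ratio reproduces the 0.403410 Lighter printed, which confirms how that figure is derived; only about 4.7% of reads were reported as error-free.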

The number of reads, 768,132, seems too low for the human genome. Is your data Illumina or PacBio/Nanopore?

mourisl, Aug 17 '23 13:08

Oh, I deliberately used a subset of reads to troubleshoot the code so that it wouldn't take forever while I was just experimenting. In theory, though, that file should have contained about 10M reads for Lighter to process (minus however many had an 'N' base call or too low a Phred quality score). That's why I'm wondering whether there's still a problem: the reported count is still an order of magnitude off (hundreds of thousands vs. millions).

PeterSu92, Aug 18 '23 05:08

Lighter usually fails to correct errors when read coverage is too low (below about 8x). The downsampled reads may be too sparse, which would explain why the number of corrected bases seems low. I guess it will be fine on the full data set.

mourisl, Aug 18 '23 06:08
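The coverage mentioned above can be estimated as reads × read length ÷ genome size. This sketch assumes 150 bp reads purely for illustration, since the read length is not given in this thread; substitute the real value from your FASTQ. If the data are paired-end, count the reads in both mate files. The function names are hypothetical.

```python
import math

MIN_COVERAGE = 8   # rough lower bound for Lighter mentioned in this thread
READ_LENGTH = 150  # hypothetical; not stated in the thread
GENOME_SIZE = 3_200_000_000


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target: float, read_length: int, genome_size: int) -> int:
    """Smallest read count that reaches the target average coverage."""
    return math.ceil(target * genome_size / read_length)


for n_reads in (768_132, 10_000_000):
    c = coverage(n_reads, READ_LENGTH, GENOME_SIZE)
    verdict = "ok" if c >= MIN_COVERAGE else "probably too low"
    print(f"{n_reads:,} reads -> {c:.2f}x ({verdict})")

print(f"reads for {MIN_COVERAGE}x: {reads_needed(MIN_COVERAGE, READ_LENGTH, GENOME_SIZE):,}")
```

Under the 150 bp assumption, even 10M reads give under 0.5x of a 3.2 Gb genome, and reaching 8x would take roughly 171M reads.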

Alright, I tried it on the full dataset and got the signal 9 error again. So do you think I just need more RAM?

PeterSu92, Aug 19 '23 18:08