seer running for a long time
Hi there, I am following the tutorial using my own batch of genomes (>600 bacterial genome sequences, approx. 3.5 Mb each). For four days now I have been stuck at this step:

parallel --results answers -j 8 seer -k fsm_out{}.gz -p seer.pheno --struct projection --maf 0.05 ::: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 --threads 8

The stdout files for the first 8 partitions are still being written, but it seems to be taking a really long time. Moreover, practically all kmers are flagged with bad-chisq. Any idea what could have gone wrong?
My guess would be that Firth regression is being run on every kmer, which is very slow. This could be caused by a mismatch between the sample labels in the phenotype file and those in the kmer files, which would make the kmers appear to be at erroneously low frequency. I am not sure why this would happen with the tutorial data, however, which should be quick.
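One way to test the label-mismatch idea is to compare the sample names in the two inputs directly. The sketch below is only an illustration and is not part of seer: the function names are hypothetical, and it assumes the phenotype file has the sample name in its second whitespace-separated column (PLINK-style FID, IID, phenotype) and that each kmer line looks like `KMER | sample:count sample:count ...`. Adjust the parsing if your files differ.

```python
import gzip
import sys
from itertools import islice


def pheno_samples(lines):
    """Collect sample names from phenotype lines.

    Assumption: the name is the second whitespace-separated column.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names


def kmer_samples(lines, max_lines=1000):
    """Collect sample names seen in the first max_lines kmer lines.

    Assumption: each line is 'KMER | sample:count sample:count ...'.
    Only a prefix of the file is read, so a sample that carries none of
    those kmers can be missed.
    """
    names = set()
    for line in islice(lines, max_lines):
        _, sep, rest = line.partition("|")
        if not sep:
            continue  # skip lines without the expected separator
        for entry in rest.split():
            names.add(entry.rsplit(":", 1)[0])
    return names


def compare_labels(pheno_names, kmer_names):
    """Return (names only in the phenotype file, names only in the kmer file)."""
    return sorted(pheno_names - kmer_names), sorted(kmer_names - pheno_names)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_labels.py seer.pheno fsm_out00.gz
    with open(sys.argv[1]) as pheno_file:
        pheno_names = pheno_samples(pheno_file)
    with gzip.open(sys.argv[2], "rt") as kmer_file:
        kmer_names = kmer_samples(kmer_file)
    only_pheno, only_kmers = compare_labels(pheno_names, kmer_names)
    print("Only in phenotype file:", only_pheno)
    print("Only in kmer file:", only_kmers)
```

If both lists come back long and the names differ only by something like a file extension or path prefix, that would be consistent with a label mismatch. A short list of names found only in the phenotype file may simply reflect that only the start of the kmer file was read.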
Can I suggest that you try our updated package and tutorial: https://pyseer.readthedocs.io/en/master/tutorial.html