
ERROR: Haploid underflow impossible to recover

Open sadafrf opened this issue 1 year ago • 5 comments

Hi, I am running the shapeit5 phase_common_static command through dsub. This is the command I run:

aou_dsub \
  --min-ram 100 \
  --disk-size 300 \
  --min-cores 8 \
  --timeout "14d" \
  --name "${JOB_NAME}" \
  --input-recursive MAP="${WORKSPACE_BUCKET}/genetic_map" \
  --input IN="${WORKSPACE_BUCKET}/data/hail_filtered.vcf.bgz" \
  --input INDEX="${WORKSPACE_BUCKET}/data/hail_filtered.vcf.bgz.tbi" \
  --output-recursive OUT="${WORKSPACE_BUCKET}/data" \
  --command 'set -o errexit && \
             set -o xtrace && \
             phase_common_static --input "${IN}" --region chr6 --map "${MAP}/plink.chr6.GRCh38.map" --output "${OUT}/filtered_phased.bcf" --thread 8'

After running for about 10 hours, the dstat status shows:

stopped running "user-command"

and the log file contains the following error:

ERROR: Haploid underflow impossible to recover for [sample id]

I suspect the cause is missing GT calls (marked as ./.) in my VCF file, but I am not sure, because every sample currently has missing calls at fewer than 10% of the SNPs. Can you please let me know what might be causing this haploid underflow error?
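For anyone who wants to check the same suspicion on their own data, here is a minimal sketch that reports per-sample missingness from an uncompressed VCF stream. The function name `missing_rate_per_sample` is hypothetical, and the sketch assumes GT is the first FORMAT subfield (as the VCF specification requires when GT is present). It only measures missingness; it does not establish whether missing calls trigger the error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-sample count and fraction of missing GT calls.
# Reads an uncompressed VCF on stdin; prints: sample, missing, total, fraction.
missing_rate_per_sample() {
  awk -F'\t' '
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ {                                      # header: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      total++
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")                        # GT is assumed to be the first subfield
        if (index(field[1], ".") > 0) miss[i]++      # counts ./., .|., ., and partial calls like ./1
      }
    }
    END {
      for (i = 10; i <= ncol; i++)
        printf "%s\t%d\t%d\t%.4f\n", name[i], miss[i], total, (total ? miss[i] / total : 0)
    }
  '
}
```

Since a .bgz file is gzip-compatible, something like `gzip -dc < hail_filtered.vcf.bgz | missing_rate_per_sample | sort -k4,4nr | head` would list the samples with the highest missing fraction first.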

sadafrf avatar Feb 03 '25 20:02 sadafrf

Adding a bit more information to this Issue: Typically, we would provide a link to the data files that lead to this error. Unfortunately, these are personal data stored in the All of Us Research Program ecosystem and cannot be shared publicly. If you have access to that system (https://www.researchallofus.org/), we can share the workspace with you so that you can reproduce the error.

traviswheeler avatar Feb 09 '25 19:02 traviswheeler

I am also getting this error and can't figure out the problem. I've tried filtering out samples with high levels of missing genotypes, looked for any formatting issues, and ensured that all AC/AN values are present, and nothing seems to resolve the issue. I'm testing on an autosome, so it shouldn't be an issue with sex chromosome ploidy either.
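The ploidy point can be confirmed directly from the input. Below is a minimal sketch that counts, per sample, the GT fields written without an allele separator (`/` or `|`), i.e. haploid-style calls. The function name `haploid_calls_per_sample` is hypothetical, and the sketch again assumes GT is the first FORMAT subfield. It verifies the input encoding only and says nothing about what SHAPEIT5 does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-sample count of haploid-style GT fields.
# Reads an uncompressed VCF on stdin; prints: sample, haploid-style call count.
haploid_calls_per_sample() {
  awk -F'\t' '
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ {                                      # header: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")                        # GT is assumed to be the first subfield
        # No "/" and no "|" means the call lists a single allele (e.g. "0" or ".")
        if (index(field[1], "/") == 0 && index(field[1], "|") == 0) hap[i]++
      }
    }
    END {
      for (i = 10; i <= ncol; i++) printf "%s\t%d\n", name[i], hap[i]
    }
  '
}
```

On an autosome, every count would be expected to be 0; any non-zero count points to samples whose genotypes are encoded as haploid somewhere in the region.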

Any guidance on what can cause this error would be super helpful!!

nsauerwald avatar Feb 11 '25 20:02 nsauerwald

For what it's worth: it seems like this repository is no longer being monitored or supported. I don't see activity on Issues since mid 2023, which is about the time that @odelaneau moved to a new position.

I'll be pleasantly surprised to learn that I'm wrong, and grateful to get some kind of response to this Issue.

traviswheeler avatar Feb 11 '25 20:02 traviswheeler

I received an e-mail from @RJHFMSTR, asking if we encounter the same issue when using the provided static binaries. I'll reply here so others can benefit from the conversation. My reply:

We are working inside the All of Us Research Platform, using the docker container provided here on GitHub (see below). Because of AoU's infrastructure, it is a challenge to run the binary directly on the input data, so all we have are results from the docker version. Perhaps others experiencing the same error (the comment above, and maybe #96?) have used the binary directly?

To use the docker container, we downloaded shapeit5_v5.1.1.docker.tar.gz (from https://github.com/odelaneau/shapeit5/releases/tag/v5.1.1) then ran: docker load -i shapeit5_v5.1.1.docker.tar.gz

traviswheeler avatar Feb 12 '25 16:02 traviswheeler

I was using the static binary ("phase_common_static") on some data I have saved locally when I encountered this error, so it is definitely not specific to the docker container.

In any case, I have moved to a different phasing tool due to the lack of active support for SHAPEIT5.

nsauerwald avatar Feb 12 '25 17:02 nsauerwald