htsjdk

VCF output with invalid QUAL values based on Locale

Open • dennishendriksen opened this issue 5 years ago • 1 comment

Consider htsjdk.variant.vcf.VCFEncoder:

private static final String QUAL_FORMAT_STRING = "%.2f";
private static final String QUAL_FORMAT_EXTENSION_TO_TRIM = ".00";

private static String formatQualValue(final double qual) {
    String s = String.format(QUAL_FORMAT_STRING, qual);
    if (s.endsWith(QUAL_FORMAT_EXTENSION_TO_TRIM)) {
        s = s.substring(0, s.length() - QUAL_FORMAT_EXTENSION_TO_TRIM.length());
    }
    return s;
}

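This writes invalid QUAL values to VCF for some locales such as 'en_NL'. String.format without an explicit Locale uses the JVM's default format locale, which may use ',' as the decimal separator, while VCF requires '.':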

  public static void main(String[] args) {
    System.out.println(String.format("%.2f", 29.00));
  }
Default locale en_NL prints: 29,00
Default locale en_US prints: 29.00

An easy fix would be to check for suffix ',00' as well as '.00'. A proper fix should not do any string manipulation.
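One way to follow the "no string manipulation" suggestion, sketched here as an assumption rather than htsjdk's actual fix: round with java.math.BigDecimal, which never consults a Locale, and decide numerically whether the rounded fraction is zero. Because the result comes from BigDecimal.toPlainString, the separator is always '.', including for values with a non-zero fraction such as 29.5. QualFormatter is a hypothetical class name, and the sketch assumes a finite QUAL (BigDecimal.valueOf throws NumberFormatException for NaN and infinities), so missing values would still need separate handling.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class; not part of htsjdk.
public class QualFormatter {

    /**
     * Formats a finite QUAL value with two decimal places, or none when the
     * rounded fraction is zero, always using '.' as the separator.
     * Assumption: qual is finite (BigDecimal.valueOf rejects NaN/infinity).
     */
    public static String formatQualValue(final double qual) {
        // BigDecimal.valueOf goes through Double.toString, so 29.5 is exactly 29.5
        BigDecimal rounded = BigDecimal.valueOf(qual).setScale(2, RoundingMode.HALF_UP);
        // Zero fractional part after rounding: write the value as an integer
        if (rounded.remainder(BigDecimal.ONE).signum() == 0) {
            return rounded.setScale(0, RoundingMode.UNNECESSARY).toPlainString();
        }
        // toPlainString uses no locale-specific symbols and no exponent notation
        return rounded.toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(formatQualValue(29.00));  // 29
        System.out.println(formatQualValue(29.5));   // 29.50
        System.out.println(formatQualValue(12.345)); // 12.35
    }
}
```

Rounding here is half-up on the shortest decimal representation of the double; whether it matches String.format("%.2f") in every edge case (for example tiny negative values that round to zero) has not been checked.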

dennishendriksen, Sep 14 '20 12:09

Ah, lazy locale handling strikes again. I think hardcoding it to use Locale.US should fix it, but there are probably issues like this all over the place. As a workaround, I'd recommend having your code explicitly set the locale at the start of processing. (GATK does Locale.setDefault(Locale.US); at the very start of its main method to prevent these problems when running.) That may obviously be impractical if your other code relies on the system locale, though.

I'll patch this instance and try to scan for other places this happens.
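A minimal sketch of the two workarounds described in this comment, using only the JDK: formatQualValue pins Locale.US through the String.format(Locale, String, Object...) overload, and main then shows the JVM-wide, GATK-style Locale.setDefault approach. LocalePinnedQual is a hypothetical class name, and this is not necessarily the patch that was merged into htsjdk.

```java
import java.util.Locale;

// Hypothetical class illustrating the two workarounds; not htsjdk code.
public class LocalePinnedQual {
    private static final String QUAL_FORMAT_STRING = "%.2f";
    private static final String QUAL_FORMAT_EXTENSION_TO_TRIM = ".00";

    // Option 1: pass Locale.US explicitly so '.' is always the decimal
    // separator, whatever the JVM default locale is.
    public static String formatQualValue(final double qual) {
        String s = String.format(Locale.US, QUAL_FORMAT_STRING, qual);
        if (s.endsWith(QUAL_FORMAT_EXTENSION_TO_TRIM)) {
            s = s.substring(0, s.length() - QUAL_FORMAT_EXTENSION_TO_TRIM.length());
        }
        return s;
    }

    public static void main(String[] args) {
        // Simulate a default locale that uses ',' as the decimal separator.
        Locale.setDefault(Locale.forLanguageTag("nl-NL"));
        System.out.println(String.format(QUAL_FORMAT_STRING, 29.5)); // 29,50 (invalid in VCF)
        System.out.println(formatQualValue(29.5));                   // 29.50 (pinned to Locale.US)

        // Option 2: reset the JVM-wide default at startup, as GATK does.
        // This changes every locale-sensitive call in the process, not only VCF output.
        Locale.setDefault(Locale.US);
        System.out.println(String.format(QUAL_FORMAT_STRING, 29.5)); // 29.50
    }
}
```

Option 1 keeps the fix local to the encoder; Option 2 needs no library change but, as noted above, can surprise other code that relies on the user's locale.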

lbergelson, Sep 24 '20 19:09