VCFFileReader can't read BCF files with IDX fields in the header
IDX fields have been allowed since BCF v2.2 (see the end of http://samtools.github.io/hts-specs/VCFv4.3.pdf). The file ex2.uncompressed.bcf in the htsjdk source tree has IDX fields in its header and cannot be parsed by VCFFileReader. The failure looks like this:
[testng] htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: unexpected tag count 3 in line <ID=PASS,Description="All filters passed",IDX=0>, for input source: /Users/tom/workspace/htsjdk/testdata/htsjdk/variant/ex2.uncompressed.bcf
[testng] at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:226)
[testng] at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:92)
[testng] at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:103)
[testng] at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:89)
[testng] at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:66)
[testng] at htsjdk.variant.vcf.VCFFileReader.<init>(VCFFileReader.java:58)
[testng] at htsjdk.variant.vcf.AbstractVCFCodecTest.testParseUncompressedBcf(AbstractVCFCodecTest.java:42)
[testng] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[testng] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[testng] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[testng] at java.lang.reflect.Method.invoke(Method.java:483)
[testng] at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:86)
[testng] at org.testng.internal.Invoker.invokeMethod(Invoker.java:643)
[testng] at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:820)
[testng] at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1128)
[testng] at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
[testng] at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
[testng] at org.testng.TestRunner.privateRun(TestRunner.java:782)
[testng] at org.testng.TestRunner.run(TestRunner.java:632)
[testng] at org.testng.SuiteRunner.runTest(SuiteRunner.java:366)
[testng] at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:361)
[testng] at org.testng.SuiteRunner.privateRun(SuiteRunner.java:319)
[testng] at org.testng.SuiteRunner.run(SuiteRunner.java:268)
[testng] at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
[testng] at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
[testng] at org.testng.TestNG.runSuitesSequentially(TestNG.java:1244)
[testng] at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
[testng] at org.testng.TestNG.run(TestNG.java:1064)
[testng] at org.testng.TestNG.privateMain(TestNG.java:1385)
[testng] at org.testng.TestNG.main(TestNG.java:1354)
[testng] Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: unexpected tag count 3 in line <ID=PASS,Description="All filters passed",IDX=0>
[testng] at htsjdk.variant.vcf.VCF4Parser.parseLine(VCFHeaderLineTranslator.java:109)
[testng] at htsjdk.variant.vcf.VCFHeaderLineTranslator.parseLine(VCFHeaderLineTranslator.java:51)
[testng] at htsjdk.variant.vcf.VCFSimpleHeaderLine.<init>(VCFSimpleHeaderLine.java:66)
[testng] at htsjdk.variant.vcf.VCFFilterHeaderLine.<init>(VCFFilterHeaderLine.java:62)
[testng] at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:202)
[testng] at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
[testng] at htsjdk.variant.bcf2.BCF2Codec.readHeader(BCF2Codec.java:176)
[testng] at htsjdk.variant.bcf2.BCF2Codec.readHeader(BCF2Codec.java:62)
[testng] at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:224)
I've written a unit test that reproduces this: https://github.com/tomwhite/htsjdk/commit/fffc1d5ca02a35c41a89e29e62ea0f5fc5646ced
Any thoughts on how to fix this @lbergelson, @droazen?
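To make the "unexpected tag count 3" failure concrete without pulling in htsjdk, here is a minimal, self-contained sketch. It is not the htsjdk implementation: the class `HeaderLineSketch` and the methods `parseTags` and `accepts` are hypothetical names made up for illustration, and the quote handling is deliberately simplistic (no escaped quotes). It shows that a check expecting exactly `ID` and `Description` rejects the FILTER line from the trace, while the same check with `IDX` treated as an optional tag accepts it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a structured header-line parser.
// This is NOT htsjdk code; it only illustrates the tag-count problem.
public class HeaderLineSketch {

    // Split <K1=V1,K2="quoted, value",K3=V3> into an ordered key/value map.
    // Quote characters are dropped; escaped quotes are not handled.
    public static Map<String, String> parseTags(String line) {
        if (!line.startsWith("<") || !line.endsWith(">")) {
            throw new IllegalArgumentException("not a structured header line: " + line);
        }
        String body = line.substring(1, line.length() - 1);
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : body.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;           // toggle on every quote character
            } else if (c == ',' && !inQuotes) {
                parts.add(current.toString());  // a comma outside quotes ends a tag
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());

        Map<String, String> tags = new LinkedHashMap<>();
        for (String part : parts) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("tag without '=': " + part);
            }
            tags.put(part.substring(0, eq), part.substring(eq + 1));
        }
        return tags;
    }

    // Accept the line if, after ignoring optional tags, exactly the expected
    // tags remain, in the expected order.
    public static boolean accepts(Map<String, String> tags,
                                  List<String> expected,
                                  List<String> optional) {
        List<String> remaining = new ArrayList<>(tags.keySet());
        remaining.removeAll(optional);
        return remaining.equals(expected);
    }

    public static void main(String[] args) {
        String line = "<ID=PASS,Description=\"All filters passed\",IDX=0>";
        Map<String, String> tags = parseTags(line);
        List<String> expected = Arrays.asList("ID", "Description");

        // Strict check: three tags found where two are expected.
        System.out.println(accepts(tags, expected, Collections.<String>emptyList())); // prints false
        // Same check with IDX allowed as an optional extra tag.
        System.out.println(accepts(tags, expected, Arrays.asList("IDX")));            // prints true
    }
}
```

Whether IDX should simply be ignored or actually used to build the BCF dictionary is a separate question; the sketch only covers the tolerant-parsing part.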
@tomwhite htsjdk does not support the latest BCF spec -- we should probably change the parser to throw if it encounters an unsupported BCF.
I suspect that there are ways in which htsjdk is not fully compliant with the latest VCF spec, either -- we may need to allocate an engineer next quarter to go in and make sure we're fully VCF 4.3-compliant, and bring us up to spec if we're not.
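For what "throw if it encounters an unsupported BCF" could look like, here is a rough, self-contained sketch rather than a patch against htsjdk. It assumes the uncompressed stream starts with the magic bytes `BCF` followed by one major-version byte and one minor-version byte, as described in the BCF2 section of the spec. The class `BcfVersionGuard`, the method `checkVersion`, and the choice of 2.1 as the highest supported version are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical guard, not part of the htsjdk API: fail fast on BCF streams
// whose version is newer than the reader claims to support.
public class BcfVersionGuard {
    static final int SUPPORTED_MAJOR = 2;
    static final int MAX_SUPPORTED_MINOR = 1; // assumption: reader handles up to BCF 2.1

    // Reads the 5-byte prefix (magic "BCF" + major + minor) from an
    // already-decompressed stream and throws if the version is unsupported.
    public static void checkVersion(InputStream in) throws IOException {
        byte[] prefix = new byte[5];
        new DataInputStream(in).readFully(prefix); // EOFException if the stream is too short
        if (prefix[0] != 'B' || prefix[1] != 'C' || prefix[2] != 'F') {
            throw new IOException("not a BCF stream: bad magic bytes");
        }
        int major = prefix[3];
        int minor = prefix[4];
        if (major != SUPPORTED_MAJOR || minor > MAX_SUPPORTED_MINOR) {
            throw new UnsupportedOperationException(
                "unsupported BCF version " + major + "." + minor
                + " (this reader supports up to "
                + SUPPORTED_MAJOR + "." + MAX_SUPPORTED_MINOR + ")");
        }
    }

    public static void main(String[] args) throws IOException {
        // A 2.1 prefix passes silently.
        checkVersion(new ByteArrayInputStream(new byte[] {'B', 'C', 'F', 2, 1}));
        try {
            // A 2.2 prefix is rejected with a clear message.
            checkVersion(new ByteArrayInputStream(new byte[] {'B', 'C', 'F', 2, 2}));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A version check like this only helps if the file declares the newer version. A file that declares an older version but still carries IDX tags would get past it and fail later in header parsing, so the header parser would need its own handling either way.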
OK, thanks for the update. I was surprised because a test file in the htsjdk source tree cannot be parsed by the htsjdk parser...
Your surprise is very justified!
Related to: https://github.com/samtools/htsjdk/issues/628.