
NEXUS parser is incomplete

Open • tgvaughan opened this issue 12 years ago • 6 comments

The Nexus parser (beast.util.NexusParser) handles only a minimal subset of the Nexus format. I suggest we switch to the JEBL Nexus parser, which is much more complete (it handles Taxa and Characters blocks, etc.). The wrapper class beast.app.beauti2.util.BeautiParser shows how to use it to load multiple partitions from a Nexus file.

tgvaughan, Jan 19 '14 03:01

The Nexus parser breaks on files produced by ape's write.nexus function. As far as I can tell, it's because ape writes "ntax = XX" while the parser expects "ntax=XX" (without spaces). Suggested solution: trim the spaces.
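
A minimal sketch of the whitespace-tolerant matching this calls for, in Java since BEAST 2 is written in Java. It assumes the ntax value is pulled out of the DIMENSIONS command with a regular expression; `DimensionsParser` and `parseNtax` are hypothetical names used for illustration, not BEAST's API. The `\s*` on either side of `=` accepts both ape's `ntax = XX` and the compact `ntax=XX`, including line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DimensionsParser {
    // Hypothetical pattern: matches "ntax=5", "ntax = 5", "NTAX =5", etc.
    // \s* allows any whitespace (including newlines) on either side of '='.
    private static final Pattern NTAX =
            Pattern.compile("\\bntax\\s*=\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Returns the ntax value from a DIMENSIONS command, or -1 if it is absent. */
    static int parseNtax(String dimensionsCommand) {
        Matcher m = NTAX.matcher(dimensionsCommand);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseNtax("DIMENSIONS ntax=12;"));   // 12
        System.out.println(parseNtax("DIMENSIONS NTAX = 12;")); // 12
    }
}
```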

bjoelle, Mar 13 '19 20:03

@bjoelle this should fix your problem. Rather than just modifying the regexp, I took this opportunity to move the taxa block parsing over to the newer NexusCommand reading code (which the tree block parsing code already uses). It should therefore be robust to all manner of newline/whitespace/semicolon shuffling.
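
The idea of parsing by command rather than by line-oriented regexps can be sketched as follows, assuming a simplified view of Nexus in which a block is a sequence of `;`-terminated commands. `NexusCommandSketch` is a hypothetical stand-in, not BEAST's actual NexusCommand class, and it deliberately ignores quoted tokens and `[...]` comments, which a real parser must handle (a quoted label may contain spaces, `=` or `;`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NexusCommandSketch {
    final String name;         // command keyword, upper-cased, e.g. "DIMENSIONS"
    final List<String> tokens; // remaining tokens, e.g. "ntax=3" or taxon labels

    NexusCommandSketch(String name, List<String> tokens) {
        this.name = name;
        this.tokens = tokens;
    }

    /** Returns the value of a key=value token (key matched case-insensitively), or null. */
    String arg(String key) {
        for (String t : tokens) {
            int eq = t.indexOf('=');
            if (eq > 0 && t.substring(0, eq).equalsIgnoreCase(key)) {
                return t.substring(eq + 1);
            }
        }
        return null;
    }

    /** Splits block text into commands at ';', independent of whitespace and line layout. */
    static List<NexusCommandSketch> parseCommands(String blockText) {
        List<NexusCommandSketch> commands = new ArrayList<>();
        for (String raw : blockText.split(";")) {
            // Collapse runs of whitespace (spaces, tabs, newlines) into single spaces,
            // then drop spaces around '=' so "ntax = 3" and "ntax=3" become identical.
            String cmd = raw.trim().replaceAll("\\s+", " ").replaceAll(" ?= ?", "=");
            if (cmd.isEmpty()) {
                continue; // tolerate stray or doubled semicolons
            }
            String[] parts = cmd.split(" ");
            commands.add(new NexusCommandSketch(parts[0].toUpperCase(),
                    new ArrayList<>(Arrays.asList(parts).subList(1, parts.length))));
        }
        return commands;
    }

    public static void main(String[] args) {
        String text = "DIMENSIONS\n  ntax = 3 ;\nTAXLABELS A B C;;";
        for (NexusCommandSketch c : parseCommands(text)) {
            System.out.println(c.name + " " + c.tokens);
        }
    }
}
```

Running `main` prints `DIMENSIONS [ntax=3]` and `TAXLABELS [A, B, C]`: the newline inside DIMENSIONS, the spaces around `=` and the doubled `;` make no difference once the text is handled command by command.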

tgvaughan, Mar 14 '19 14:03

It appears some conversion programs insert spaces at the end of taxon names and then put quotes around them, like so: 'O_subevenosa '. Commit f6ccfce removes these spaces from taxon names, since *BEAST gets confused when some taxa have such spaces and others do not, but the names are otherwise the same.
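
A sketch of the kind of normalization described here, assuming standard Nexus single-quote rules (a label wrapped in single quotes, with `''` standing for a literal quote). `TaxonNames.normalize` is a hypothetical helper; the actual change in f6ccfce is not reproduced here. Only the ends of the label are stripped, so interior spaces survive.

```java
public class TaxonNames {
    /**
     * Hypothetical helper: removes surrounding single quotes (un-doubling any
     * embedded '' escapes) and strips leading/trailing whitespace, so that
     * 'O_subevenosa ' and O_subevenosa compare equal.
     */
    static String normalize(String label) {
        String s = label.trim();
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            s = s.substring(1, s.length() - 1).replace("''", "'");
        }
        return s.trim(); // interior spaces are kept; only the ends are stripped
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("'O_subevenosa '") + "]"); // [O_subevenosa]
        System.out.println(normalize("'O_subevenosa '").equals(normalize("O_subevenosa"))); // true
    }
}
```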

rbouckaert, Mar 20 '19 23:03

Should we actually fix this, or just give a good error message? Couldn't there be a situation where spaces actually differentiate taxa?

alexeid, Mar 20 '19 23:03

The fix only removes spaces at the start and end of a taxon name -- I cannot think of a situation where such spaces would be informative.

rbouckaert, Mar 21 '19 09:03

My principle for reading files is never to assume you know better than the user, but always to strive to give accurate and precise error messages, so that the user can easily identify what went wrong.

If you have lazy/relaxed parsing and lots of ad hoc “fixes”, it encourages worse and worse user behaviour over time. If you have a relatively strict parsing policy with accurate error messages and documentation, it encourages data hygiene from the user.
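
For contrast, a sketch of the strict alternative argued for here: rather than silently trimming, reject labels that collide once leading/trailing whitespace is ignored, and say exactly which labels collide. `StrictTaxonCheck` and `checkWhitespaceCollisions` are hypothetical names for illustration, not code from BEAST.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StrictTaxonCheck {
    /**
     * Throws if two distinct taxon labels differ only in leading/trailing whitespace,
     * quoting both labels so the stray whitespace is visible in the message.
     * Exact duplicates are not checked here; that would be a separate error.
     */
    static void checkWhitespaceCollisions(List<String> labels) {
        Map<String, String> seen = new HashMap<>(); // trimmed label -> first raw label seen
        for (String raw : labels) {
            String previous = seen.putIfAbsent(raw.trim(), raw);
            if (previous != null && !previous.equals(raw)) {
                throw new IllegalArgumentException(
                        "Taxon labels \"" + previous + "\" and \"" + raw
                        + "\" differ only in leading/trailing whitespace; "
                        + "please make them identical or clearly distinct.");
            }
        }
    }

    public static void main(String[] args) {
        checkWhitespaceCollisions(List.of("O_subevenosa", "O_other")); // passes silently
        try {
            checkWhitespaceCollisions(List.of("O_subevenosa", "O_subevenosa "));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Quoting the labels in the message makes the trailing space visible, which is the kind of precise error that lets the user locate and fix the problem in their own file.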

alexeid, Mar 21 '19 20:03