NEXUS parser is incomplete
The Nexus parser (beast.util.NexusParser) handles only a minimal subset of the Nexus format. I suggest we switch to the JEBL Nexus parser, which is much more complete (it handles Taxa and Characters blocks, among others). The wrapper class beast.app.beauti2.util.BeautiParser shows how to use it to load multiple partitions from a Nexus file.
The Nexus parser breaks on files produced by ape's write.nexus function. As far as I can tell, it's because ape writes "ntax = XX" while the parser expects "ntax=XX" (without spaces). Suggested solution: trim the spaces.
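As a minimal sketch of the regex-level fix, assuming the parser picks up NTAX from the DIMENSIONS command with a regular expression (the class and method names below are hypothetical, not BEAST's), allowing optional whitespace around `=` accepts both ape's `ntax = XX` and the compact `ntax=XX`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: extract NTAX from a Nexus DIMENSIONS command while tolerating
 * whitespace around '=' ("ntax = 12" as well as "ntax=12").
 * Class and method names are hypothetical.
 */
public class NtaxSketch {
    // (?i): Nexus keywords are case-insensitive.
    // \s* on both sides of '=' accepts any amount of whitespace, including none.
    private static final Pattern NTAX =
            Pattern.compile("(?i)\\bntax\\s*=\\s*(\\d+)");

    /** Returns the NTAX value, or -1 if the command does not contain one. */
    public static int parseNtax(String command) {
        Matcher m = NTAX.matcher(command);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseNtax("dimensions ntax=12;"));   // 12
        System.out.println(parseNtax("DIMENSIONS NTAX = 12;")); // 12
    }
}
```

This only patches the one keyword; any other regex in the parser that assumes no spaces around `=` would need the same treatment.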
@bjoelle this should fix your problem. Rather than just modifying the regexp, I took this opportunity to move the taxa block parsing code over to the newer NexusCommand reading code (which the tree block parsing code already uses). It should therefore now be robust to all manner of newline/whitespace/semicolon shuffling.
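To illustrate why reading at the command level is robust to newline/whitespace/semicolon shuffling, here is a hypothetical splitter (it is not the actual NexusCommand API) that cuts a block into `;`-terminated commands, ignores semicolons inside quoted tokens and `[comments]`, and collapses whitespace outside quotes:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of command-level Nexus reading. Hypothetical class; not the
 * BEAST NexusCommand code. Text after the last ';' is ignored.
 */
public class NexusCommandSplitter {

    public static List<String> splitCommands(String text) {
        List<String> commands = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;       // inside a '...' token; an escaped '' toggles twice, so it is kept intact
        int commentDepth = 0;          // Nexus [comments] may be nested
        boolean pendingSpace = false;  // a run of whitespace outside quotes becomes one space

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (commentDepth > 0) {    // skip everything inside a comment
                if (c == '[') commentDepth++;
                else if (c == ']') commentDepth--;
                continue;
            }
            if (inQuote) {             // copy quoted text verbatim, including spaces and ';'
                current.append(c);
                if (c == '\'') inQuote = false;
                continue;
            }
            if (c == '[') { commentDepth++; continue; }
            if (Character.isWhitespace(c)) { pendingSpace = true; continue; }
            if (c == ';') {            // end of command
                String cmd = current.toString().trim();
                if (!cmd.isEmpty()) commands.add(cmd);
                current.setLength(0);
                pendingSpace = false;
                continue;
            }
            if (pendingSpace && current.length() > 0) current.append(' ');
            pendingSpace = false;
            if (c == '\'') inQuote = true;
            current.append(c);
        }
        return commands;
    }
}
```

For example, `"taxlabels\n  A\n  'B C'\n;\ndimensions  ntax = 2 ;[comment; with semicolon] end;"` yields the three commands `taxlabels A 'B C'`, `dimensions ntax = 2` and `end`, regardless of how the original lines were broken.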
It appears that some conversion programs append spaces to the end of taxon names and then put quotes around them, like so: 'O_subevenosa '. Commit f6ccfce removes these spaces from taxon names, since *BEAST gets confused when some taxa have such spaces and others do not but are otherwise identical.
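A minimal sketch of this kind of normalisation, assuming labels arrive as raw Nexus tokens (this is a hypothetical helper, not the code in commit f6ccfce): strip the enclosing quotes, un-escape doubled quotes, then trim only leading and trailing whitespace.

```java
/**
 * Sketch of normalising a Nexus taxon label so that 'O_subevenosa ' and
 * O_subevenosa map to the same taxon. Hypothetical helper; not the code
 * from commit f6ccfce.
 */
public class TaxonLabels {
    public static String normalise(String token) {
        String label = token;
        if (label.length() >= 2 && label.startsWith("'") && label.endsWith("'")) {
            // Remove the enclosing quotes and un-escape doubled quotes ('' -> ').
            label = label.substring(1, label.length() - 1).replace("''", "'");
        }
        // trim() removes leading/trailing whitespace only;
        // internal spaces are kept, so 'A b' stays distinct from 'Ab'.
        return label.trim();
    }
}
```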
Should we actually fix this, or just give a good error message? Couldn’t there be a situation where spaces actually differentiate taxa?
On 21/03/2019, at 12:26 PM, Remco Bouckaert wrote:
It appears some conversion programs insert spaces at the end of taxon names, then put quotes around them, like so: 'O_subevenosa '. Commit f6ccfce removes these spaces from taxa names, since *BEAST gets confused when there are taxa with and others without such spaces, but are otherwise the same.
The fix only removes spaces at the start and end of a taxon name -- I cannot think of a situation where such spaces would be informative.
My principle for reading files is never to assume you know better than the user, but always to strive for accurate and precise error messages so that the user can easily identify what went wrong.
If you have lazy/relaxed parsing and lots of ad hoc “fixes”, it encourages worse and worse user behaviour over time. If you have a relatively strict parsing policy with accurate error messages and documentation, it encourages data hygiene from the user.
On 21/03/2019, at 10:01 PM, Remco Bouckaert wrote:
The fix only removes spaces at the start and end of a taxon name -- I cannot think of a situation where such spaces would be informative.
https://github.com/CompEvol/beast2/issues/30#issuecomment-475153852
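For comparison, here is a sketch of the stricter policy argued for above, under the assumption that labels are kept exactly as written: the reader refuses to continue when two labels differ only in leading/trailing whitespace, and names both offending labels. The class is hypothetical, not part of BEAST.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of strict label checking: no silent trimming, but a precise
 * error when labels clash after trimming. Hypothetical; not BEAST code.
 */
public class StrictTaxonCheck {
    public static void checkLabels(List<String> labels) {
        // Map each trimmed form to the first label seen with that form.
        Map<String, String> seen = new HashMap<>();
        for (String label : labels) {
            String previous = seen.putIfAbsent(label.trim(), label);
            // Exact duplicates are not flagged here; only whitespace-only differences are.
            if (previous != null && !previous.equals(label)) {
                throw new IllegalArgumentException(
                        "Taxon labels '" + previous + "' and '" + label
                        + "' differ only in leading/trailing whitespace; "
                        + "please make them identical or clearly distinct.");
            }
        }
    }
}
```

The trade-off is the one discussed in the thread: this keeps the user's labels untouched and pushes them to clean their data, at the cost of rejecting files that the trimming approach would accept.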