HyPhy MEME Positive Selection at 0 Branches?
I'm hoping to understand how to interpret a result in which HyPhy MEME detects positive selection at a codon, but with 0 branches under selection. An example is present in the following analysis output, codon 44.
Thanks so much! Chase
Dear @singing-scientist,
There are a variety of reasons this can happen. For example, see https://github.com/veg/hyphy/issues/1721
It also seems that your JSON was generated by a rather old version of MEME (the current version is 4.0; yours is 2.1.2), which does not record much of the additional information that helps with debugging.
Best, Sergei
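For anyone who wants to list the affected codons in their own output, here is a minimal Python sketch. It assumes the MEME JSON stores per-site rows under "MLE" -> "content" -> partition "0", names the columns in "MLE" -> "headers" (including "p-value" and "# branches under selection"), and records the analysis version under "analysis" -> "version". Those key and column names are assumptions about the file layout and should be checked against your own file. The function names and the 0.1 threshold are chosen here for illustration only.

```python
import json


def column_index(headers, name):
    """Return the position of the column whose short name equals `name`.

    Headers are assumed to be [short_name, description] pairs; plain
    strings are accepted as well.
    """
    for i, header in enumerate(headers):
        short = header[0] if isinstance(header, (list, tuple)) else header
        if short == name:
            return i
    raise KeyError(f"column not found: {name}")


def meme_version(results):
    """Return the recorded analysis version, or None if it is absent."""
    return results.get("analysis", {}).get("version")


def sites_without_branches(results, alpha=0.1, partition="0"):
    """List 1-based sites with p-value <= alpha but 0 branches reported."""
    headers = results["MLE"]["headers"]
    p_col = column_index(headers, "p-value")
    b_col = column_index(headers, "# branches under selection")
    rows = results["MLE"]["content"][partition]
    return [
        site
        for site, row in enumerate(rows, start=1)
        if row[p_col] <= alpha and row[b_col] == 0
    ]


def load_results(path):
    """Read a MEME JSON file from `path` (a hypothetical file name)."""
    with open(path) as handle:
        return json.load(handle)
```

With a real file, something like `sites_without_branches(load_results("meme.json"))` would return the codon positions to inspect, and `meme_version(...)` shows which MEME version wrote the file.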
Dear @spond thank you so much! This makes great sense, and I hadn't realized our version was so behind. If I may, a few follow-ups:
(1) As I work with our HPC staff to update HyPhy, I'm looking at the releases here: https://github.com/veg/hyphy/releases
May I confirm that v2.5.62 is the latest release to use, and that this would include MEME 4.0?
(2) Which was the first version of HyPhy & MEME that controlled for synonymous rate variation?
Thanks a ton for your help! Chase
Dear @singing-scientist,
1). Yes, v2.5.62 has the latest MEME.
2). All versions of MEME going back to 2012 supported SRV.
Key recent options in MEME:
1). Support for multi-nucleotide substitutions (--multiple-hits and --site-multihit)
2). More than 2 rate classes per site (2, 3, or 4, --rates)
3). More detailed output in the JSON (use https://observablehq.com/@spond/meme to visualize)
4). Resampling options (--resample N: use bootstrap instead of LRT; see https://academic.oup.com/ve/article/9/1/vead019/7078204 for an application). This may be of value to you, because we designed it for small-N, low-divergence settings. Note that the analysis will be much slower with this setting.
Best, Sergei
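As a rough illustration of how the options listed above combine on one command line, here is a small Python sketch that only assembles the argument list (it does not run HyPhy). The flags --rates, --resample, --multiple-hits and --site-multihit come from the list above. The --alignment, --tree and --output flags, the file names, and the helper name build_meme_command are assumptions for illustration. Check the accepted values for each option against the help text of your installed version.

```python
def build_meme_command(alignment, tree, output, rates=None, resample=None,
                       multiple_hits=None, site_multihit=None):
    """Assemble a `hyphy meme` argument list; optional flags are added only if set."""
    cmd = ["hyphy", "meme",
           "--alignment", alignment,
           "--tree", tree,
           "--output", output]
    if rates is not None:
        # The thread mentions 2, 3, or 4 rate classes per site.
        if rates not in (2, 3, 4):
            raise ValueError("rates must be 2, 3, or 4")
        cmd += ["--rates", str(rates)]
    if resample is not None:
        # Number of bootstrap replicates; expect a much slower analysis.
        cmd += ["--resample", str(resample)]
    if multiple_hits is not None:
        # Value passed through unchanged; see the tool's help for valid choices.
        cmd += ["--multiple-hits", multiple_hits]
    if site_multihit is not None:
        cmd += ["--site-multihit", site_multihit]
    return cmd
```

For example, `build_meme_command("aln.fas", "tree.nwk", "out.json", rates=3, resample=100)` yields a list that can be handed to `subprocess.run`. Building a list rather than a shell string avoids quoting problems with file paths.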
Dear @spond thank you so much for these wonderful leads! Zehr et al. was a joy to read, and its methods may indeed be very helpful for the work I'm doing with HPV, i.e. comparing controls vs. cancers as you compared FECV vs. FIPV. If I may, two follow-up methodological questions:
- --resample is quite slow. Under what circumstances would you suggest its use is necessary? I'm inclined to stick to the default method otherwise.
- On the question of duplicate sequences, I imagine that in your FECV vs. FIPV analysis, when you removed duplicates at the single-ORF level, some identical sequences of that ORF may have been present in both FECV and FIPV samples? May I ask how you dealt with labeling sequences as foreground vs. background in this situation? My first inclination is that, if a sequence is shared by both phenotypes, I'd just want to eliminate it from the analysis entirely, but I would love feedback on that.
Thanks so much! Chase