hyphy HyPhy MEME Positive Selection at 0 Branches?

I'm hoping to understand how to interpret a result in which HyPhy MEME detects positive selection at a codon, but with 0 branches under selection. An example is present in the following analysis output, codon 44.

Thanks so much! Chase

ABCD_E5.fasta.MEME.json

Aug 26 '24 17:08 singing-scientist

Dear @singing-scientist,

There are a variety of reasons this can happen. For example, see https://github.com/veg/hyphy/issues/1721

It also seems that your json was generated by a rather old version of MEME (current is 4.0, the one you have is 2.1.2), which does not record a lot of the additional information that helps with debugging.

Best, Sergei

Aug 26 '24 18:08 spond
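
For readers who want to find sites like codon 44 in their own output, the following is a minimal sketch of scanning a MEME JSON file for sites that pass the p-value threshold but report 0 branches. It assumes the per-site table lives under `MLE` with `headers` and `content` keys, and that the relevant columns are named `p-value` and `# branches under selection`; these names may differ between MEME versions, so check them against your own file. The function name and the 0.1 threshold are illustrative choices, not part of HyPhy.

```python
import json


def sites_without_branches(meme, threshold=0.1):
    """Return (partition, site, p_value) for significant sites with 0 branches.

    `meme` is the parsed MEME JSON as a dict. The key and column names used
    here are assumptions about the output layout; verify them in your file.
    """
    mle = meme["MLE"]
    # Each header entry is assumed to be a [name, description] pair.
    names = [header[0] for header in mle["headers"]]
    p_col = names.index("p-value")
    branch_col = names.index("# branches under selection")

    hits = []
    # Content is assumed to be keyed by partition index ("0", "1", ...).
    for partition in sorted(mle["content"], key=int):
        rows = mle["content"][partition]
        # Sites are numbered from 1 within each partition.
        for site, row in enumerate(rows, start=1):
            if row[p_col] <= threshold and row[branch_col] == 0:
                hits.append((partition, site, row[p_col]))
    return hits


def load_meme_json(path):
    """Read a MEME result file from disk and parse it."""
    with open(path) as handle:
        return json.load(handle)
```

A typical call would be `sites_without_branches(load_meme_json("ABCD_E5.fasta.MEME.json"))`, using the file attached to the original question.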

Dear @spond thank you so much! This makes great sense, and I hadn't realized our version was so far behind. If I may, a few follow-ups:

(1) As I work with our HPC staff to update HyPhy, I'm looking at the releases here: https://github.com/veg/hyphy/releases

May I confirm that v2.5.62 is the latest release to use, and that this would include MEME 4.0?

(2) Which was the first version of HyPhy & MEME that controlled for synonymous rate variation?

Thanks a ton for your help! Chase

Aug 26 '24 20:08 singing-scientist

Dear @singing-scientist,

1). Yes, v2.5.62 has the latest MEME.

2). All versions of MEME going back to 2012 supported SRV.

Key recent new options in MEME

1). Support for multi-nucleotide substitutions (--multiple-hits and --site-multihit)

2). More than 2 rate classes per site (2, 3, or 4, --rates)

3). More detailed output in the JSON (use https://observablehq.com/@spond/meme to visualize)

4). Resampling options (use bootstrap instead of LRT, see https://academic.oup.com/ve/article/9/1/vead019/7078204 for an application). This may be of value for you, because we designed it for small N, low divergence settings. --resample N. Note that the analysis will be much slower with this setting.

Best, Sergei

Aug 26 '24 21:08 spond
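
As an illustration of how the options listed above might be combined, here is a hedged sketch that assembles a MEME command line as an argument list. Only the flag names `--rates`, `--multiple-hits`, and `--resample` come from the reply above; the `hyphy meme --alignment` invocation form, the helper name, and any values passed for `--multiple-hits` are assumptions, so confirm them with `hyphy meme --help` on your installed version before running anything.

```python
def build_meme_command(alignment, rates=None, multiple_hits=None, resample=None):
    """Assemble an argument list for a MEME run (hypothetical helper).

    The accepted values for --multiple-hits are not given in the thread,
    so whatever the caller passes is forwarded unchanged.
    """
    # Assumed invocation form: `hyphy meme --alignment <file>`.
    cmd = ["hyphy", "meme", "--alignment", str(alignment)]

    if rates is not None:
        # The thread states that 2, 3, or 4 rate classes are supported.
        if rates not in (2, 3, 4):
            raise ValueError("rates must be 2, 3, or 4")
        cmd += ["--rates", str(rates)]

    if multiple_hits is not None:
        cmd += ["--multiple-hits", str(multiple_hits)]

    if resample is not None:
        # Number of bootstrap replicates; expect a much slower analysis.
        if resample < 1:
            raise ValueError("resample must be a positive integer")
        cmd += ["--resample", str(resample)]

    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the flags have been verified against the installed HyPhy release.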

Dear @spond thank you so much for these wonderful leads! Zehr et al. was a joy to read, and its methods may indeed be very helpful for the work I'm doing with HPV, i.e. comparing controls vs. cancers as you compared FECV vs. FIPV. If I may, two follow-up methodological questions:

(1) --resample is quite slow. Under what circumstances would you suggest it is necessary? I'm planning to stick with the default method unless there is a reason not to.

(2) On the question of duplicate sequences, I imagine that in your FECV vs. FIPV analysis, when you removed duplicates at the single-ORF level, some identical sequences of that ORF may have been present in both FECV and FIPV samples? May I ask how you labeled sequences as foreground vs. background in this situation? My first inclination is that, if a sequence is shared by both phenotypes, I'd want to eliminate it from the analysis entirely, but I would love feedback on that.

Thanks so much! Chase

Sep 13 '24 07:09 singing-scientist
