MSA input and handling mixed ploidy species in NetRAX

Open sdws1983 opened this issue 8 months ago • 2 comments

Hello, and thank you for developing and maintaining NetRAX.

I'm interested in using NetRAX to infer phylogenetic networks for a group of species. However, I am still quite new to this area and would appreciate some clarification regarding the input requirements.

MSA Input: Many publications mention that NetRAX takes multiple sequence alignments (MSAs) as input, but often do not explain in detail how these MSAs were generated. Could you please share any general recommendations or best practices for preparing MSAs suitable for NetRAX?

Mixed Ploidy: The species I plan to analyze have different ploidy levels, including diploid, tetraploid, and octoploid genomes. I’m unsure how best to represent such a diverse group in the same analysis. Do you have any suggestions on how to handle mixed-ploidy taxa in NetRAX workflows? For example, how should I screen and select which genes' MSAs to use as input, given that gene copy numbers differ across ploidy levels?

Any guidance, example datasets, or references you could point me to would be very helpful.

Thanks in advance for your help!

sdws1983 avatar May 17 '25 07:05 sdws1983

Hello sdws1983,

NetRAX is currently not under active maintenance, but I'm happy to help, especially with compilation or runtime issues.

MSA Input: To be on the safe side, you could stick with standard MSA formats like FASTA or PHYLIP (NEXUS may also work). These formats are well-supported and have been thoroughly tested with NetRAX. For multi-partitioned MSAs (e.g. in concatenation methods), distinct partition models can be defined using the RAxML-style partition file. Check here:

https://github.com/amkozlov/raxml-ng/wiki/Input-data#evolutionary-model:~:text=for%20details%20%26%20references-,Multiple%20models,-Multiple%20models%20can
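As an illustration of the concatenation case, here is a minimal Python sketch that joins per-locus alignments into one supermatrix and produces partition lines in the RAxML-style `MODEL, name = start-end` format. The function name `concatenate_loci`, the locus and taxon names, and the choice of `GTR+G` for every partition are assumptions made for the example, not something NetRAX prescribes. Taxa missing from a locus are padded with gaps.

```python
from typing import Dict, List, Tuple


def concatenate_loci(
    loci: Dict[str, Dict[str, str]], model: str = "GTR+G"
) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate aligned loci and build RAxML-style partition lines.

    `loci` maps a locus name to {taxon: aligned sequence}.
    The same (assumed) model is applied to every partition.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[str] = []
    start = 1
    for name, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this locus with gap characters.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        end = start + length - 1
        partitions.append(f"{model}, {name} = {start}-{end}")
        start = end + 1
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy input: two loci, three taxa.
toy_loci = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "TTG", "sp3": "TAG"},
}
matrix, partition_lines = concatenate_loci(toy_loci)
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")       # FASTA records for the supermatrix
print("\n".join(partition_lines))   # contents of the partition file
```

The per-partition models would normally come from a model-selection step rather than a single default.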

Mixed Ploidy: NetRAX assumes one sequence per taxon per locus and doesn’t explicitly handle polyploidy. You may need to use consensus sequences or focus on single-copy orthologs to reduce complexity and maintain compatibility. For more accurate handling of polyploidy, preprocessing with specialized tools or consulting someone with expertise in population genetics may be necessary. Our team is mainly composed of computer scientists, so we’re not best equipped to give a definitive answer here.
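To make the "single-copy orthologs" suggestion concrete, the sketch below keeps only orthologous groups that contain exactly one gene copy for every taxon in the analysis. The input layout (a dict from group ID to `(taxon, gene_id)` pairs) and the name `single_copy_groups` are assumptions for illustration; the output of an actual orthology tool would need to be parsed into this shape first.

```python
from collections import Counter
from typing import Dict, List, Tuple


def single_copy_groups(
    orthogroups: Dict[str, List[Tuple[str, str]]], taxa: List[str]
) -> List[str]:
    """Return IDs of groups with exactly one gene copy per listed taxon."""
    wanted = set(taxa)
    keep = []
    for group_id, members in orthogroups.items():
        counts = Counter(taxon for taxon, _gene in members)
        # Require every taxon exactly once and no unexpected taxa.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            keep.append(group_id)
    return keep


# Hypothetical toy input: OG2 has two copies in the tetraploid.
toy_groups = {
    "OG1": [("diploid", "d1"), ("tetraploid", "t1")],
    "OG2": [("diploid", "d2"), ("tetraploid", "t2a"), ("tetraploid", "t2b")],
}
print(single_copy_groups(toy_groups, ["diploid", "tetraploid"]))
```

With polyploid taxa this strict filter may leave few or no groups, so it is best read as a first screening step.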

Feel free to reach out with further questions!

togkousa avatar May 20 '25 18:05 togkousa

In my set of orthologous groups, no strict single-copy orthologs were identified; rather, each group contains multiple gene copies per species. I am considering randomly selecting one gene from each species within each orthologous group as a representative, thereby generating a sampled set of single-copy orthologs for downstream analyses. Would that be a reasonable approach?

sdws1983 avatar May 29 '25 09:05 sdws1983
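
For reference, the random-representative idea described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sample_one_copy`, the input layout (a dict from species to its gene copies within one orthologous group), and the use of seeded replicates are all hypothetical choices, not a NetRAX feature. Random choice may pair paralogous or homeologous copies across species, so drawing several replicates and comparing the downstream results is one way to gauge how sensitive the inference is to the sampling.

```python
import random
from typing import Dict, List


def sample_one_copy(
    orthogroup: Dict[str, List[str]], rng: random.Random
) -> Dict[str, str]:
    """Pick one gene copy per species at random from a single orthogroup."""
    return {
        species: rng.choice(copies)
        for species, copies in sorted(orthogroup.items())
        if copies  # skip species with no copy in this group
    }


# Hypothetical orthogroup with different copy numbers per ploidy level.
toy_group = {
    "diploid_sp": ["d1"],
    "tetraploid_sp": ["t1", "t2"],
    "octoploid_sp": ["o1", "o2", "o3", "o4"],
}

# Each seed yields one reproducible replicate of the sampled data set.
for seed in range(3):
    print(seed, sample_one_copy(toy_group, random.Random(seed)))
```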