Visualizing the tree with GraPhlAn
Hi, I generated the tree and the alignment for my metagenomic samples, and I am now trying to visualize the tree using GraPhlAn. According to the visualization tutorial, one has to download metadata.txt. Is this a single file that should be used for any metagenomic samples, or does it apply only to the example used in the tutorial? Also, after I installed GraPhlAn using conda, the file "add_metadata_tree.py" cannot be found in the bin directory. Is there any way to access the script and download it separately? Thank you, Maryam
Hello Maryam,
Thanks for your message. I believe you're following the StrainPhlAn tutorial. The add_metadata_tree.py script is not part of GraPhlAn but is provided with MetaPhlAn/StrainPhlAn.
In general, metadata.txt is not a single universal file; it is specific to each study and contains whatever metadata is available for the samples of the cohort considered. The StrainPhlAn tutorial briefly describes how this file should be formatted.
I'm adding @abmiguez to this issue; he develops and maintains StrainPhlAn, and I'm sure he can help you with this.
Many thanks, Francesco
Hi @Maryamtarazkar,
Yes, the add_metadata_tree.py script is included in the MetaPhlAn conda package and is callable as is from the command line. Otherwise, you can use it standalone by retrieving it from here. To run, it depends on pandas, numpy, and dendropy, all of which are included in the MetaPhlAn conda package.
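If it helps, here is a small sketch for checking whether the script and its dependencies are visible in the environment you have activated. The function name check_setup is made up for this example and is not part of MetaPhlAn; it only looks the script up on your PATH and tests whether the three modules can be found.

```python
import importlib.util
import shutil


def check_setup(script="add_metadata_tree.py",
                modules=("pandas", "numpy", "dendropy")):
    """Report where the script is on PATH and which modules are importable.

    check_setup is a hypothetical helper written for this example only.
    """
    # shutil.which returns the full path of the executable, or None if absent
    report = {"script_path": shutil.which(script)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

If script_path comes back as None, the environment you are in most likely does not have the MetaPhlAn package installed.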
Thank you for your reply. Is there any tutorial/reference on how I can get the metadata.txt for my samples?
Hi Maryam, the StrainPhlAn wiki page has an example of how to create your metadata.txt file (https://github.com/biobakery/MetaPhlAn/wiki/StrainPhlAn-3.0). Basically, it should be a tabular file containing two columns:
- sampleID: the ID of your metagenomic samples
- Metadata_field_name: this field can be named as you prefer, but its name must be specified with the -m parameter of both the add_metadata_tree.py and plot_tree_graphlan.py scripts
E.g.:
sampleID subjectID
SRS055982 638754422
SRS022137 638754422
SRS019161 763496533
SRS013951 763496533
SRS014613 763840445
SRS064276 763840445
G000273725 ReferenceGenomes
In this example, the -m parameter should be set to subjectID.
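If you would rather generate the file than type it by hand, a minimal sketch like the one below writes the two-column layout shown above. It assumes the columns are tab-separated; the helper names metadata_lines and write_metadata are made up for this example and are not part of StrainPhlAn.

```python
def metadata_lines(sample_to_value, field_name="subjectID"):
    """Build the lines of a two-column metadata table (hypothetical helper).

    sample_to_value maps each sample ID to its metadata value.
    Columns are joined with a tab, which is an assumption of this sketch.
    """
    lines = ["sampleID\t" + field_name]  # header row
    for sample_id, value in sample_to_value.items():
        lines.append(f"{sample_id}\t{value}")
    return lines


def write_metadata(path, sample_to_value, field_name="subjectID"):
    """Write the table to disk, one sample per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(metadata_lines(sample_to_value, field_name)) + "\n")


# Values taken from the example above
samples = {
    "SRS055982": "638754422",
    "SRS022137": "638754422",
    "SRS019161": "763496533",
    "G000273725": "ReferenceGenomes",
}
# write_metadata("metadata.txt", samples)  # uncomment to create the file
```

Whatever name you pass as field_name is the one to give to -m afterwards.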