More modules
Hello, we have successfully tested DRAM on 10 MAGs. The output is clear, and the HTML is very good for getting an overview. I wonder whether there are more KEGG modules that could be inferred?
Also, if you needed to set a threshold for presence/absence, would you use a fixed threshold (e.g. 0.5) or take the MAG completeness into account?
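To make the two options in the question concrete, here is a minimal sketch. `is_present` and the proportional scaling rule (threshold multiplied by MAG completeness) are hypothetical, not something DRAM does, and 0.5 is just the example value from the question. Scaling assumes missing genes are spread evenly across an incomplete MAG, which is a strong assumption.

```python
from typing import Optional


def is_present(step_coverage: float, mag_completeness: Optional[float] = None,
               threshold: float = 0.5) -> bool:
    """Call a module present/absent from its step coverage (0-1).

    Hypothetical rule: with mag_completeness=None a fixed threshold is used;
    otherwise the threshold is scaled by the MAG completeness (0-1).
    """
    if not 0.0 <= step_coverage <= 1.0:
        raise ValueError("step_coverage must be between 0 and 1")
    if mag_completeness is None:
        # Option 1: fixed threshold, independent of MAG quality
        return step_coverage >= threshold
    if not 0.0 < mag_completeness <= 1.0:
        raise ValueError("mag_completeness must be in (0, 1]")
    # Option 2: lower the bar proportionally for incomplete MAGs
    return step_coverage >= threshold * mag_completeness
```

With a fixed threshold, a module with 40% step coverage is absent; in a MAG estimated at 70% complete, the scaled threshold drops to 0.35 and the same module is called present.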
After reading your publication more carefully, I realized that the annotation files contain much more information than the product. I wonder, however, whether it would be possible to estimate the completeness of other KEGG modules. Many KOs are part of multiple modules, aren't they? But in the Excel file, each KO is linked to only one module. @shafferm
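If a per-KO table lists only one module, the full many-to-many membership can be recovered by inverting a module-to-KO table. A minimal sketch, assuming that table is already loaded as a dict; `ko_to_modules` is a hypothetical helper and the IDs in the usage example are placeholders, not real KEGG entries.

```python
from collections import defaultdict


def ko_to_modules(module_kos: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert a module -> KOs mapping so each KO lists every module it belongs to."""
    index: dict[str, list[str]] = defaultdict(list)
    # Sorting the modules keeps each KO's module list in a stable order
    for module, kos in sorted(module_kos.items()):
        for ko in kos:
            index[ko].append(module)
    return dict(index)
```

For example, `ko_to_modules({"M_A": {"K_1", "K_2"}, "M_B": {"K_2", "K_3"}})` maps the shared `K_2` to both `["M_A", "M_B"]`.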
I modified the DRAM code so that it gives me all KEGG modules. It might be worth integrating this into DRAM, e.g. product.tsv could contain all modules while product.html shows only a selection. https://gist.github.com/SilasK/40ca8f1ef719f8176556dfbab6447a84
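As a rough illustration of computing completeness for every module rather than a display subset, here is a simplified sketch. It is not the code in the gist or in DRAM: it assumes each module is given as a list of steps, each step being the set of KOs that can satisfy it, and counts a step as covered if any of those KOs is present. Complexes and optional subunits are ignored, so the numbers will not match DRAM's exactly. Function names are hypothetical.

```python
import csv
import io


def module_completeness(module_steps: dict[str, list[set[str]]],
                        genome_kos: set[str]) -> dict[str, float]:
    """Fraction of steps per module with at least one KO found in the genome.

    Simplification: a step is covered if ANY of its KOs is present;
    complexes (all subunits required) and optional KOs are not modelled.
    """
    completeness: dict[str, float] = {}
    for module, steps in module_steps.items():
        if not steps:
            continue  # skip modules without step definitions
        covered = sum(1 for step in steps if step & genome_kos)
        completeness[module] = covered / len(steps)
    return completeness


def completeness_tsv(completeness: dict[str, float]) -> str:
    """Render every module as a two-column, tab-separated table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["module", "step_coverage"])
    for module in sorted(completeness):
        writer.writerow([module, f"{completeness[module]:.3f}"])
    return buffer.getvalue()
```

With placeholder steps `{"M_A": [{"K_1"}, {"K_2", "K_4"}], "M_B": [{"K_3"}, {"K_5"}, {"K_2"}]}` and a genome containing `K_1` and `K_2`, `M_A` is fully covered and `M_B` has one of three steps.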
@SilasK this is really cool! I think adding this data to product.tsv is a really good idea. I will start by adding a flag that writes all module completeness information to product.tsv. Some people in our lab want to cluster on everything that is in product.html, so I don't want to break that workflow right now. We will consider making this the default in the future.
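One way such an opt-in flag could keep the current product.html selection as the default is sketched below. The `--all-modules` flag and both functions are hypothetical; they are not actual DRAM options or internals.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of a product.tsv option")
    parser.add_argument(
        "--all-modules",
        action="store_true",  # off by default, so existing workflows are unchanged
        help="write completeness for every module to product.tsv (hypothetical)",
    )
    return parser.parse_args(argv)


def modules_for_tsv(completeness: dict[str, float], display_modules: set[str],
                    all_modules: bool) -> dict[str, float]:
    """Keep the HTML's module selection unless all modules are requested."""
    if all_modules:
        return dict(completeness)
    return {m: v for m, v in completeness.items() if m in display_modules}
```

Because the flag defaults to off, anyone clustering on the current columns gets the same table as before.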
Can I ask how you generated data/module_step_form.tsv? Is it up to date? And, more importantly, aren't we infringing on KEGG's property rights?
I generated module_step_form.tsv from the KEGG modules data that I had pulled down from our KEGG description. I have not updated it since the DRAM release. I will add that to my to-do list.
Infringing on KEGG is a good question. KEGG isn't really clear about redistribution in any of its copyright notices and, as far as I am aware, doesn't have a proper license structure. I think what we are doing is okay for a few reasons. First, we are redistributing less than other programs such as HUMAnN or PICRUSt do, and as far as I know they have not gotten in trouble. Second, everything that we redistribute is available for free via the KEGG API. If we ever get a complaint from the people behind KEGG, I will be happy to remove the module_step_form.tsv and etc_module_database.tsv files and replace them with scripts that pull the same data from the KEGG API. Working on these scripts is actually a good idea anyway, since it would give us a more automated way to pull down updates to KEGG modules.
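A script like that could start from the KEGG REST API's `link` operation, which returns module-to-KO pairs as tab-separated text. A minimal sketch, assuming the `https://rest.kegg.jp/link/ko/module` endpoint and two-column output with database prefixes such as `md:` and `ko:`; it yields module membership only, not the step structure in module_step_form.tsv. The function names are made up for illustration.

```python
import urllib.request
from collections import defaultdict

# Assumed endpoint: KEGG REST "link" between the module and ko databases
KEGG_LINK_URL = "https://rest.kegg.jp/link/ko/module"


def parse_module_ko_links(text: str) -> dict[str, set[str]]:
    """Parse 'module<TAB>ko' lines into module -> set of KOs.

    Database prefixes such as 'md:' or 'ko:' are stripped if present.
    """
    links: dict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        module, ko = line.split("\t")[:2]
        links[module.split(":")[-1]].add(ko.split(":")[-1])
    return dict(links)


def fetch_module_ko_links(url: str = KEGG_LINK_URL, timeout: int = 60) -> dict[str, set[str]]:
    """Download the link table and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_module_ko_links(response.read().decode("utf-8"))
```

Keeping parsing separate from downloading means the parser can be checked offline against a saved copy of the response.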
Alternatively, we have also looked into replacing the KEGG modules with their MetaCyc equivalents, but I have had a hard time finding ways to pull data from MetaCyc directly without using Pathway Tools, whose licensing is onerous for anyone else who wants to use it.
Do you have the scripts for creating module_step_form.tsv and etc_module_database.tsv? Either using the API or otherwise?
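This is not the script that produced module_step_form.tsv, and its exact column layout isn't described here. As a rough starting point, a KEGG module DEFINITION line can be split into top-level steps at spaces outside parentheses, with commas (alternatives) and `+`/`-` signs staying inside a step. The sketch below does that and flattens each step into `(module, step_number, KO)` rows. It discards the alternative/complex structure within a step, and the definition in the usage example is made up.

```python
import re

KO_PATTERN = re.compile(r"K\d{5}")


def split_steps(definition: str) -> list[str]:
    """Split a module DEFINITION into top-level steps.

    Steps are separated by spaces that are not inside parentheses.
    """
    steps: list[str] = []
    current: list[str] = []
    depth = 0
    for char in definition.strip():
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses in definition")
        if char == " " and depth == 0:
            if current:  # ignore repeated spaces
                steps.append("".join(current))
                current = []
        else:
            current.append(char)
    if depth != 0:
        raise ValueError("unbalanced parentheses in definition")
    if current:
        steps.append("".join(current))
    return steps


def steps_to_rows(module: str, definition: str) -> list[tuple[str, int, str]]:
    """Flatten a definition into (module, step_number, KO) rows."""
    rows = []
    for number, step in enumerate(split_steps(definition), start=1):
        for ko in KO_PATTERN.findall(step):
            rows.append((module, number, ko))
    return rows
```

For the made-up definition `"K00001 (K00002 K00003,K00006) K00004"`, the parenthesized group stays a single step even though it contains a space.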
I'm not ignoring this. I realized I had refactored the code used to generate the module_step_form.tsv so that it worked better for etc_module_database.tsv and now I'm having a hard time making it work again to replicate the module_step_form.tsv file. Hoping to have a solution soon.
This is still under active construction in a way, but it is closer now. You will be able to add things like this in a future version of DRAM, but it will probably be more involved than an API.
In fact, it will be a namespace package: instructions on accessing databases will be installed as extensions built from a template. This can be done with or without Snakemake.
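For readers unfamiliar with the pattern, here is a sketch of how extensions installed into a namespace package are commonly discovered, following the PEP 420 approach described in the Python packaging guide. The namespace name `dram_extensions` is hypothetical; the thread doesn't say what DRAM's namespace will be called or how its extensions will register themselves.

```python
import importlib
import pkgutil


def discover_extensions(namespace: str = "dram_extensions") -> list[str]:
    """Return the full names of submodules installed under a namespace package.

    'dram_extensions' is a hypothetical default, not DRAM's actual namespace.
    """
    try:
        package = importlib.import_module(namespace)
    except ModuleNotFoundError:
        return []  # no extensions installed
    path = getattr(package, "__path__", None)
    if path is None:
        return []  # a plain module, not a package
    # iter_modules scans every directory contributing to the namespace
    return sorted(info.name for info in pkgutil.iter_modules(path, namespace + "."))
```

Each separately installed extension only has to ship a module under the shared namespace directory; no central registry needs editing.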