Expand the information provided in `models.json`
When we run R2DT at RNAcentral, there are several steps to parse the output and import it into RNAcentral so that we can show diagrams and dot-bracket notation, and use hits to determine RNA type. This relies on a table that knows about all the models R2DT uses.
Right now, the file `/rna/r2dt/data/models.json` contains some, but not all, of the information needed to keep the `r2dt_models` table at RNAcentral up to date.
`models.json` currently provides `model_id`, `source`, and `description`. To update our table with R2DT's latest set of models, we need the following:
| Field | Description |
|---|---|
| model_name | Currently not always the same as model_id, e.g. RFXXXXX for Rfam; most others seem right |
| so_term_id | The SO ID for the corresponding RNA type, e.g. SO:0002344 for mt_SSU_rRNA |
| model_source | Exactly as in the current models.json |
| model_length | Currently extracted from the model cm file using cmstat's `clen` column |
| model_basepair_count | Currently extracted from the model cm file using cmstat's `bps` column |
If R2DT could provide this, updating the table would be much more robust; right now the process is quite manual and error-prone.
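For reference, here is a rough sketch of how `model_length` and `model_basepair_count` could be pulled from cmstat output. It is not the code we run at RNAcentral. It assumes cmstat prints a `#`-prefixed header row naming the columns (including `name`, `clen` and `bps`) followed by whitespace-separated data rows. The function name `parse_cmstat`, the model name `example_model`, the accession, and all numbers in `SAMPLE` are made up for illustration.

```python
import json


def parse_cmstat(text):
    """Map each model name to its clen and bps values from cmstat output.

    Assumption: the header row is a '#' comment line that names the columns,
    and every non-comment line is one whitespace-separated data row.
    """
    columns = None
    stats = {}
    for line in text.splitlines():
        if line.startswith("#"):
            tokens = line.lstrip("#").split()
            # Only the header row contains the column names we need.
            if "name" in tokens and "clen" in tokens and "bps" in tokens:
                columns = tokens
            continue
        fields = line.split()
        if not fields or columns is None:
            continue
        # Look columns up by name so the code does not depend on their order.
        row = dict(zip(columns, fields))
        stats[row["name"]] = {
            "model_length": int(row["clen"]),
            "model_basepair_count": int(row["bps"]),
        }
    return stats


# Illustrative input only: a hypothetical model with invented values.
SAMPLE = """\
# idx   name           accession  nseq  eff_nseq  clen    W  bps  bifs  model     cm    hmm
# ----  -------------  ---------  ----  --------  ----  ---  ---  ----  -----  -----  -----
     1  example_model  RF99999     100      5.00    71   85   21     2     cm  0.590  0.369
"""

print(json.dumps(parse_cmstat(SAMPLE), indent=2))
```

If R2DT wrote these two values into each `models.json` entry when the models are built, we would not need to run this step ourselves.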