Import more HGNC xrefs

Open AntonPetrov opened this issue 9 years ago • 1 comments

Need to systematically look through all unmapped HGNC xrefs and try to add as many as possible.

Examples:

  • HGNC:26327 refers to NR_026905 (obsolete, not in RNAcentral) but it has been replaced by NR_138257 (in RNAcentral).

Getting started:

docker exec -it container_id bash
source rnacentral/local/virtualenvs/RNAcentral/bin/activate
cd rnacentral/rnacentral-webcode/rnacentral/
curl -OL ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/json/locus_groups/non-coding_RNA.json
python manage.py map_hgnc -i non-coding_RNA.json -t
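To see which entries in the downloaded file are worth looking at by hand, a rough pre-check could be sketched as follows. This is not what `map_hgnc` does internally. It assumes the HGNC dump uses a Solr-style layout (`response` → `docs`) with `hgnc_id`, `symbol` and `refseq_accession` fields. `KNOWN_ACCESSIONS` is a hypothetical stand-in for whatever lookup tells you an accession is in RNAcentral, and the second sample record uses a placeholder HGNC ID.

```python
import json


def load_hgnc_docs(path):
    """Load gene records from an HGNC JSON dump (assumed layout: response -> docs)."""
    with open(path) as handle:
        return json.load(handle)["response"]["docs"]


def find_unmapped(docs, known_accessions):
    """Return (hgnc_id, symbol, refseq_accessions) for records with no known RefSeq xref."""
    unmapped = []
    for doc in docs:
        refseq = doc.get("refseq_accession", [])
        # A record counts as mapped if at least one of its accessions is known.
        if not any(acc in known_accessions for acc in refseq):
            unmapped.append((doc["hgnc_id"], doc.get("symbol"), refseq))
    return unmapped


# Hypothetical stand-in for "accessions present in RNAcentral".
KNOWN_ACCESSIONS = {"NR_138257", "NR_028288"}

# Reduced sample records; only the fields used above are included.
SAMPLE_DOCS = [
    {"hgnc_id": "HGNC:26327", "refseq_accession": ["NR_026905"]},
    {"hgnc_id": "HGNC:PLACEHOLDER", "symbol": "TCL6", "refseq_accession": ["NR_028288"]},
]

if __name__ == "__main__":
    for hgnc_id, symbol, refseq in find_unmapped(SAMPLE_DOCS, KNOWN_ACCESSIONS):
        print(hgnc_id, symbol, ",".join(refseq) or "-")
```

With the real file, `load_hgnc_docs("non-coding_RNA.json")` would replace `SAMPLE_DOCS`. HGNC:26327 is reported here because its only accession (NR_026905) is obsolete, which is the kind of case that needs a replacement accession.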

AntonPetrov, Jan 11 '17 16:01

Updates from HGNC

  • CHL1-AS2 - now added NR_144486
  • FAM27E2 - now added NR_103714
  • FAM30C - now added NR_145444
  • LINC01225 - now added NR_034112
  • LINC02028 - now added NR_136179
  • SSTR5-AS1 - now added NR_027242
  • TCL6 - now added NR_028288
  • TLX1NB - now added NR_130722

FAM30B - no RefSeq NR but there is XR_001751734

The following all have Ensembl IDs associated with the HGNC entry:

  • ADIRF-AS1 - ENSG00000272734
  • LINC01902 - ENSG00000283503
  • LINC01958 - ENSG00000283436
  • LINC02006 - ENSG00000238755
  • LINC02009 - ENSG00000283646
  • PSPC1-AS2 - ENSG00000226352
  • RN7SL3 - ENSG00000278771
  • SNHG14 - ENSG00000224078
  • SRP54-AS1 - ENSG00000258704
  • ZFHX2-AS1 - ENSG00000157306

The following lncRNA genes do not have enough good-quality sequence to be added to RNAcentral: CDKN1A-AS1, LRRC3DN, MT-LIPCAR, PRINS, PTCSC1, RNVU1-11, SIRT1-AS, SMCR6, TP53COR1, TTTY13B, YAM1, DACOR1, DALIR, DLG2-AS1, DLX6-AS2, GCASPC, LINC00268, LINC00328, LINC00527, LINC00537, LINC00914, LINC01157, LINC01617

The following three no longer exist as approved symbols:

  • FAM74A2 no longer exists as its own record
  • LINC00914 we have withdrawn this record
  • ZNF638-IT1 we have withdrawn this record
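The categories in this update suggest a simple triage order for each HGNC record, sketched below. The priority (curated RefSeq `NR_` first, then Ensembl, then predicted RefSeq `XR_`) is an assumption, not an agreed policy. The field names (`status`, `refseq_accession`, `ensembl_gene_id`) and the `"Entry Withdrawn"` status value are assumed from the HGNC JSON format, and the sample records are reduced to the details mentioned above.

```python
def triage(doc):
    """Pick the xref source to try for one HGNC record (hypothetical priority order)."""
    # Anything not explicitly approved is treated as withdrawn.
    if doc.get("status", "Approved") != "Approved":
        return "withdrawn"
    refseq = doc.get("refseq_accession", [])
    # NR_ accessions are curated RefSeq non-coding transcripts.
    if any(acc.startswith("NR_") for acc in refseq):
        return "refseq_nr"
    if doc.get("ensembl_gene_id"):
        return "ensembl"
    # XR_ accessions are predicted RefSeq models, so they come last.
    if any(acc.startswith("XR_") for acc in refseq):
        return "refseq_xr"
    return "no_xref"


# Reduced sample records built from the symbols and IDs listed in this update.
SAMPLE_RECORDS = [
    {"symbol": "FAM30C", "refseq_accession": ["NR_145444"]},
    {"symbol": "ADIRF-AS1", "ensembl_gene_id": "ENSG00000272734"},
    {"symbol": "FAM30B", "refseq_accession": ["XR_001751734"]},
    {"symbol": "ZNF638-IT1", "status": "Entry Withdrawn"},
]

if __name__ == "__main__":
    for record in SAMPLE_RECORDS:
        print(record["symbol"], triage(record))
```

Records that come back as `no_xref` or `withdrawn` can be dropped from the unmapped list without further work.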

AntonPetrov, Jan 27 '17 16:01