How to convert text to Input.pubtator (NER) required by BioRED

Open Khyati-Microcrispr opened this issue 1 year ago • 4 comments

Hi,

BioRED ran successfully, thank you for your help. I have one more favor to ask. How can I perform Named Entity Recognition (NER) and entity linking in the format BioRED requires for relation prediction? My input data consists of text, titles, and PubMed IDs. I tried using AIONER, but it is not working, and nobody has replied to the issue I raised on AIONER's GitHub. Could you please share working AIONER code and the environment setup, including the CUDA and cuDNN versions? I am using Ubuntu 22.04 with an RTX 4090 GPU. Alternatively, if there is another way to accomplish this task, please let me know.

Khyati-Microcrispr avatar Jun 04 '24 09:06 Khyati-Microcrispr

Hi @Khyati-Microcrispr,

AIONER does not link entities to their corresponding concept identifiers (e.g., NCBI gene IDs). However, BioREx relies on these concept identifiers. Within PubTator3, we have integrated several normalization tools, including GNorm2, TaggerOne, the NLM-Chem model, and tmVar3, to support the normalization process (https://www.ncbi.nlm.nih.gov/research/pubtator3/api). If you just want to process PubMed abstracts, we have processed them, and the results can be accessed at https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3. For questions regarding the AIONER tool, you may contact Dr. Luo ([email protected]).
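For readers who only need the file layout, here is a minimal sketch of writing one document in the widely used PubTator text format: a `PMID|t|title` line, a `PMID|a|abstract` line, then one tab-separated annotation line per entity. It does not perform NER or normalization; the annotations have to come from a tagger such as the ones listed above. The function name `to_pubtator`, the PMID `0000000`, and the title and abstract text are made up for illustration, and the entity type label should be checked against the sample input shipped with BioRED.

```python
def to_pubtator(pmid, title, abstract, annotations=()):
    """Serialize one document in PubTator text format.

    annotations: iterable of (start, end, entity_type, identifier) tuples.
    Offsets are character offsets into title + one separator character + abstract.
    """
    # Collapse whitespace so title and abstract each fit on a single line.
    title = " ".join(title.split())
    abstract = " ".join(abstract.split())
    text = title + " " + abstract

    lines = [f"{pmid}|t|{title}", f"{pmid}|a|{abstract}"]
    for start, end, entity_type, identifier in annotations:
        mention = text[start:end]
        lines.append("\t".join(
            [str(pmid), str(start), str(end), mention, entity_type, identifier]
        ))
    # Documents are separated by a blank line.
    return "\n".join(lines) + "\n\n"


# Hypothetical example document (not a real PubMed record).
title = "Tirzepatide in type 2 diabetes."
abstract = "A made-up abstract used only to show the layout."
mention = "type 2 diabetes"
start = (title + " " + abstract).find(mention)

doc = to_pubtator(
    "0000000",
    title,
    abstract,
    # Assumed BioRED-style type label; MESH:D003924 is the MeSH ID for type 2 diabetes.
    [(start, start + len(mention), "DiseaseOrPhenotypicFeature", "MESH:D003924")],
)
print(doc, end="")
```

Computing offsets with `str.find` on the same concatenated string that is written out keeps the offsets and the mention text consistent; a real pipeline would take the offsets from the tagger output instead.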

ptlai avatar Jun 05 '24 17:06 ptlai

Hi, may I ask how many papers you have processed? Using FTP, I was only able to get relations for 9 million papers.

Khyati-Microcrispr avatar Jul 05 '24 07:07 Khyati-Microcrispr

Hi @Khyati-Microcrispr ,

We processed all PubMed abstracts, around 37 million in total, but only about a quarter of them contain relations. That is roughly 9 million abstracts, which matches the number you retrieved via FTP.

ptlai avatar Jul 09 '24 19:07 ptlai

Hi, I hope this message finds you well.

I am writing to address a concern regarding the identification of chemicals, antibodies, and peptides in PubTator. Specifically, I have encountered issues where certain entities, such as "Tirzepatide" and "Pascolizumab," do not have unique IDs or are clustered with numerous other entities.

For example:

  • Searching for "Tirzepatide" returns multiple entries without unique IDs.
  • Similarly, "Pascolizumab" results in clusters of entities without clear unique identification.

Here are some search results from the dataset:

  • Tirzepatide:

33325008 Chemical - cTirzepatide|twincretin|DPP4i|aTirzepatide|TZP|anti-hyperglycaemic agents|bTirzepatide|SGLT2i|oral anti-hyperglycaemic medication|OAM|SU|diacid PubTator3

  • Pascolizumab:

27637004 Chemical - pitakinra|lebrikizunab|pascolizumab PubTator3

24032029 Chemical - PIP|steroidsensitive|molecular|inositol triphosphate|SCH 900117|agents|CNTO|RG4934|pascolizumab|pathogen|molecular pattern molecules PubTator3

32380052 Chemical - SPMs|resolvins|ICS|SB010|beta2-agonists|SABA|inhaled corticosteroids|microbial associated molecular patterns|anti|LABA|muscarinic acetylcholine receptor antagonist|Aerovant|leukotriene (LT)B4|methylene and toluidine blue|pascolizumab|MAMPs PubTator3
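The rows above can be found programmatically. The following is a rough sketch, assuming the FTP entity files are tab-separated with five columns (PMID, type, concept ID, pipe-separated mentions, resource) and that `-` in the third column means no concept ID was assigned. The function name `find_unnormalized` and the last sample row, with its placeholder ID `MESH:D000000`, are hypothetical and included only to show the filter.

```python
import csv

# First row is taken from the search results above; the second row is a
# made-up normalized entry (placeholder ID) included only for contrast.
SAMPLE = (
    "27637004\tChemical\t-\tpitakinra|lebrikizunab|pascolizumab\tPubTator3\n"
    "0000000\tChemical\tMESH:D000000\texample chemical\tPubTator3\n"
)


def find_unnormalized(lines, term):
    """Yield (pmid, mentions) for rows that mention `term` but carry no concept ID."""
    wanted = term.lower()
    # QUOTE_NONE: treat quote characters in mentions as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip malformed or truncated rows
        pmid, _entity_type, concept_id, mentions, _resource = row
        names = mentions.split("|")
        if concept_id == "-" and wanted in (n.lower() for n in names):
            yield pmid, names


hits = list(find_unnormalized(SAMPLE.splitlines(), "Pascolizumab"))
for pmid, names in hits:
    print(pmid, len(names), "mentions share the missing ID")
```

For the real files, the same function can be fed an open text handle, for example one returned by `gzip.open(path, "rt")`, so the file is streamed rather than loaded into memory.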

Could you please provide insights on whether there are plans to resolve these issues, particularly regarding the assignment of unique IDs to these entities and their relationships? Any improvements or updates in this regard would be greatly appreciated.

Thank you for your attention to this matter.

Best regards, Khyati

Khyati-Microcrispr avatar Aug 05 '24 09:08 Khyati-Microcrispr

This should have been answered in your email to our team, so I closed it.

ptlai avatar Jan 29 '25 15:01 ptlai

Ok, thanks.

Khyati-Microcrispr avatar Jan 31 '25 06:01 Khyati-Microcrispr