Sander Tan comments

Results 13 comments of Sander Tan

Installation problems

@iandoug I think the pip version is still working (`pip install wikiextractor`), but if you want to check out the 29 March 2020 commit @gregburman suggested, you can check out the commit and...

Installation problems

Glad to hear you made it work. The March version with a fix to the category issue works for me.

Last category in article not checked correctly

I also experienced this issue, thanks for pointing out the location of the responsible code. I fixed it by moving the `extract categories` code block up: ```python for line in input:...

Preferred name from incorrect CDB file

Seems to be caused around here: https://github.com/CogStack/MedCATtrainer/blob/a6745a781907f1452be2c9bf7f5176f8da91b2a2/webapp/api/api/utils.py#L60 I'll try to debug it.

Preferred name from incorrect CDB file

I think the application uses a general CUI lookup table that spans across projects, because the GET request for filling in this "Concept Summary" can only pass the CUI, not...

Preferred name from incorrect CDB file

Let me know if you would like me to look into adding this change. I'm not sure though if other parts of the application rely on this "one concept table"...

Preferred name from incorrect CDB file

@tomolopolis We can close this one for now. When using different CDB universes with different pretty names, this issue can be solved by setting up multiple MedCATTrainer instances.

Clinical data in PanCan TCGA

I wrote a parser to extract all rows per study, and drop the columns that only contain empty values. Columns with explicit NA values [Not Available] and [Not Applicable] are...
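The parser itself is not shown in this comment, so the following is only a minimal sketch of the two steps described above: grouping rows per study and dropping columns that contain nothing but empty values. The column names (`patient_id`, `study`, `gender`, `unused_field`), the sample rows, and the helper `drop_empty_columns` are hypothetical. How the explicit `[Not Available]` and `[Not Applicable]` markers are handled is cut off in the comment, so the sketch leaves them untouched and treats them as non-empty values.

```python
import csv
import io


def drop_empty_columns(rows):
    """Return (kept_columns, filtered_rows), removing columns whose values are all empty."""
    rows = list(rows)
    if not rows:
        return [], []
    columns = list(rows[0].keys())
    # A column is kept if at least one row has a non-blank value in it.
    kept = [c for c in columns if any((row.get(c) or "").strip() for row in rows)]
    return kept, [{c: row[c] for c in kept} for row in rows]


# Hypothetical sample data standing in for the real clinical TSV file.
sample_tsv = (
    "patient_id\tstudy\tgender\tunused_field\n"
    "P1\tBRCA\tFEMALE\t\n"
    "P2\tBRCA\t[Not Available]\t\n"
    "P3\tLUAD\tMALE\t\n"
)

all_rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))

# Group the rows per study (assumes a column named "study").
rows_by_study = {}
for row in all_rows:
    rows_by_study.setdefault(row["study"], []).append(row)

kept_columns, brca_rows = drop_empty_columns(rows_by_study["BRCA"])
print(kept_columns)
```

In this sketch a column holding only `[Not Available]` would still be kept, because the marker is a non-empty string; whether such columns should be dropped as well is a separate decision.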

From https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic a JSON file can be retrieved with the terms, pretty names and descriptions. For example: ``` "her2_erbb2_result_fish": { "description": "the type of outcome for HER2 as determined by...
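As a rough illustration of how such a JSON file could be used, the sketch below builds a lookup from term to description. It assumes the file is a mapping from term names to objects with a `"description"` key, as the excerpt above suggests. The description strings, the second term, and the helper `build_description_lookup` are placeholders, not content from the GDC data dictionary. The key that holds the pretty name is not visible in the excerpt, so it is left out.

```python
import json

# Hypothetical stand-in for the downloaded JSON; descriptions are placeholders.
sample_json = """
{
  "her2_erbb2_result_fish": {
    "description": "placeholder description (hypothetical text)"
  },
  "example_term": {
    "description": "another placeholder description"
  }
}
"""


def build_description_lookup(dictionary):
    """Map each term to its description, skipping entries that are not objects."""
    return {
        term: entry.get("description", "")
        for term, entry in dictionary.items()
        if isinstance(entry, dict)
    }


terms = json.loads(sample_json)
lookup = build_description_lookup(terms)
print(lookup["her2_erbb2_result_fish"])
```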

Clinical data in PanCan TCGA

Status: 1. Adding all clinical data to PanCan studies should be possible with the `clinical_PANCAN_patient_with_followup.tsv` source file. There are 746 columns in total. When a column does not have data...