Different Aliquot data from xml vs BCR Biotab

Open AimSchina opened this issue 6 years ago • 0 comments

Hi,

I am getting different biospecimen information depending on which of two retrieval methods I use. I am using TCGAbiolinks_2.15.2.

First method, from XML:
query <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", file.type = "xml")
GDCdownload(query)
biospecimen <- distinct(GDCprepare_clinic(query, clinical.info = c("aliquot")))

This gives me 14677 rows, and "TCGA-PL-A8LZ-01A-31R-A36F-07" is missing.

Second method, from BCR Biotab:
query.biospecimen <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", data.type = "Biospecimen Supplement", data.format = "BCR Biotab")
GDCdownload(query.biospecimen)
biospecimen.BCRtab.all <- GDCprepare(query.biospecimen)
biospecimen2 <- biospecimen.BCRtab.all$biospecimen_aliquot_brca

This gives me 14539 rows, and "TCGA-PL-A8LZ-01A-31R-A36F-07" is included.

I noticed this in particular because I was merging HTSEQ-UQ data with the sample info, and this aliquot was not matched when I retrieved the data from XML.
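A minimal sketch of how such a mismatch can be listed, using base R only. The two vectors are toy stand-ins for the aliquot barcodes of the expression data and of the biospecimen table; "TCGA-XX-0000-01A-11R-0000-07" is a made-up placeholder, not a real aliquot.

```r
# Toy stand-ins: barcodes found in the expression data and in the biospecimen table.
# "TCGA-XX-0000-01A-11R-0000-07" is a made-up placeholder barcode.
expr_barcodes <- c("TCGA-PL-A8LZ-01A-31R-A36F-07",
                   "TCGA-XX-0000-01A-11R-0000-07")
biospecimen_barcodes <- c("TCGA-XX-0000-01A-11R-0000-07")

# Barcodes present in the expression data but absent from the biospecimen table
unmatched <- setdiff(expr_barcodes, biospecimen_barcodes)
print(unmatched)
```

With real data, the two vectors would come from the column names of the expression matrix and from the aliquot barcode column of the biospecimen table.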

Am I doing something wrong here? I would appreciate any help understanding this difference and finding the proper way to get the aliquot information. I am interested in the aliquot barcode, from which I derive the patient, sample, and analyte barcodes, etc.
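For illustration, a sketch of deriving the shorter barcodes from an aliquot barcode by position, assuming the standard TCGA barcode layout (12 characters for the patient, 16 for the sample, 20 for the analyte). The column names in `parts` are chosen for this example only.

```r
aliquot <- "TCGA-PL-A8LZ-01A-31R-A36F-07"

# Each shorter barcode is a prefix of the aliquot barcode
# (assumes the standard fixed-width TCGA barcode layout).
parts <- data.frame(
  aliquot = aliquot,
  patient = substr(aliquot, 1, 12),  # project-TSS-participant
  sample  = substr(aliquot, 1, 16),  # + sample type and vial
  analyte = substr(aliquot, 1, 20),  # + portion and analyte type
  stringsAsFactors = FALSE
)
print(parts)
```

Because `substr` is vectorized, the same calls work on a whole column of aliquot barcodes.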

!!UPDATE!! After using the following code:
query <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", file.type = "xml", legacy = TRUE)
GDCdownload(query)
biospecimen <- distinct(GDCprepare_clinic(query, clinical.info = c("aliquot")))

I see the aliquot above included.

So the difference comes from setting legacy to TRUE or FALSE. I am still a bit confused about the proper way to handle this when I need a generic TCGA table with all available aliquots. Should I merge the legacy=TRUE data with the legacy=FALSE data?
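Whether merging is the right approach is the open question here. If it is, one possible way to take the union of the two tables is sketched below in base R. The helper `combine_aliquots` and the toy tables are hypothetical, and the key column name `bcr_aliquot_barcode` is an assumption about the prepared data; the "TCGA-XX-..." barcodes are made-up placeholders.

```r
# Hypothetical helper: stack two aliquot tables on their shared columns
# and keep one row per aliquot barcode (the first occurrence wins).
combine_aliquots <- function(harmonized, legacy, key = "bcr_aliquot_barcode") {
  shared <- intersect(names(harmonized), names(legacy))
  stopifnot(key %in% shared)
  combined <- rbind(harmonized[, shared, drop = FALSE],
                    legacy[, shared, drop = FALSE])
  combined[!duplicated(combined[[key]]), , drop = FALSE]
}

# Toy tables standing in for the legacy = FALSE and legacy = TRUE results.
harmonized <- data.frame(
  bcr_aliquot_barcode = c("TCGA-XX-0000-01A-11R-0000-07",
                          "TCGA-XX-0001-01A-11R-0000-07"),
  stringsAsFactors = FALSE
)
legacy <- data.frame(
  bcr_aliquot_barcode = c("TCGA-XX-0001-01A-11R-0000-07",
                          "TCGA-PL-A8LZ-01A-31R-A36F-07"),
  stringsAsFactors = FALSE
)

all_aliquots <- combine_aliquots(harmonized, legacy)
print(all_aliquots)
```

Rows from the first table take precedence when a barcode appears in both, so the argument order decides which source wins for duplicated aliquots.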

Thank you!

Dec 10 '19 09:12 AimSchina