Different aliquot data from XML vs. BCR Biotab
Hi,
I get different biospecimen information depending on which of two methods I use to retrieve it. I am using TCGAbiolinks_2.15.2.
First method (XML):
library(TCGAbiolinks)
library(dplyr)  # for distinct()
query <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", file.type = "xml")
GDCdownload(query)
biospecimen <- distinct(GDCprepare_clinic(query, clinical.info = c("aliquot")))
This gives me 14677 rows, and the aliquot "TCGA-PL-A8LZ-01A-31R-A36F-07" is missing.
Second method (BCR Biotab):
query.biospecimen <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", data.type = "Biospecimen Supplement", data.format = "BCR Biotab")
GDCdownload(query.biospecimen)
biospecimen.BCRtab.all <- GDCprepare(query.biospecimen)
biospecimen2 <- biospecimen.BCRtab.all$biospecimen_aliquot_brca
This gives me 14539 rows, and "TCGA-PL-A8LZ-01A-31R-A36F-07" is included.
I noticed this because I was merging HTSeq-UQ data with the sample information, and this aliquot was not matched when the sample information came from the XML files.
Am I doing something wrong here? I would appreciate any help understanding this difference, and what the proper way to get the aliquot information is. I am interested in the aliquot barcode, from which I derive the patient, sample, analyte barcodes, etc.
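For the "from there I get patient, sample, analyte barcode" step, a minimal base-R sketch is below. It assumes full-length 28-character aliquot barcodes in the standard TCGA layout (project-TSS-participant-sample/vial-portion/analyte-plate-center) and cuts them at fixed character positions; `split_tcga_barcode` is a hypothetical helper, not a TCGAbiolinks function.

```r
# Hypothetical helper: derive the shorter TCGA barcodes from aliquot barcodes
# by fixed character positions. Assumes full 28-character aliquot barcodes,
# e.g. "TCGA-PL-A8LZ-01A-31R-A36F-07".
split_tcga_barcode <- function(aliquot) {
  aliquot <- as.character(aliquot)
  if (any(nchar(aliquot) != 28)) {
    warning("some barcodes are not full 28-character aliquot barcodes")
  }
  data.frame(
    aliquot     = aliquot,
    patient     = substr(aliquot, 1, 12),   # TCGA-PL-A8LZ
    sample_type = substr(aliquot, 14, 15),  # 01 (two-digit sample type code)
    sample      = substr(aliquot, 1, 16),   # TCGA-PL-A8LZ-01A (sample + vial)
    portion     = substr(aliquot, 1, 19),   # TCGA-PL-A8LZ-01A-31
    analyte     = substr(aliquot, 1, 20),   # TCGA-PL-A8LZ-01A-31R
    stringsAsFactors = FALSE
  )
}

print(split_tcga_barcode("TCGA-PL-A8LZ-01A-31R-A36F-07"))
```

Fixed positions are used because the TCGA barcode fields have fixed widths; if a table mixes shorter barcodes in, the warning flags it rather than silently producing truncated IDs.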
!!UPDATE!!
After using the following code:
query <- GDCquery(project = "TCGA-BRCA", data.category = "Biospecimen", file.type = "xml", legacy = TRUE)
GDCdownload(query)
biospecimen <- distinct(GDCprepare_clinic(query, clinical.info = c("aliquot")))
the aliquot above is included.
So the difference comes from the legacy = TRUE/FALSE setting. I am still a bit confused about the proper way to handle this when I need a generic TCGA table with all available aliquots. Should I merge the legacy = TRUE and legacy = FALSE data?
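To see exactly which barcodes differ between two aliquot tables (e.g. legacy vs. harmonized, or XML vs. BCR Biotab), and to build one barcode list from both, a plain set comparison may help. This is a minimal base-R sketch assuming both tables have a `bcr_aliquot_barcode` column (check `names()` on your own objects); `compare_aliquots` is a hypothetical helper and the rows below are toy placeholders, not real GDC output. It only compares barcodes; it does not say whether combining legacy and harmonized data is appropriate, and it does not reconcile the other columns.

```r
# Hypothetical helper: compare two aliquot tables on their barcode column.
# Assumes the column is called "bcr_aliquot_barcode" in both tables.
compare_aliquots <- function(tab_a, tab_b, col = "bcr_aliquot_barcode") {
  a <- unique(as.character(tab_a[[col]]))
  b <- unique(as.character(tab_b[[col]]))
  list(
    only_a = setdiff(a, b),  # barcodes found only in the first table
    only_b = setdiff(b, a),  # barcodes found only in the second table
    all    = union(a, b)     # every barcode seen in either table
  )
}

# Toy input (placeholder barcodes except the one from this post)
xml_tab    <- data.frame(bcr_aliquot_barcode = c("TCGA-XX-0001-01A-11R-0001-07",
                                                 "TCGA-XX-0002-01A-11R-0001-07"))
biotab_tab <- data.frame(bcr_aliquot_barcode = c("TCGA-XX-0002-01A-11R-0001-07",
                                                 "TCGA-PL-A8LZ-01A-31R-A36F-07"))

res <- compare_aliquots(xml_tab, biotab_tab)
res$only_b
# [1] "TCGA-PL-A8LZ-01A-31R-A36F-07"
```

Running the same comparison on the real `biospecimen` and `biospecimen2` objects would list which aliquots each source is missing before deciding how to combine them.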
Thank you!