ImmunoSeq TCRB v4B V2 Export: Input Parsing Error
🐛 Bug
Good morning,
I recently exported immunoSeq TCRB (v4b) data using the Export V2 option and received the following parsing failures when loading it with repLoad. I understand v4b is fairly new and incorporates new primers sensitive to recently annotated alleles.
I also exported the data using the V1 export option to see if this would help, but received even more parsing errors.
Please advise on how best to proceed and whether these errors will limit repertoire analysis. Thanks!
####-------------------------------------####
> immdata <- repLoad(file_path)
== Step 1/3: loading repertoire files... ==
Processing "c:/Export/immunoseq" ...
-- Parsing "c:/Export/immunoseq/A_TCRB.tsv" -- immunoseq
Warning: 25 parsing failures.
row col expected actual file
2241 jGeneAlleleTies 1/0/T/F/TRUE/FALSE 01,02 'c:/Export/A_TCRB.tsv'
3671 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/A_TCRB.tsv'
3809 jGeneAlleleTies 1/0/T/F/TRUE/FALSE 01,02 'c:/Export/A_TCRB.tsv'
4981 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/A_TCRB.tsv'
7364 jGeneAlleleTies 1/0/T/F/TRUE/FALSE 01,02 'c:/Export/A_TCRB.tsv'
.... ............... .................. ..................... .............................................
See problems(...) for more details.
-- Parsing "c:/Export/immunoseq/B_TCRB.tsv" -- immunoseq
Warning: 29 parsing failures.
row col expected actual file
5117 jGeneAlleleTies 1/0/T/F/TRUE/FALSE 01,02 'c:/Export/B_TCRB.tsv'
5409 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/B_TCRB.tsv'
9027 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/B_TCRB.tsv'
10731 jGeneAlleleTies 1/0/T/F/TRUE/FALSE 01,02 'c:/Export/B_TCRB.tsv'
10958 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/B_TCRB.tsv'
..... ............... .................. ..................... .............................................
See problems(...) for more details.
-- Parsing "c:/Export/C_TCRB.tsv" -- immunoseq
Warning: 14 parsing failures.
row col expected actual file
5660 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/C_TCRB.tsv'
5906 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/C_TCRB.tsv'
6826 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/C_TCRB.tsv'
8067 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/C_TCRB.tsv'
8123 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/C_TCRB.tsv'
.... ............. .................. ..................... .............................................
See problems(...) for more details.
-- Parsing "c:/Export/D_TCRB.tsv" -- immunoseq
Warning: 8 parsing failures.
row col expected actual file
5198 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/D_TCRB.tsv'
12125 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/D_TCRB.tsv'
14180 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/D_TCRB.tsv'
14477 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/D_TCRB.tsv'
14715 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/D_TCRB.tsv'
..... ............. .................. ..................... .............................................
See problems(...) for more details.
-- Parsing "c:/Export/E_TCRB.tsv" -- immunoseq
Warning: 8 parsing failures.
row col expected actual file
1905 vFamilyTies 1/0/T/F/TRUE/FALSE TCRBV11,TCRBV07 'c:/Export/E_TCRB.tsv'
3467 vFamilyTies 1/0/T/F/TRUE/FALSE TCRBV11,TCRBV07 'c:/Export/E_TCRB.tsv'
3468 vFamilyTies 1/0/T/F/TRUE/FALSE TCRBV11,TCRBV07 'c:/Export/E_TCRB.tsv'
3697 vFamilyTies 1/0/T/F/TRUE/FALSE TCRBV10,TCRBV06 'c:/Export/E_TCRB.tsv'
4548 vFamilyTies 1/0/T/F/TRUE/FALSE TCRBV11,TCRBV07 'c:/Export/E_TCRB.tsv'
.... ........... .................. ............... .............................................
See problems(...) for more details.
-- Parsing "c:/Export/F_TCRB.tsv" -- immunoseq
Warning: 9 parsing failures.
row col expected actual file
1470 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
6745 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
7628 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
10868 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
13299 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
..... ............. .................. ..................... .............................................
See problems(...) for more details.
Hello @tdfy
Thank you! It doesn't seem that these errors are limiting the analysis in any way. However, they are definitely cluttering the console output and need to be removed. Thank you for letting us know!
Sadly, we don't have an example file in the v4b format. It also seems that the failures are happening at rows in the 1000s, 2000s, and so on:
VVVV - row numbers
row col expected actual file
1470 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
6745 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
7628 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
10868 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
13299 jGeneNameTies 1/0/T/F/TRUE/FALSE TCRBJ02-01,TCRBJ02-05 'c:/Export/F_TCRB.tsv'
Is there a way for you to send us a subset of a file that includes some of the failed rows to support at immunomind.io?
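For anyone hitting the same warnings, here is a minimal sketch of one likely cause. It assumes the files are read through readr (the `See problems(...)` hint suggests so). By default readr guesses each column's type from the first `guess_max` rows (1000), so a ties column that is empty near the top can be typed as logical and later text values are reported as failures. The temporary file, the `id` column, and the row counts below are made up for illustration. The sketch reads the file directly with readr and is not a change to `repLoad`.

```r
library(readr)

# Toy stand-in for an export (hypothetical data): the ties column is empty
# for the first 1200 rows and only then holds a comma-separated tie.
n <- 1500
toy <- data.frame(
  id = seq_len(n),
  jGeneNameTies = c(rep(NA, 1200), rep("TCRBJ02-01,TCRBJ02-05", 300)),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".tsv")
write_tsv(toy, tmp, na = "")

# Default read: the column type is guessed from the leading rows only,
# which are all empty here, so the later text values may fail to parse.
guessed <- read_tsv(tmp)
class(guessed$jGeneNameTies)

# Declaring the ties column as character avoids relying on the guess.
ties_as_text <- cols(
  .default = col_guess(),
  jGeneNameTies = col_character()
)
fixed <- read_tsv(tmp, col_types = ties_as_text)
nrow(problems(fixed))
```

On a real export the same `cols()` call would presumably also need `jGeneAlleleTies` and `vFamilyTies`, the other two columns named in the warnings above.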
Thank you for the prompt response; I will send you a subset.
If I come across any other errors, should I post them here or open a new thread?
Thank you so much! It would greatly help our small team!
We would prefer to have separate issues for separate questions, feature requests, and bugs. This way it's much easier to monitor the progress and discuss details. So if you find something unrelated to the parsing of Adaptive Biotech files, please open a new ticket. Our team would greatly appreciate this!
I have the exact same problem. I have a large dataset and am getting 3000 to 26000 parsing errors for every single file in it. I am also experiencing a large number of downstream errors that I suspect stem from these parsing errors.
Is there any progress on this? May I contact you directly?
Sorry for the pause on this issue, we will look into it over the weekend!
@k-blenman can you send us a sample of your datasets as well, so we can have more files to test?
Email: support at immunomind.io
Thank you.
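One way to prepare such a sample is sketched below. It reads the file as plain text and keeps the first few rows plus every row with a non-empty value in a column whose name ends in `Ties`. The toy file, the `id` column, and the cut-off of 10 leading rows are assumptions for illustration; only the `jGeneNameTies` column name and its value come from the warnings in this thread.

```r
library(readr)

# Toy stand-in for one export (hypothetical data).
toy <- data.frame(
  id = seq_len(50),
  jGeneNameTies = c(rep(NA, 45), rep("TCRBJ02-01,TCRBJ02-05", 5)),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".tsv")
write_tsv(toy, tmp, na = "")

# Read every column as text so nothing is lost to type guessing.
raw <- read_tsv(tmp, col_types = cols(.default = col_character()))

# Flag rows where any "...Ties" column is non-empty.
ties_cols <- grep("Ties$", names(raw), value = TRUE)
has_tie <- rowSums(!is.na(raw[ties_cols])) > 0

# Keep the first 10 rows for context plus every flagged row.
keep <- has_tie | seq_len(nrow(raw)) <= 10
sample_rows <- raw[keep, ]

# Write the subset to a new file that can be shared.
out <- tempfile(fileext = ".tsv")
write_tsv(sample_rows, out, na = "")
```

Filtering on the column contents avoids depending on the row numbers printed in the warnings, whose offset relative to the header may differ between readr versions.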
Closing this issue for now. The next version of Immunarch will mostly support the AIRR Standard data format. More details on the next version of Immunarch are here: https://b-t.cr/t/immunarch-will-significantly-evolve-but-it-will-break-things-and-we-need-your-help/1123