Missing GFF file for SPECTRA_MSD/TB_COVID_GFP
Hi,
I am currently working on phenotype prediction from unique strains of antitubercular agents. Thanks for this great work; SPECTRA would be helpful for testing the model that I am working on. I was trying to reproduce the analysis for the TB drug splits and have downloaded the required data files from the Dataverse page. However, "run_baseline.py" refers to some files that I cannot find, such as input_gff_file, reference_nucleotide, and full_reference_sequence. Could you let me know how I can get these files and run "run_baseline.py" correctly?
Apologies for the delayed response! You do not actually need those files. They were needed back when I ran the data processing step by step, but I did that once and now provide the processed data files. So if you provide None for those entries, the script should run. If there are more errors, though, let me know!
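A minimal sketch of what passing None for those entries could look like, assuming the script gathers its arguments in a dictionary before calling run_baseline. The three key names come from the question above; the helper `disable_raw_inputs`, the "drug" key, and the example path are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch: only the three key names below come from this thread.
# The helper function and the other dictionary entries are illustrative.
RAW_INPUT_KEYS = (
    "input_gff_file",
    "reference_nucleotide",
    "full_reference_sequence",
)


def disable_raw_inputs(params):
    """Return a copy of params with the raw-data entries set to None."""
    updated = dict(params)
    for key in RAW_INPUT_KEYS:
        updated[key] = None
    return updated


# Example parameter dictionary (contents are assumptions, not the real script's).
params_to_use = {"drug": "INH", "input_gff_file": "/path/to/file.gff"}
params_to_use = disable_raw_inputs(params_to_use)
print(params_to_use)
```

Copying the dictionary rather than mutating it in place keeps the original settings available if the raw files are ever needed again.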
Hi, I am trying to run the following command: python run_baseline.py 0 INH logistic_regression 0.1 binary --trial_run True. Some files are missing from the SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/utils folder: callbacks.py, constants.py, generate_barcode.py, mod_alignment_utils.py, and check_mutational_splits.py. I am guessing most of these methods have been moved to https://github.com/mims-harvard/SPECTRA/blob/75b59639dcae6adad92af4a34313a75196a2659c/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/utils/general_utility_functions.py. For now I am commenting out those import statements and importing this file instead.
However, I could not find "GenerateBarcode", which is referenced on this line, https://github.com/mims-harvard/SPECTRA/blob/75b59639dcae6adad92af4a34313a75196a2659c/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/dataset/Sequence_Dataset.py#L54, in any of the utils files or in the Sequence_Dataset.py file. I have tried commenting it out, but that does not work. There are some barcode-related methods in the Sequence_Dataset file, but I could not find one matching the parameters GenerateBarcode(drug, use_pregenerated).
Traceback (most recent call last):
  File "run_baseline.py", line 395, in <module>
    run_baseline(**params_to_use)
  File "run_baseline.py", line 140, in run_baseline
    sequence_dataset.initialize_encoder()
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 109, in initialize_encoder
    all_train_outputs = [self.__getitem__(i) for i in tqdm(self.return_train_strains(), total=len(self.train_strains))]
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 109, in <listcomp>
    all_train_outputs = [self.__getitem__(i) for i in tqdm(self.return_train_strains(), total=len(self.train_strains))]
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 376, in __getitem__
    sequences = self.get_sequences_barcode(i)
  File "./SPECTRA/SPECTRA_paper/SPECTRA_MSD/TB_Covid_GFP/run/Sequence_Dataset.py", line 370, in get_sequences_barcode
    return self.fetcher.barcode(strain)
AttributeError: 'Sequence_Dataset' object has no attribute 'fetcher'
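
The error above is consistent with the line that constructs GenerateBarcode being the one that would have set self.fetcher, so commenting it out leaves the attribute undefined until get_sequences_barcode reads it. This is an assumption, since the repository code is not shown here. The sketch below reproduces that failure mode with stand-in classes: `DatasetWithoutFetcher` and `StubFetcher` are hypothetical names, and only the `barcode(strain)` call is taken from the traceback.

```python
class DatasetWithoutFetcher:
    """Stand-in for a dataset whose fetcher assignment was commented out."""

    def __init__(self):
        # Assumed original line, disabled as described in the post above:
        # self.fetcher = GenerateBarcode(drug, use_pregenerated)
        pass

    def get_sequences_barcode(self, strain):
        # Mirrors the call in the traceback; fails if fetcher was never set.
        return self.fetcher.barcode(strain)


class StubFetcher:
    """Hypothetical placeholder exposing the one method the traceback calls."""

    def __init__(self, barcodes):
        self.barcodes = barcodes

    def barcode(self, strain):
        return self.barcodes[strain]


dataset = DatasetWithoutFetcher()
error_message = None
try:
    dataset.get_sequences_barcode("strain_A")
except AttributeError as err:
    # Same kind of error as in the traceback: the attribute does not exist.
    error_message = str(err)
    print(error_message)

# Supplying any object with a barcode(strain) method removes the error.
dataset.fetcher = StubFetcher({"strain_A": [0, 1, 0]})
result = dataset.get_sequences_barcode("strain_A")
print(result)
```

A stub like this only gets past the AttributeError; the real barcodes would still have to come from the missing generate_barcode.py or from the pregenerated data.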