AssertionError: The dataframe does not have all the columns Chromosome, Start and End.
I am currently trying to replicate the steps presented in the Tutorial: 10x multiome pbmc
However, I ran into an AssertionError: The dataframe does not have all the columns Chromosome, Start and End
Here is the output shown in the terminal:
(scenicplus) userLab@instance-4:~$ python example.py
... storing 'feature_types' as categorical
... storing 'genome' as categorical
... storing 'celltype' as categorical
... storing 'feature_types' as categorical
... storing 'genome' as categorical
2023-08-21 07:37:53,258 cisTopic INFO Reading fragments from pbmc_tutorial/data/pbmc_granulocyte_sorted_3k_atac_fragments.tsv.gz
2023-08-21 07:38:19,934 cisTopic INFO Creating pseudobulk for B_cells
2023-08-21 07:38:25,408 cisTopic INFO B_cells done!
2023-08-21 07:38:25,408 cisTopic INFO Creating pseudobulk for B_cells_1
2023-08-21 07:38:56,783 cisTopic INFO B_cells_1 done!
2023-08-21 07:38:56,784 cisTopic INFO Creating pseudobulk for B_cells_2
2023-08-21 07:39:07,328 cisTopic INFO B_cells_2 done!
2023-08-21 07:39:07,328 cisTopic INFO Creating pseudobulk for CD14_Monocytes
2023-08-21 07:41:20,535 cisTopic INFO CD14_Monocytes done!
2023-08-21 07:41:20,536 cisTopic INFO Creating pseudobulk for CD4_T_cells
2023-08-21 07:44:00,022 cisTopic INFO CD4_T_cells done!
2023-08-21 07:44:00,023 cisTopic INFO Creating pseudobulk for CD8_T_cells
2023-08-21 07:44:28,040 cisTopic INFO CD8_T_cells done!
2023-08-21 07:44:28,042 cisTopic INFO Creating pseudobulk for FCGR3A_Monocytes
2023-08-21 07:44:42,860 cisTopic INFO FCGR3A_Monocytes done!
2023-08-21 07:44:42,860 cisTopic INFO Creating pseudobulk for NK_cells
2023-08-21 07:44:52,445 cisTopic INFO NK_cells done!
2023-08-21 07:44:52,448 cisTopic INFO Calling peaks for B_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells.bed.gz --name B_cells --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:00,660 cisTopic INFO B_cells done!
2023-08-21 07:45:00,660 cisTopic INFO Calling peaks for B_cells_1 with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells_1.bed.gz --name B_cells_1 --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:22,829 cisTopic INFO B_cells_1 done!
2023-08-21 07:45:22,829 cisTopic INFO Calling peaks for B_cells_2 with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/B_cells_2.bed.gz --name B_cells_2 --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:45:38,648 cisTopic INFO B_cells_2 done!
2023-08-21 07:45:38,648 cisTopic INFO Calling peaks for CD14_Monocytes with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD14_Monocytes.bed.gz --name CD14_Monocytes --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:46:34,801 cisTopic INFO CD14_Monocytes done!
2023-08-21 07:46:34,801 cisTopic INFO Calling peaks for CD4_T_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD4_T_cells.bed.gz --name CD4_T_cells --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:47:32,765 cisTopic INFO CD4_T_cells done!
2023-08-21 07:47:32,765 cisTopic INFO Calling peaks for CD8_T_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/CD8_T_cells.bed.gz --name CD8_T_cells --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:47:51,510 cisTopic INFO CD8_T_cells done!
2023-08-21 07:47:51,510 cisTopic INFO Calling peaks for FCGR3A_Monocytes with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/FCGR3A_Monocytes.bed.gz --name FCGR3A_Monocytes --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:48:12,298 cisTopic INFO FCGR3A_Monocytes done!
2023-08-21 07:48:12,298 cisTopic INFO Calling peaks for NK_cells with macs2 callpeak --treatment pbmc_tutorial/scATAC/consensus_peak_calling/pseudobulk_bed_files/NK_cells.bed.gz --name NK_cells --outdir pbmc_tutorial/scATAC/consensus_peak_calling/MACS/ --format BEDPE --gsize hs --qvalue 0.05 --nomodel --shift 73 --extsize 146 --keep-dup all --call-summits --nolambda
2023-08-21 07:48:24,263 cisTopic INFO NK_cells done!
2023-08-21 07:48:24,475 cisTopic INFO Extending and merging peaks per class
Traceback (most recent call last):
File "example.py", line 310, in <module>
consensus_peaks=get_consensus_peaks(narrow_peaks_dict, peak_half_width, chromsizes=chromsizes, path_to_blacklist=path_to_blacklist)
File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 65, in get_consensus_peaks
center_extended_peaks = [
File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 67, in <listcomp>
calculate_peaks_and_extend(
File "/home/userLab/.local/lib/python3.8/site-packages/pycisTopic/iterative_peak_calling.py", line 142, in calculate_peaks_and_extend
blacklist = pr.read_bed(path_to_blacklist)
File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/readers.py", line 98, in read_bed
return PyRanges(df)
File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/pyranges_main.py", line 178, in __init__
_init(self, df, chromosomes, starts, ends, strands, copy_df)
File "/home/userLab/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pyranges/methods/init.py", line 120, in _init
assert all(
AssertionError: The dataframe does not have all the columns Chromosome, Start and End.
I believe the error is associated with the type of ### hg38-blacklist.v2.bed i downloaded. I downloaded the file from here: https://github.com/igordot/reference-genomes/blob/master/hg38/blacklist.v2.bed
I don't know if downloaded the right version of the hg38-blacklist.v2.bed file because I couldn't find it on the scenic website. All the other files referenced in the tutorial were downloaded. I am using the data files as described in the tutorial, not my own data.
Here is the line of code for line 310 the error is mentioning:
# wget -O hg38-blacklist.v2.bed https://github.com/igordot/reference-genomes/blob/master/hg38/blacklist.v2.bed
path_to_blacklist= '/home/userLab/pycisTopic/blacklist/hg38-blacklist.v2.bed'
# Get consensus peaks
consensus_peaks=get_consensus_peaks(narrow_peaks_dict, peak_half_width, chromsizes=chromsizes, path_to_blacklist=path_to_blacklist)
consensus_peaks.to_bed(
path = os.path.join(work_dir, 'scATAC/consensus_peak_calling/consensus_regions.bed'),
keep=True,
compression='infer',
chain=False)
Does anyone know how to address this error? Where did people get the blacklist.v2.bed file?
Hi @crossworlds423
We host the blacklist regions that were used for the tutorial on the pycisTopic repo: https://github.com/aertslab/pycisTopic/blob/master/blacklist/hg38-blacklist.v2.bed
Let me know if this fixes your issue.
Best,
Seppe