Using another reference genome
Dear maintainers,
I'm trying to use your tool to visualize structural variants, but I'm working with an organism other than human, so I would like to change the reference genome. I tried the following:
```r
vcf_file <- './comparison_1kbp_typesafe_min3_30bp_strand-type1.vcf'
chrom <- c("NC_058021.1", "NC_058022.1", "NC_058023.1", "NC_058024.1", "NC_058025.1",
           "NC_058026.1", "NC_058027.1", "NC_058028.1", "NC_058029.1", "NC_058030.1",
           "NC_058031.1", "NC_058032.1", "NC_058033.1", "NC_058034.1", "NC_058035.1",
           "NC_058036.1", "NC_058037.1", "NC_058041.1", "NC_058038.1", "NC_058039.1",
           "NC_058040.1")
createVCFplot(vcf_file,
              FASTA_FILE = "./Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna",
              ASSEMBLY = "SoSeM",
              CHR_NAMES = chrom)
```
But I end up with the following error message:
```
Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") :
  cannot open file './Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna'
```
I'm sure the path to the file is correct, and the file is definitely in FASTA format because it was downloaded directly from NCBI.
Can you help me with this issue?
Thank you in advance, Enora
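One way to test the assumption that the path is correct is to check it from inside the same R session: a relative path such as `./Solea_senegalensis/...` is resolved against R's current working directory (`getwd()`), not against the folder where the script is saved. The helper `check_input_file()` below is a hypothetical base R sketch, not part of the package. It stops with the working directory in the message if the file cannot be found or read, and otherwise returns the absolute path.

```r
# Hypothetical helper (not part of the package): make sure R can see and read
# an input file before handing it to createVCFplot().
check_input_file <- function(path) {
  if (!file.exists(path) || dir.exists(path)) {
    stop(sprintf("'%s' is not a file visible from the working directory '%s'",
                 path, getwd()), call. = FALSE)
  }
  # file.access() returns 0 when the requested permission (4 = read) is granted
  if (file.access(path, mode = 4) != 0) {
    stop(sprintf("'%s' exists but is not readable", path), call. = FALSE)
  }
  # return the absolute path so later calls no longer depend on getwd()
  normalizePath(path, mustWork = TRUE)
}
```

For example, passing `FASTA_FILE = check_input_file("./Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna")` to `createVCFplot()` either fails with a message naming the working directory or hands the package an absolute path.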
This package seems to be no longer maintained. It took me a long time to discover that there were some issues with the load_chr section, so I made some modifications so that it could be used for an mm39 analysis. My coding skills are limited, so I hope this can serve as a reference for you.
```r
acomiy_load_chr <- function(FASTA_FILE,
                            CHR_NAMES = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6",
                                          "chr7", "chr8", "chr9", "chr10", "chr11", "chr12",
                                          "chr13", "chr14", "chr15", "chr16", "chr17", "chr18",
                                          "chr19", "chr20", "chr21", "chr22", "chrX", "chrY")) {
  # load FASTA
  genomestrings <- Biostrings::readDNAStringSet(FASTA_FILE)
  # keep only the sequence ID (text before the first space) of each header
  fasta_names <- sapply(strsplit(names(genomestrings), " "), `[`, 1)
  names(genomestrings) <- fasta_names
  # subset to the requested chromosomes and build a Seqinfo from their lengths
  CHRs <- genomestrings[CHR_NAMES]
  chr_lengths <- BiocGenerics::width(CHRs)
  seqinfo <- GenomeInfoDb::Seqinfo(seqnames = CHR_NAMES, seqlengths = chr_lengths)
  seqinfo
}
```
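The key change in `acomiy_load_chr()` is trimming each FASTA header to its first word: NCBI and Ensembl headers carry a description after the sequence ID, and `readDNAStringSet()` stores the whole header line as the name, so untrimmed names never match IDs such as `NC_058021.1`. Subsetting a `DNAStringSet` with a name that is absent raises an error, so it can also help to report missing names explicitly. The base R helper `match_chr_names()` below is a hypothetical sketch, not part of the package: it trims headers at the first whitespace character (a regex equivalent of the `strsplit(..., " ")` step that also handles tabs), stops with the list of missing names, and otherwise returns the positions of the requested chromosomes.

```r
# Hypothetical helper (not part of the package): map requested chromosome
# names onto FASTA records whose headers may carry a description after the ID.
match_chr_names <- function(header_names, chr_names) {
  # keep only the text before the first whitespace character
  ids <- sub("[[:space:]].*$", "", header_names)
  missing <- setdiff(chr_names, ids)
  if (length(missing) > 0) {
    stop("Not found in FASTA headers: ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  # positions of the requested chromosomes, in the order they were requested
  match(chr_names, ids)
}
```

Inside `acomiy_load_chr()`, `CHRs <- genomestrings[match_chr_names(names(genomestrings), CHR_NAMES)]` could replace the `strsplit()` and name-based subsetting steps. A mismatch such as the default `chr1`-style names against Ensembl's `1`-style IDs would then be reported by name.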
```r
library(Biostrings)
library(Rsamtools)          # bgzip(), indexTabix(), TabixFile()
library(VariantAnnotation)  # readVcf()

FASTA_FILE <- '/home/acomiy/breastcancer/anno/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa'
genomestrings <- Biostrings::readDNAStringSet(FASTA_FILE)
# keep only the sequence ID (text before the first space) of each header
fasta_names <- sapply(strsplit(names(genomestrings), " "), `[`, 1)
names(genomestrings) <- fasta_names
# every sequence in the FASTA is kept here, not a predefined chromosome list
CHRs <- genomestrings[fasta_names]
chr_lengths <- BiocGenerics::width(CHRs)
seqinfo <- GenomeInfoDb::Seqinfo(seqnames = fasta_names, seqlengths = chr_lengths, genome = 'GRCm39')
GRCm39_SEQINFO <- seqinfo

# compress and index the VCF, then read it with the GRCm39 genome label
VCF_FILE <- "/home/acomiy/breastcancer/WGS/circos/tumor_high_SV_impact.vcf"
VCF_GZ <- bgzip(VCF_FILE, dest = tempfile(fileext = ".vcf.gz"))
VCF_INDEX <- indexTabix(VCF_GZ, format = "vcf")
VCF_TAB <- TabixFile(VCF_GZ, index = VCF_INDEX)
vcf_data <- readVcf(VCF_TAB, genome = "GRCm39")
```
You will also need to modify the corresponding part of createVCFplot. I hope this helps.