Using another reference genome

Enorya opened this issue · 1 comment

Dear maintainers,

I'm trying to use your tool to visualize structural variants, but I'm working with a non-human organism, so I would like to change the reference genome. I tried the following:

vcf_file <- './comparison_1kbp_typesafe_min3_30bp_strand-type1.vcf'
chrom=c("NC_058021.1","NC_058022.1","NC_058023.1","NC_058024.1","NC_058025.1",
        "NC_058026.1","NC_058027.1","NC_058028.1","NC_058029.1","NC_058030.1",
        "NC_058031.1","NC_058032.1","NC_058033.1","NC_058034.1","NC_058035.1",
        "NC_058036.1","NC_058037.1","NC_058041.1","NC_058038.1","NC_058039.1",
        "NC_058040.1")
createVCFplot(vcf_file, FASTA_FILE="./Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna", ASSEMBLY="SoSeM",  CHR_NAMES=chrom)

But I end up with the following error message:

Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : 
  cannot open file './Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna'

I'm sure the path is correct, and the file is definitely in FASTA format because I downloaded it directly from NCBI.
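The `cannot open file` message comes from XVector, which Biostrings uses to open the FASTA, so as far as I can tell it is raised while opening the file, before any FASTA parsing. A relative path such as `./Solea_senegalensis/...` is resolved against R's current working directory, which in RStudio or under Rscript is not necessarily the folder the file was downloaded to. A minimal base-R check, using a hypothetical helper `check_fasta_path()` that is not part of the package:

```r
# Hypothetical helper (not part of the package): show how R resolves a path
# and whether the file can be opened for reading from the current session.
check_fasta_path <- function(path) {
  readable <- file.exists(path) && file.access(path, mode = 4) == 0  # mode 4 = read
  message("Working directory: ", getwd())
  message("Resolved path:     ", normalizePath(path, mustWork = FALSE))
  message("Readable:          ", readable)
  invisible(readable)
}

check_fasta_path("./Solea_senegalensis/ncbi_dataset/data/GCF_019176455.1/GCF_019176455.1_IFAPA_SoseM_1_genomic.fna")
```

If `Readable` is `FALSE`, passing an absolute path to `FASTA_FILE` (or calling `setwd()` first) should get past this particular error.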

Can you help me with this issue?

Thank you in advance, Enora

Aug 19 '24 12:08 Enorya

This package no longer seems to be maintained. It took me a long time to discover that the load_chr code has some issues, so I modified it so that it could be used for an mm39 analysis. My coding skills are limited, but I hope this can serve as a reference for you:

acomiy_load_chr <- function(FASTA_FILE, CHR_NAMES=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chr20","chr21","chr22","chrX","chrY")){
  # load FASTA
  genomestrings <- Biostrings::readDNAStringSet(FASTA_FILE)
  # keep only the first word of each FASTA header so the names can match CHR_NAMES
  fasta_names <- sapply(strsplit(names(genomestrings), " "), `[`, 1)
  names(genomestrings) <- fasta_names
  CHRs <- genomestrings[CHR_NAMES]
  chr_lengths <- BiocGenerics::width(CHRs)
  seqinfo <- GenomeInfoDb::Seqinfo(seqnames = CHR_NAMES, seqlengths = chr_lengths)
  seqinfo
}
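The key step in `acomiy_load_chr()` is the `strsplit()` line: `readDNAStringSet()` names each sequence with its full FASTA header, so `genomestrings[CHR_NAMES]` can only find the sequences once the names are cut down to their first word. NCBI RefSeq headers start with the accession, so after this trimming the `NC_0580xx.1` names in Enora's `chrom` vector should match. The sketch below reproduces that step in base R on made-up headers (the text after each accession, and the `scaffold_1` entry, are illustrative, not copied from the real file) and lists requested names that are absent, since those would make the subsetting step fail.

```r
# Illustrative headers only: everything after the first word is made up.
headers <- c("NC_058021.1 Solea senegalensis chromosome 1",
             "NC_058022.1 Solea senegalensis chromosome 2",
             "scaffold_1 hypothetical unplaced scaffold")

# Same trimming as in acomiy_load_chr(): keep the first space-separated word.
short_names <- sapply(strsplit(headers, " "), `[`, 1)

# Requested chromosome names (a subset of Enora's `chrom` vector).
chrom <- c("NC_058021.1", "NC_058022.1", "NC_058023.1")

# Requested names with no matching sequence in the (trimmed) FASTA names.
absent <- setdiff(chrom, short_names)
if (length(absent) > 0) {
  message("Not found in FASTA: ", paste(absent, collapse = ", "))
}
```

Running the same `setdiff()` on the real trimmed names before calling `createVCFplot()` would show whether `CHR_NAMES` is the problem once the file itself can be opened.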

FASTA_FILE <- '/home/acomiy/breastcancer/anno/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa'
library(Biostrings)
# build a Seqinfo covering every sequence in the FASTA, not only CHR_NAMES
genomestrings <- Biostrings::readDNAStringSet(FASTA_FILE)
fasta_names <- sapply(strsplit(names(genomestrings), " "), `[`, 1)
names(genomestrings) <- fasta_names
CHRs <- genomestrings[fasta_names]
chr_lengths <- BiocGenerics::width(CHRs)
seqinfo <- GenomeInfoDb::Seqinfo(seqnames = fasta_names, seqlengths = chr_lengths, genome = 'GRCm39')
GRCm39_SEQINFO <- seqinfo
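Reading the whole genome with `readDNAStringSet()` only to get sequence names and lengths loads every base into memory. If a `.fai` index already sits next to the FASTA (for example one produced by `samtools faidx`), the same information can be read from it instead: a `.fai` file is tab-separated, with the sequence name in column 1 and its length in column 2. This is a substitute approach, not what the comment above does, and the helper name `read_fai_lengths()` is hypothetical.

```r
# Hypothetical helper: read sequence names and lengths from a samtools .fai index.
# Columns of a .fai file: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
read_fai_lengths <- function(fai_file, keep = NULL) {
  fai <- read.delim(fai_file, header = FALSE, sep = "\t",
                    colClasses = c("character", "numeric", "NULL", "NULL", "NULL"))
  lengths <- setNames(fai[[2]], fai[[1]])
  # subset and reorder to `keep`; names not present in the index become NA
  if (!is.null(keep)) lengths <- lengths[keep]
  lengths
}

# With GenomeInfoDb installed, the result could then feed a Seqinfo object, e.g.:
# len <- read_fai_lengths("genome.fa.fai")
# GenomeInfoDb::Seqinfo(seqnames = names(len), seqlengths = as.integer(len),
#                       genome = "GRCm39")
```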

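library(Rsamtools)          # bgzip(), indexTabix(), TabixFile()
library(VariantAnnotation)  # readVcf()
VCF_FILE <- "/home/acomiy/breastcancer/WGS/circos/tumor_high_SV_impact.vcf"
# compress and index the VCF so it can be read through a TabixFile
VCF_GZ <- bgzip(VCF_FILE, dest=tempfile(fileext = ".vcf.gz"))
VCF_INDEX <- indexTabix(VCF_GZ, format="vcf")
VCF_TAB <- TabixFile(VCF_GZ, index=VCF_INDEX)
vcf_data <- readVcf(VCF_TAB, genome="GRCm39")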
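Whichever genome is used, the `CHROM` values in the VCF presumably need to match the sequence names taken from the FASTA (for example `NC_058021.1` rather than `chr1` or `1`); I have not verified how `createVCFplot` handles a mismatch, so treat this as an assumption. A quick base-R check that reads only the first column of the data lines, using a hypothetical helper `vcf_chroms()`:

```r
# Hypothetical helper: list the distinct CHROM values in a plain or gzipped VCF.
vcf_chroms <- function(vcf_file) {
  con <- gzfile(vcf_file, "r")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  body <- lines[!startsWith(lines, "#")]   # drop ## meta lines and the #CHROM header
  unique(vapply(strsplit(body, "\t", fixed = TRUE), `[`, character(1), 1))
}

chrom <- c("NC_058021.1", "NC_058022.1")   # names expected from the FASTA
# VCF contigs with no matching sequence name, e.g.:
# setdiff(vcf_chroms("./comparison_1kbp_typesafe_min3_30bp_strand-type1.vcf"), chrom)
```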

You will also need to modify parts of createVCFplot accordingly. I hope this helps.

May 30 '25 07:05 acomiy