genomicAnnotationPriority ChIPseeker v1.36.0

Open HAOXUANmogu opened this issue 2 years ago • 7 comments

Hi,

I recently ran into two problems with ChIPseeker.

The first is a region priority problem with "genomicAnnotationPriority".

When I use genomicAnnotationPriority = c("3UTR", "5UTR", "Promoter", "Exon", "Intron", "Downstream", "Intergenic"), the annotation output contains both 3' UTR and 5' UTR regions;

when I use genomicAnnotationPriority = c("Exon", "Intron", "3UTR", "5UTR", "Promoter", "Downstream", "Intergenic"), the annotation output contains neither 3' UTR nor 5' UTR regions.

The second is a strand problem: "sameStrand = TRUE" does not seem to work.

Here is my code:

library(ChIPseeker)

library(GenomicFeatures)

tair_10 <- makeTxDbFromGFF("TAIR10.release55.gtf")

peak <-readPeakFile("test.tsv")

peakAnno <-annotatePeak(peak, tssRegion=c(-3000,3000),TxDb = tair_10,
                        assignGenomicAnnotation = TRUE,
                        genomicAnnotationPriority = c("3UTR","5UTR","Promoter","Exon", "Intron","Downstream", "Intergenic"),
                        annoDb = NULL,
                        addFlankGeneInfo = FALSE,
                        flankDistance = 5000,
                        sameStrand = TRUE,
                        #ignoreOverlap = FALSE,
                        #ignoreUpstream = FALSE,
                        #ignoreDownstream = FALSE,
                        overlap = "all",
                        verbose = TRUE)

peakAnno_cluster <-as.data.frame(peakAnno)

# View the summary: where the peaks fall in the genome
peakAnno
plotAnnoPie(peakAnno)

test.tsv.zip TAIR10.release55.gtf.zip

HAOXUANmogu avatar Oct 23 '23 05:10 HAOXUANmogu

Thank you for reaching out! It seems that there is something wrong with your sample test.tsv file.

image

Running your code also fails at peak <- readPeakFile("test.tsv"); the error comes from the malformed TSV format.

image

MingLi-929 avatar Oct 23 '23 11:10 MingLi-929

OK, I should have moved the first column to the end. Please try the new file; I have just tested the new format and it works: testnew.tsv.zip

HAOXUANmogu avatar Oct 23 '23 15:10 HAOXUANmogu

Thank you for your feedback! There is still something wrong with your file, so I corrected it according to my understanding. Please check whether this file still represents your data. I corrected the format according to the BED file standard (https://genome.ucsc.edu/FAQ/FAQformat.html#format1). image test.bed.txt

It would help if you could tell me more about your file. It looks like methylation output, but it differs a little from regular methylation output. If it comes from methylation sequencing with single-base peaks, the file should look like this: image

Since ChIPseeker analyzes data based on the BED file structure, it is important that the input is correctly formatted for your actual needs.
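Since the screenshots are not preserved here, the following is a minimal sketch of what a six-column BED file for single-base sites could look like, following the UCSC BED description linked above. The chromosome names, positions, and the `sites`/`bed` object names are made up for illustration and are not taken from the attached files.

```r
# Hypothetical single-base methylation sites; all values are invented.
sites <- data.frame(
  chrom  = c("1", "1", "2"),
  pos    = c(1001L, 5200L, 340L),  # 1-based position of the methylated base
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# BED is 0-based and half-open: a single base at 1-based position p
# is written as chromStart = p - 1, chromEnd = p.
bed <- data.frame(
  chrom      = sites$chrom,
  chromStart = sites$pos - 1L,
  chromEnd   = sites$pos,
  name       = paste0("site", seq_len(nrow(sites))),
  score      = 0L,
  strand     = sites$strand,       # strand belongs in the sixth column
  stringsAsFactors = FALSE
)

# BED files are tab-separated, with no header row and no quoting.
bed_path <- file.path(tempdir(), "sites.bed")
write.table(bed, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The column order (chrom, chromStart, chromEnd, name, score, strand) is what matters; the file name is arbitrary.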

MingLi-929 avatar Oct 24 '23 03:10 MingLi-929

Yes, it is methylation output. This is just a demo of the input file, in the form I need to use; it is not the real output data. You can adjust it to any format you need, and I will adjust my data to match.

HAOXUANmogu avatar Oct 25 '23 19:10 HAOXUANmogu

I have tried your BED file. You moved the strand to the sixth column, but it is still not working: the strand still shows "*".

This is the annotated output I got:

anno_test.bed.txt

HAOXUANmogu avatar Oct 25 '23 22:10 HAOXUANmogu

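Thank you for your feedback! Regarding your first question: genomicAnnotationPriority means that each region receives only one annotation, chosen according to the priority order you specify. A region that overlaps both a 5' UTR and an exon is therefore reported as only one of the two, whichever comes first in the priority vector. You can check the other overlapping annotations this way: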

peakAnno <-annotatePeak(peak, tssRegion=c(-3000,3000),TxDb = tair_10,
                        assignGenomicAnnotation = TRUE,
                        genomicAnnotationPriority = c("Exon", "Intron", "3UTR", "5UTR", "Promoter", "Downstream", "Intergenic"),
                        annoDb = NULL,
                        addFlankGeneInfo = FALSE,
                        flankDistance = 5000,
                        sameStrand = FALSE,
                        #ignoreOverlap = FALSE,
                        #ignoreUpstream = FALSE,
                        #ignoreDownstream = FALSE,
                        overlap = "all",
                        verbose = TRUE)

detail <- peakAnno@detailGenomicAnnotation
table(detail$fiveUTR)
# Output:
# FALSE  TRUE
# 15392  1164
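
To illustrate why the UTR labels disappear when "Exon" comes first, here is a toy sketch of priority-based assignment in base R. It is not ChIPseeker's actual implementation; the `overlaps` matrix and the `assign_by_priority` helper are hypothetical and only mimic the idea that each peak receives the first matching label in the priority order.

```r
# Toy overlap table: one row per peak, one logical column per feature type.
# The peaks and their overlaps are invented for illustration.
overlaps <- rbind(
  peak1 = c(TRUE,  TRUE,  FALSE),  # overlaps a 5' UTR, which lies inside an exon
  peak2 = c(FALSE, TRUE,  FALSE),  # overlaps an exon only
  peak3 = c(FALSE, FALSE, TRUE)    # overlaps an intron only
)
colnames(overlaps) <- c("5UTR", "Exon", "Intron")

# Hypothetical helper: give each peak the first feature in `priority`
# that it overlaps, falling back to "Intergenic" when nothing matches.
assign_by_priority <- function(overlaps, priority) {
  apply(overlaps[, priority, drop = FALSE], 1, function(hit) {
    if (any(hit)) priority[which(hit)[1]] else "Intergenic"
  })
}

utr_first  <- assign_by_priority(overlaps, c("5UTR", "Exon", "Intron"))
exon_first <- assign_by_priority(overlaps, c("Exon", "Intron", "5UTR"))

print(utr_first)   # peak1 is labeled "5UTR"
print(exon_first)  # peak1 is labeled "Exon"; the 5' UTR label never appears
```

In this sketch every UTR overlap is also an exon overlap, so with "Exon" ahead of the UTR entries no peak can end up labeled as a UTR. That matches the behavior reported above.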

As for the strand information, we will update the function in the near future. In the meantime, you can add the strand information yourself:

# df is the data.frame obtained from the BED file
# column x is the column containing strand information
strand(peak) <- df[,x]

Then you can perform your analysis with strand information, and sameStrand will work.
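
As a slightly fuller sketch of the same workaround, the peaks can also be built as a GRanges object directly from the BED columns, so that the strand is set from the start. This assumes the GenomicRanges package is installed; the data frame contents and column names below are invented, and in practice `df` would come from reading your own BED file.

```r
library(GenomicRanges)

# Hypothetical BED6 content; in practice, read your file with
# read.delim(..., header = FALSE) and assign these column names.
df <- data.frame(
  chrom      = c("1", "1", "2"),
  chromStart = c(1000L, 5199L, 339L),
  chromEnd   = c(1001L, 5200L, 340L),
  name       = c("site1", "site2", "site3"),
  score      = c(0L, 0L, 0L),
  strand     = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Build a GRanges object with explicit strand; BED starts are 0-based,
# so ask for the conversion to 1-based coordinates.
peak <- makeGRangesFromDataFrame(
  df,
  keep.extra.columns      = TRUE,
  seqnames.field          = "chrom",
  start.field             = "chromStart",
  end.field               = "chromEnd",
  strand.field            = "strand",
  starts.in.df.are.0based = TRUE
)

# Confirm that the strand is no longer "*" before annotating.
print(table(strand(peak)))
```

The resulting `peak` object can then be passed to annotatePeak() in place of the output of readPeakFile().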

MingLi-929 avatar Oct 31 '23 09:10 MingLi-929

Hello, I have two questions.

Question 1: I ran the annotation with sameStrand = TRUE, which mostly works. All coordinates in my BED file are on the plus strand, but a few of them are annotated to the promoter of a minus-strand gene, apparently just because they lie close to that promoter. Why does this happen despite sameStrand = TRUE, and what should I do?

Question 2: I have about 850k coordinates in total, all on the plus strand, but only about 800k were annotated. How can the remaining ~50k be displayed? Please help me, thank you.

annotatePeak(peak,
             tssRegion = c(-2000, 100),
             TxDb = txdb,
             level = "transcript",
             assignGenomicAnnotation = TRUE,
             genomicAnnotationPriority = c("Promoter", "5UTR", "3UTR", "Exon", "Intron", "Downstream", "Intergenic"),
             annoDb = "org.Dr.eg.db",
             addFlankGeneInfo = FALSE,
             flankDistance = 5000,
             sameStrand = TRUE,
             ignoreOverlap = FALSE,
             ignoreUpstream = FALSE,
             ignoreDownstream = FALSE,
             overlap = "all",
             verbose = TRUE,
             columns = c("ENTREZID", "ENSEMBL", "SYMBOL", "GENENAME"))

done...
Annotated peaks generated by ChIPseeker
822538/858148 peaks were annotated
Genomic Annotation Summary:
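
One way to see which peaks were dropped is to compare the coordinates of the input peaks with those in the annotated result. This is only a sketch in base R with made-up data; `input` stands in for as.data.frame(peak) and `annotated` for as.data.frame() of the annotatePeak() result, and it assumes both tables carry seqnames, start, and end columns. The cause of the missing peaks is not established in this thread, so the sketch only locates them.

```r
# Hypothetical stand-ins for the real tables; all values are invented.
input <- data.frame(
  seqnames = c("chr1", "chr1", "chrUn_1", "chr2"),
  start    = c(100L, 500L, 42L, 900L),
  end      = c(100L, 500L, 42L, 900L),
  stringsAsFactors = FALSE
)
annotated <- input[input$seqnames != "chrUn_1", ]

# Build a coordinate key per peak and keep input rows whose key is absent
# from the annotated table.
coord_key <- function(d) paste(d$seqnames, d$start, d$end, sep = ":")
dropped <- input[!(coord_key(input) %in% coord_key(annotated)), ]

# Tabulating by sequence name shows whether the dropped peaks cluster
# on particular chromosomes or scaffolds.
print(table(dropped$seqnames))
```

If the dropped peaks turn out to sit on only a few sequence names, comparing those names with the ones in the TxDb object is a reasonable next check.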

helloworldABCD1234 avatar Aug 11 '25 13:08 helloworldABCD1234