ChIPpeakAnno overlap peaks with TSS returns more than TSS overlap
94133 • 0
@94133-14305
Last seen 4.4 years ago
USA, Stanford

I want the ChIP peaks that overlap gene TSSs. However, the output from ChIPpeakAnno also includes peaks that do not overlap a TSS, so I have to filter it afterwards. Is there a better way?

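library(ChIPpeakAnno)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)
library(dplyr)  # provides %>% and filter()

# Annotate peaks against whole-gene ranges
ChIP_peaks_annoTSS <- annotatePeakInBatch(res1_ChIP,
                                            AnnotationData = genes(TxDb.Mmusculus.UCSC.mm10.knownGene),
                                            output = "overlapping",
                                            featureType = "TSS",
                                            select = "all",
                                            ignore.strand = TRUE,
                                            FeatureLocForDistance = "TSS")
# Add gene symbols and convert to a data frame
ChIP_peaks_annoTSS <- addGeneIDs(annotatedPeak=ChIP_peaks_annoTSS,
                                   orgAnn = "org.Mm.eg.db",
                                   feature_id_type = "entrez_id",
                                   IDs2Add = "symbol") %>% as.data.frame()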

The fromOverlappingOrNearest column reads "Overlapping" even when insideFeature is "inside" or "overlapEnd", which is NOT a TSS overlap.

So I then filter on the insideFeature column to keep only the TSS overlaps, like this:

TSSpatterns = c("overlapStart","includeFeature")
ChIP_peaks_annoTSS <- filter(ChIP_peaks_annoTSS, grepl(paste(TSSpatterns, collapse="|"), insideFeature))
ChIP_peaks_annoTSS_cond <- condenseMatrixByColnames(as.matrix(as.data.frame(ChIP_peaks_annoTSS)), "peak")

Can you show me the proper way?

Thanks!!!!

Could you please try the following code and see whether it meets your needs? Thanks!

tss <- promoters(TxDb.Mmusculus.UCSC.mm10.knownGene, upstream=0, downstream=1)

ChIP_peaks_annoTSS <- annotatePeakInBatch(res1_ChIP,
                                            AnnotationData = tss,
                                            output = "overlapping",
                                            featureType = "TSS",
                                            select = "all",
                                            ignore.strand = TRUE,
                                            FeatureLocForDistance = "TSS")

Best regards,

Julie
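
To make the difference between annotating against whole genes and annotating against 1-bp TSS ranges concrete, here is a minimal sketch that uses only GenomicRanges rather than annotatePeakInBatch. The gene and peak coordinates and all names are invented for illustration and are not taken from the data in this thread.

```r
library(GenomicRanges)

# Two hypothetical genes: one on the + strand, one on the - strand
toy_genes <- GRanges("chr1",
                     IRanges(start = c(1000, 5000), end = c(2000, 6000)),
                     strand = c("+", "-"))
names(toy_genes) <- c("geneA", "geneB")

# Three hypothetical unstranded peaks
toy_peaks <- GRanges("chr1",
                     IRanges(start = c(950, 1500, 5950),
                             end   = c(1050, 1600, 6050)))
names(toy_peaks) <- c("peak_on_TSS_A", "peak_in_body_A", "peak_on_TSS_B")

# Strand-aware 1-bp TSS: the start for "+" genes, the end for "-" genes
toy_tss <- promoters(toy_genes, upstream = 0, downstream = 1)

# Overlapping whole genes also keeps the peak inside the gene body
hits_gene <- subsetByOverlaps(toy_peaks, toy_genes, ignore.strand = TRUE)

# Overlapping the 1-bp TSS keeps only peaks that cover a TSS
hits_tss <- subsetByOverlaps(toy_peaks, toy_tss, ignore.strand = TRUE)

names(hits_gene)
names(hits_tss)
```

In this toy case the gene-body peak is retained only when the annotation is the full gene range, which is the behaviour described in the question.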

Ou, Jianhong ★ 1.3k
@ou-jianhong-4539
Last seen 20 hours ago
United States

Did you try setting output = "upstream"?


No. Are you suggesting this is the best way to do this? I don't understand why one would use "upstream" for TSS overlap. Can you explain?

Thanks!


This will find the peaks that overlap the TSS, because we set maxgap = -1 and FeatureLocForDistance = "TSS".

However, this may not answer your biological question. Are you perhaps looking for annotation of the promoter region? If so, please try setting output = "overlapping", FeatureLocForDistance = "TSS" and bindingRegion = c(-5000, 3000). Here bindingRegion means 5 kb upstream to 3 kb downstream of the TSS.
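
As a rough picture of what a window of 5 kb upstream and 3 kb downstream of the TSS looks like, the sketch below builds such windows with promoters() from GenomicRanges. This is only an approximation for intuition: it does not call annotatePeakInBatch, the exact boundary handling of bindingRegion may differ, and the coordinates and names are invented.

```r
library(GenomicRanges)

# Two hypothetical genes on opposite strands
toy_genes <- GRanges("chr1",
                     IRanges(start = c(10000, 50000), end = c(20000, 60000)),
                     strand = c("+", "-"))
names(toy_genes) <- c("geneA", "geneB")

# Strand-aware window: 5 kb upstream and 3 kb downstream of each TSS
tss_window <- promoters(toy_genes, upstream = 5000, downstream = 3000)

# Hypothetical peaks: upstream of geneA, inside the geneA body,
# and upstream of geneB (higher coordinates, because geneB is on "-")
toy_peaks <- GRanges("chr1",
                     IRanges(start = c(6000, 15000, 64000),
                             end   = c(6200, 15200, 64100)))
names(toy_peaks) <- c("upstream_A", "body_A", "upstream_B")

# Keep peaks that fall in any promoter window
hits_window <- subsetByOverlaps(toy_peaks, tss_window, ignore.strand = TRUE)

tss_window
names(hits_window)
```

Note that for the minus-strand gene the upstream side lies at higher coordinates, which is why the window has to be built relative to the strand-aware TSS.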


I tried your suggestion like this, but I get an error:

ChIP_peaks_annoTSS <- annotatePeakInBatch(res1_ChIP,
                                            AnnotationData = genes(TxDb.Mmusculus.UCSC.mm10.knownGene),
                                            output = "overlapping",
                                            featureType = "TSS",
                                            select = "all",
                                            ignore.strand = TRUE,
                                            FeatureLocForDistance = "TSS",
                                            bindingRegion = c(-2000, 2000))
ChIP_peaks_annoTSS <- addGeneIDs(annotatedPeak=ChIP_peaks_annoTSS,
                                   orgAnn = "org.Mm.eg.db",
                                   feature_id_type = "entrez_id",
                                   IDs2Add = "symbol") 
ChIP_peaks_annoTSS_cond <- condenseMatrixByColnames(as.matrix(as.data.frame(ChIP_peaks_annoTSS)), "peak")

Error in data.frame(seqnames = as.factor(seqnames(x)), start = start(x),  : 
  duplicate row.names: X12, X39, X45, X52, X67, X71, X137, X144, X179, X184, X215, X228, X232, X240, X244, X246, X255, X262, X265, X284, X287, X379, X384, X391, X393, X404, X420, X451, X533, X534, X536, X553, X556, X574, X575, X60 ... ... ... 


Try:

ChIP_peaks_annoTSS_cond <- condenseMatrixByColnames(as.matrix(as.data.frame(unname(ChIP_peaks_annoTSS))), "peak")
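
The reason unname() helps can be sketched with a toy GRanges. The assumption here, which the thread does not confirm, is that the annotated object carries repeated names (for example because one peak is reported for several features) and that as.data.frame() tries to use those names as row names. The object, names and metadata column below are invented.

```r
library(GenomicRanges)

# Hypothetical annotated peaks: the first peak is reported twice,
# once per feature, so its name is duplicated
toy_anno <- GRanges("chr1",
                    IRanges(start = c(100, 100, 500),
                            end   = c(200, 200, 600)),
                    feature = c("geneA", "geneB", "geneC"))
names(toy_anno) <- c("X12", "X12", "X39")

# With duplicated names the conversion may fail with a
# "duplicate row.names" error; capture the message instead of stopping
msg <- tryCatch({
  as.data.frame(toy_anno)
  "no error"
}, error = function(e) conditionMessage(e))

# Dropping the names first leaves default row names, so this succeeds
toy_df <- as.data.frame(unname(toy_anno))

msg
toy_df
```

Since unname() discards the names, any peak identifier needed later should already be stored in a metadata column.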


That works, thanks! 
