I am using ChIPpeakAnno package, after calculating the motif of the fasta sequences of the peaks I called, I realize there are two pfms in the results, one is of 6 bases, while the other is of 8 bases. In tutorial, the pipeline just dropped the second pfm. I am confused since I set the parameter "oligoLength" as 6, why there is another pfm with 8 bases numbers? Really hope someone brilliant can explain it to me, many thanks!
setwd('E:\\Project')
library(ChIPpeakAnno)
library(tidyverse)
library(EnsDb.Hsapiens.v86)
annoData <- toGRanges(EnsDb.Hsapiens.v86, feature = "gene")
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(UpSetR)
library(org.Hs.eg.db)
library(clusterProfiler)
library(ggplot2)
library(BSgenome.Hsapiens.UCSC.hg38)
library(seqinr)
library(motifStack)
test <- read_delim('DMSO-K4_peaks.broadPeak', col_names = F)
test_clean <- d4[grep("^chr", test$X1), ]
write_delim(test_clean, 'test_clean.broadPeak', col_names = F)
tcp <- toGRanges('test_clean.broadPeak', format="broadPeak", header=F)
seq <- getAllPeakSequence(tcp, upstream=20, downstream=20, genome=Hsapiens)
write2FASTA(seq, "seq.fa")
os <- oligoSummary(seq, oligoLength=6, MarkovOrder=3,
quickMotif=T)
there are two list elements in d4os$motifs Here are the following steps that are recommended by tutorial, you can see that the tutorial is only choosing the first pfm. So after I set the "oligoLength" to 6, why there is a second pfm?
pfm <- mapply(function(.ele, id)
new("pfm", mat=.ele, name=paste("SAMPLE motif", id)),
os$motifs, 1:length(os$motifs))
motifStack(pfm[[1]])