How to calculate the TSS enrichment score according to the ENCODE standard https://www.encodeproject.org/data-standards/terms/#enrichment?
I was reading the source code https://github.com/jianhong/ATACseqQC/blob/master/R/TSSEscore.R#L80 but think it is not doing it correctly.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
sel.center <- promoters(txs, upstream = 1000 downstream = 1000)
sel.left.flank <- flank(sel.center, width=endFlank, both=FALSE)
sel.right.flank <- flank(sel.center, width=endFlank, start=FALSE, both = FALSE)
cvg<-cov(gr)
sel.center<- as(sel.center, "IntegerRangesList")
vws.center <- Views(cvg, sel.center)
vws.center$chr9
Views on a 140500037-length Rle subject
views:
start end width
[1] 10987 12986 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2] 71698 73697 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[3] 213865 215864 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[4] 213865 215864 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[5] 272048 274047 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[6] 272048 274047 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[7] 272048 274047 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[8] 272048 274047 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[9] 364591 366590 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
... ... ... ... ...
[2420] 140458452 140460451 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2421] 140472388 140474387 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2422] 140458607 140460606 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2423] 140472388 140474387 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2424] 140472388 140474387 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2425] 140472374 140474373 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2426] 140483938 140485937 2000 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...]
[2427] 140512309 140514308 2000 [ ]
[2428] 140786023 140788022 2000 [ ]
> vws.center$chr9[[2428]]
Error in getListElement(x, i, ...) : view is out of limits
> vws.center$chr9[[2426]]
integer-Rle of length 2000 with 3 runs
Lengths: 969 78 953
Values : 0 1 0
> vws.center$chr9[[2427]]
Error in getListElement(x, i, ...) : view is out of limits
> vws.center$chr9[[2425]]
integer-Rle of length 2000 with 7 runs
Lengths: 782 186 11 13 7 173 828
Values : 0 1 0 1 2 1 0
I want to get the matrix out and better to keep the tx id for each row (how should I do it?) but this gives me an error:
sapply(1:length(vws.center$chr9), function(i) as.numeric(vws.center$chr9[[i]]))
Error in getListElement(x, i, ...) : view is out of limits
essentially, I want a matrix of #tss x 2000, 2kb around the tss and each value is the number of read counts overlap with that base. (I also want to flip the rows of the matrix when the strand is -)
BTW, I have read https://support.bioconductor.org/p/56390/ very old thread and https://stat.ethz.ch/pipermail/bioc-devel/2016-June/009446.html
Thanks very much.
or if there is any package there for this purpose, I want to take a look at the source code and learn.
genomation has some functions https://bioconductor.org/packages/devel/bioc/vignettes/genomation/inst/doc/GenomationManual.html I am reading the source code. Thanks.