I followed the ATACseqQC guide and ran the following:
tsse <- TSSEscore(gal1, txs)
summary(tsse$TSS.enrichment.score)
This printed:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  0.0000  0.9152  2.3269     Inf  9.6348     Inf     206
The ENCODE ATAC-seq Data Standards define the TSS enrichment score as the signal value at the center of the normalized distribution. Does this refer to the median value in the summary? The official ENCODE definition is:
Transcription Start Site (TSS) Enrichment Score - The TSS enrichment calculation is a signal to noise calculation. The reads around a reference set of TSSs are collected to form an aggregate distribution of reads centered on the TSSs and extending to 1000 bp in either direction (for a total of 2000bp). This distribution is then normalized by taking the average read depth in the 100 bps at each of the end flanks of the distribution (for a total of 200bp of averaged data) and calculating a fold change at each position over that average read depth. This means that the flanks should start at 1, and if there is high read signal at transcription start sites (highly open regions of the genome) there should be an increase in signal up to a peak in the middle. We take the signal value at the center of the distribution after this normalization as our TSS enrichment metric. Used to evaluate ATAC-seq.
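To check my reading of that definition, here is a minimal base R sketch on made-up numbers. The names pos, agg_depth, flank_mean, and normalized are my own, the toy profile is invented, and I am not claiming this is what TSSEscore or the ENCODE pipeline does internally. It only follows the quoted text step by step.

```r
# Toy aggregate profile: read depth summed over all reference TSSs at each
# position from -1000 to +1000 bp (hypothetical numbers, not real data).
pos <- -1000:1000
agg_depth <- 10 + 60 * exp(-(pos / 150)^2)  # flat background of 10 plus a bump at the TSS

# Average read depth in the 100 bp at each end flank (200 bp of data in total).
flank_mean <- mean(c(head(agg_depth, 100), tail(agg_depth, 100)))

# Fold change at each position over the flank average, so the flanks sit near 1.
normalized <- agg_depth / flank_mean

# The metric, as quoted: the normalized signal at the center of the distribution.
tss_enrichment <- normalized[pos == 0]
print(tss_enrichment)
```

On this toy profile the result is about 7 by construction (a depth of 70 at the TSS over a flank depth of 10). Read this way, the definition gives a single number per sample from the aggregate profile, not a distribution with quartiles.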
If the median is the right number to compare, then my 2.3269 is poor by their current standards:
<5: Concerning
5-7: Acceptable
>7: Ideal
I'm confused, because even in the example in the ATACseqQC guide, their output is:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   0.000   2.227   4.856     Inf  16.722     Inf    3750
where the median is also in the "Concerning" range.
Am I misunderstanding something? Should I be looking at the 3rd quartile value instead? If not, is there any way to raise the TSS enrichment score? Thank you!