Entering edit mode
Dear list,
I am interested in adjusting for GC bias in my experiment. I have my
data in an "SeqExpressionSet" object (using EDASeq library) called
eset
and I then run the following commands:
dataOffset = withinLaneNormalization(eset,"gc",
which="full",offset=TRUE)
dataOffset2 = betweenLaneNormalization(dataOffset,
which="full",offset=TRUE)
y = DGEList(exprs(dataOffset), group = pData(eset)$conditions)
y = calcNormFactors(y,method="TMM")
y$design = model.matrix(~y$samples$group)
y = estimateGLMCommonDisp(y, y$design, offset = -offst(dataOffset2))
y = estimateGLMTagwiseDisp(y, y$design, offset = -offst(dataOffset2))
#no trended
lrt_offset <- glmLRT(glmFit(y,y$design, offset = -offst(dataOffset2)),
coef=2)
Where group is something like c(1,1,2,2). However, I get very high
logCPM values
> head(lrt_offset$table)
logFC logCPM LR PValue
1 -2.087455 24.90139 18.59469 1.616704e-05
2 -2.182279 24.71384 21.99644 2.731570e-06
3 -1.962187 25.17329 14.93835 1.110818e-04
4 -1.808961 24.67918 13.09471 2.961312e-04
5 -2.189829 26.08874 22.87277 1.730868e-06
6 -1.816757 25.37329 15.85988 6.820944e-05
> summary(lrt_offset$table$logCPM)
Min. 1st Qu. Median Mean 3rd Qu. Max.
22.27 24.76 25.32 25.39 25.95 30.58
Whereas before the glm command, the logCPM values look more like what
I
am expecting
> summary(y$logCPM)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.032 4.068 4.716 4.778 5.369 10.260
Plotting y$logCPM andlrt_offset$table$logCPM however does not result
in
a straight line (but it has high correlation). I've noticed in the
manual for both EDASeq and cqn (two packages for GC bias correction)
the
logCPM values are also much higher (look under topTags output).
Could someone shed some insight into why these values are much higher
(statistically)? Computationally, I know that if offsets are not
provided, default offsets of log(lib.size) is used in the glmFit()
function. The code on the svn seems to have been modified quite
recently
so maybe this will be resolved in the next update.
On a related note, I want to adjust for GC bias (with lane
normalization) and also for sequencing depth (between lane
normalization) but I want to do the latter in a simple way (and not
using the betweenLaneNormalization function above), could I do
something
like this?
myOffset = -offst(dataOffset) + log(lib.size)
> head(offst(dataOffset), n=3)
s1 s2 y1 y2
1 0.1870279 0.1033789 0.1605521 0.09555714
2 -0.3311078 -0.2439545 -0.1688206 -0.23481517
3 0.4720900 0.2818505 0.6052292 0.63641814
>log(lib.size)
[1] 14.94330 14.68804 13.10804 13.59071
Where dataOffset was generated above by doing full quantile
normalization using the withinLaneNormalization function above. I was
thinking about this because edgeR automatically generates an offset
when
computing glmFit() to adjust for different library sizes (using
log(lib.size) as offset). Could I do something similar but also adjust
for GC biases and have normal looking logCPM values?
Lastly, what is the best way to get the 'adjusted' count values out of
edgeR? Is it by using the fitted values from glmFit() or using cpm()
on
the non-fitted values ? would the latter adjust for offset?
Thanks,
Ying Wu
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.0.8 limma_3.14.4 EDASeq_1.4.0
[4] R.oo_1.13.0 R.methodsS3_1.4.2 aroma.light_1.28.0
[7] matrixStats_0.6.2 ShortRead_1.16.4 latticeExtra_0.6-24
[10] RColorBrewer_1.0-5 Rsamtools_1.10.2 lattice_0.20-10
[13] Biostrings_2.26.3 GenomicRanges_1.10.7 IRanges_1.16.6
[16] Biobase_2.18.0 BiocGenerics_0.4.0 knitr_1.1
loaded via a namespace (and not attached):
[1] annotate_1.36.0 AnnotationDbi_1.20.6 bitops_1.0-5
[4] DBI_0.2-5 DESeq_1.10.1 digest_0.6.3
[7] evaluate_0.4.3 formatR_0.7 genefilter_1.40.0
[10] geneplotter_1.36.0 grid_2.15.0 hwriter_1.3
[13] markdown_0.5.4 parallel_2.15.0 RSQLite_0.11.2
[16] splines_2.15.0 stats4_2.15.0 stringr_0.6.2
[19] survival_2.37-4 tools_2.15.0 XML_3.95-0.2
[22] xtable_1.7-1 zlibbioc_1.4.0