Entering edit mode
Thomas Frederick Willems
▴
20
@thomas-frederick-willems-5527
Last seen 10.4 years ago
Hi Gordon,
I've been trying to use edgeR to analyze some mRNA-seq data. I've
begun looking at a data set where the cells are either unstimulated or
stimulated with one single treatment and I have three replicates for
each class. When I use the GLM framework with tagwise dispersion to
fit a model for the effects of the single treatment and plot the GOF,
it is distinctly non-linear. When I remove one replicate from each
class by looking at the replicate most distant from its counterparts
in the plotMDS graph, I obtain a plot that appears more linear but
still deviates from the diagonal.
I've attached these gof plot as well as the results from plotBCV for
the situation in which 3 replicates were used. I have a few questions
regarding this plot:
1) Do the combination of gof plots suggest that there are gross batch
differences between the replicates, or have you seen other RNA-seq
datasets with a fit similar to the first gof plot?
2) Is the fit sufficiently poor that any conclusions drawn from the
GLM coefficients are likely of little use?
3) In another dataset I'm analyzing, the line also lies below the y=x
plot but is linear and has the same slope. What sort of issue might
this suggest?
My sequencing depths for the samples, as given by samples.libsize,
are:
26693672 61963337 64385230 65093734 79193403 57877925
My code is as follows:
library(edgeR)
# File containing file names where sample counts can be read
targets <- read.delim("/home/twillems/Desktop/IL6-paired-
end/GLM/files.txt", stringsAsFactors = FALSE)
treatment <- c(0,0,0,1,1,1)
design <- model.matrix(~treatment)
y <- readDGE(targets)
colnames(y) <- y$samples[['description']]
min_counts_per_million <- 0.2
min_samples <- 3
y <- y[rowSums(cpm(y) > min_counts_per_million) >= nrow(y$samples), ]
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y,design)
plotBCV(y)
plotMDS(y)
fit1 <- glmFit(y, design)
g <- gof(fit1, plot=TRUE)
And the result of sessionInfo():
R version 2.14.1 (2011-12-22)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] edgeR_2.6.12 limma_3.10.3 rkward_0.5.6
loaded via a namespace (and not attached):
[1] tools_2.14.1
Thanks for your help
Thomas Willems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gof plot_all_3_rep.PNG
Type: image/png
Size: 36903 bytes
Desc: gof plot_all_3_rep.PNG
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20121019="" 9d926986="" attachment.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BCV plot_all_3_rep.PNG
Type: image/png
Size: 31803 bytes
Desc: BCV plot_all_3_rep.PNG
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20121019="" 9d926986="" attachment-0001.png="">
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gof plot_2_rep.PNG
Type: image/png
Size: 35610 bytes
Desc: gof plot_2_rep.PNG
URL: <https: stat.ethz.ch="" pipermail="" bioconductor="" attachments="" 20121019="" 9d926986="" attachment-0002.png="">