Entering edit mode
Darya Vanichkina
▴
130
@darya-vanichkina-6050
Last seen 7.9 years ago
Australia/Centenary Institute Universit…
Hi!
I'm doing a differential gene expression analysis on two time points
for a human iPS cell line, using 100bp PE RNA-seq data mapped with
tophat and aggregated to generate a count table using htseq-count in
intersection-strict mode. While the differentiation process changes
the phenotype of the cell line significantly, I'm still not sure I
would expect >15% of genes in the genome to change significantly.
I'm especially worried about the fact that after I filter for lowly
expressed genes, ~50% of those for which testing is carried out are
called significant. Naturally, I can filter based on
cpm/logFC/detection by both tools before coming to a "significant" DG
table, but I'm wondering if there is something intrinsically wrong
with my data/samples, and if so - what? Or am I just making a very
silly mistake in my code somewhere?
Thanks in advance for your help,
Darya Vanichkina
edgeR BCV plot (Disp = 0.0189 , BCV = 0.1375 )
http://i.imgur.com/Pf32qeBh.png
edgeR smearplot
http://i.imgur.com/bs6JyoEh.png
DESeq MAplot
http://i.imgur.com/Ds9nfwth.png
## Sample code used
# edgeR -----
row.names(counttable.sam) <- counttable.sam$Gene
counttable.sam$Gene <- NULL
y <- DGEList(counts=counttable.sam[1:6], group=rep(1:2,each=3),
genes=row.names(counttable.sam))
keep <- rowSums(cpm(y)>1) >= 1
y <- y[keep,]
dim(y)
# 16002 6
# Kept, out of 62890 genes in reference (gencode 17). Reads were
counted to gene features using htseq-count in intersection-strict mode
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
plotMDS(y)
y <- estimateCommonDisp(y, verbose=TRUE)
# Disp = 0.0189 , BCV = 0.1375
y <- estimateTagwiseDisp(y)
plotBCV(y)
et <- exactTest(y)
summary(de <- decideTestsDGE(et))
# Out of 16002 tags for which the counts are not too low, 6263 are
upregulated and 6235 are down-regulated
detags <- rownames(y)[as.logical(de)]
plotSmear(et, de.tags=detags)
abline(h=c(-1, 1), col="blue")
# DESeq -----
# Filter the lowly expressed tags
rs <- rowSums (counttable.sam)
use <- (rs > 10)
table(use)
# use
# FALSE TRUE
# 41704 21186
counttable.sam <- counttable.sam[ use, ]
conds <- factor(c("day0", "day0", "day0", "day34", "day34", "day34"))
d <- newCountDataSet(counttable.sam, conds)
d <- estimateSizeFactors(d)
sizeFactors(d)
# d0_1 d0_2 d0_3 d34_1 d34_2 d34_3
# 2.0770099 1.4882383 0.7514035 0.4643261 1.3976540 0.6846997
d <- estimateDispersions(d, sharingMode="maximum")
plotDispEsts(d)
DESeqRes <- nbinomTest(d, "day0", "day34")
DESeqRes$GeneShort <- substr(DESeqRes$id, 1, 15)
DESeqResSig <- subset(DESeqRes, padj < 0.01)
# 10163/21186 genes were called significant by DESeq ...
plotMA(DESeqRes)
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] DESeq_1.12.0 lattice_0.20-15 locfit_1.5-9.1
Biobase_2.20.1
[5] BiocGenerics_0.6.0 ggplot2_0.9.3.1 BiocInstaller_1.10.2
edgeR_3.2.4
[9] limma_3.16.6 biomaRt_2.16.0
loaded via a namespace (and not attached):
[1] annotate_1.38.0 AnnotationDbi_1.22.6 colorspace_1.2-2
DBI_0.2-7
[5] dichromat_2.0-0 digest_0.6.3 genefilter_1.42.0
geneplotter_1.38.0
[9] grid_3.0.1 gtable_0.1.2 IRanges_1.18.2
labeling_0.2
[13] MASS_7.3-27 munsell_0.4.2 plyr_1.8
proto_0.3-10
[17] RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2
RSQLite_0.11.4
[21] scales_0.2.3 splines_3.0.1 stats4_3.0.1
stringr_0.6.2
[25] survival_2.37-4 tools_3.0.1 XML_3.95-0.2
xtable_1.7-1
____________________
Darya Vanichkina
PhD Student in Bioinformatics
IMB Science Ambassador
Institute for Molecular Biosciences (IMB)
University of Queensland Building 80
St Lucia Queensland, 4072 Australia
d.vanichkina@gmail.com
d.vanichkina@imb.uq.edu.au
[[alternative HTML version deleted]]
Hi Darya,
I am working with something similar. Even I found about 49% of the ~20k protein Coding genes to be deferentially expressed. If you could share how you dealt with this or justified the observation.