Entering edit mode
Wu, Xiwei
▴
350
@wu-xiwei-1102
Last seen 10.2 years ago
Dear list,
I have a smallRNA sequencing data from two samples, without
replicates.
I tried using DESeq to identify differentially expressed gene
following
the manual, and the code is as below:
> library(DESeq)
> conds <- c("NSC", "TSC")
> head(combined_data)
NSC TSC
hsa-let-7a 906684 413336
hsa-let-7a* 2 1281
hsa-let-7a-2* 51 132
hsa-let-7b 179751 63079
hsa-let-7b* 46 684
hsa-let-7c 380745 50729
> apply(combined_data, 2, sum)
NSC TSC
5018095 8658429
> cds <- newCountDataSet(combined_data, conds)
> cds <- estimateSizeFactors( cds )
> sizeFactors(cds)
NSC TSC
1 1
> cds <- estimateVarianceFunctions(cds, pool=T)
> res <- nbinomTest(cds, "NSC", "TSC")
> head(res[order(res$pval),])
id baseMean baseMeanA baseMeanB foldChange
log2FoldChange pval padj resVarA resVarB
397 hsa-miR-299-5p 329.5 329 330 1.0030395
0.004378441 0.002408825 0.9236618 NA NA
131 hsa-miR-1285 407.0 406 408 1.0049261
0.007089425 0.002922980 0.9236618 NA NA
196 hsa-miR-146a 170.5 170 171 1.0058824
0.008461579 0.004641615 0.9696484 NA NA
525 hsa-miR-425* 180.5 178 183 1.0280899
0.039966407 0.008776619 0.9696484 NA NA
826 hsa-miR-675* 714.0 722 706 0.9778393
-0.032330654 0.008892177 0.9696484 NA NA
505 hsa-miR-379* 260.5 256 265 1.0351562
0.049848549 0.012192131 0.9696484 NA NA
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] preprocessCore_1.10.0 limma_3.4.5 DESeq_1.0.6
locfit_1.5-6 lattice_0.18-8
[6] akima_0.5-4 Biobase_2.8.0
loaded via a namespace (and not attached):
[1] annotate_1.26.1 AnnotationDbi_1.10.2 DBI_0.2-5
genefilter_1.30.0 geneplotter_1.26.0
[6] grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-2
splines_2.11.1 survival_2.35-8
[11] tools_2.11.1 xtable_1.5-6
I have a few questions here:
1. The sizeFactors were estimated as 1, but the actual total number of
reads in the two samples differ 60%, could it be correct?
2. It seems that the identified top candidates with the smallest p
value
are all with very small fold changes. I have seen huge difference
between the two samples for some miRNA, but the P value was close to
1;
3. The resVarA and resVarB are NA for all genes, did I do something
wrong when calling the estimateVarianceFunctions?
4. For a RNA or small RNA sequencing experiment with only two samples,
what is the best way to normalize data?
Any help will be greatly appreciated.
Xiwei Wu, Ph.D.
Bioinformatics Core
Assistant Research Professor
Department of Molecular Medicine
Beckman Research Institute
City of Hope
x.65071
---------------------------------------------------------------------
SECURITY/CONFIDENTIALITY WARNING: \ This message and
...{{dropped:30}}