Hello everyone,
I am new to data analysis and having trouble while trying to analyze rna-seq data using limma package. My RNA-seq data has 3 conditions(ifected lung,infected spleen,non infected spleen) and I am doing 2 things. Firstly I perform anova and then I define contrast and try to get the differentially expressed genes in the contrast. The problem is that I am getting very large fold change values. I am not sure what I am doing wrong. Please help.
Code:
design <- model.matrix(~0+State,data=pheno)
design
colnames(design)<-c("influng","infspleen","noninfspleen")
fit <- lmFit(data_normalized[variable_genes,],design)
fit <- eBayes(fit)
#define contrast
cont.matrix <- makeContrasts(infspleen.vs.noninfspleen=infspleen-noninfspleen,
influng.vs.noninfspleen=influng-noninfspleen,
influng.vs.infspleen=influng-infspleen,
levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#ANOVA
anova_result<-topTable(fit2,number=Inf)
top_var_genes<-topTable(fit2,number=1000,sort.by="p") #get 1000 most variable genes
head(topTable(fit2,coef = 3,adjust="fdr",number=Inf) ) #coef= 1 inf spleen vs non inf spleen
#coef=2 inf lung vs non inf spleen
#coef=3 inf lung vs inf spleen
##################################################################################################
output:
logFC AveExpr t P.Value adj.P.Val B Ccr5 -23659.522 8989.1708 -156.95385 2.800465e-13 3.100115e-09 -4.333076 Anpep -1974.974 688.7282 -137.83693 6.728328e-13 3.724130e-09 -4.333102 Htr7 -1058.050 393.6362 -101.96705 5.143171e-12 1.897830e-08 -4.333194 Mmp14 -4817.820 1745.7760 -72.20709 5.274585e-11 1.459741e-07 -4.333395 Socs3 -19469.569 7413.1870 -57.02656 2.589051e-10 5.231674e-07 -4.333639 Rin2 -9375.873 4753.0032 -56.26191 2.835596e-10 5.231674e-07 -4.333657
> sessionInfo() R version 3.3.0 (2016-05-03) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] pheatmap_1.0.8 data.table_1.9.6 DESeq2_1.12.3 [4] rgl_0.95.1441 corrplot_0.77 RColorBrewer_1.1-2 [7] ffpe_1.16.0 TTR_0.23-1 RUVSeq_1.6.2 [10] edgeR_3.14.0 limma_3.28.14 EDASeq_2.6.2 [13] ShortRead_1.30.0 GenomicAlignments_1.8.4 Rsamtools_1.24.0 [16] Biostrings_2.40.2 XVector_0.12.0 BiocParallel_1.6.2 [19] RSkittleBrewer_1.1 zebrafishRNASeq_0.106.2 BiocInstaller_1.22.3 [22] scales_0.4.0 ggplot2_2.1.0 SummarizedExperiment_1.2.3 [25] Biobase_2.32.0 GenomicRanges_1.24.2 GenomeInfoDb_1.8.2 [28] IRanges_2.6.1 S4Vectors_0.10.1 BiocGenerics_0.18.0 [31] sva_3.20.0 genefilter_1.54.2 mgcv_1.8-12 [34] nlme_3.1-128
I am actually normalizing the data using deseq2 and then filtering out genes which do not have a max expression of 10 in atleast one of the conditions. should I be normalizing the data only using limma?
Why are you making things difficult for yourself? Just follow the
voom
-based workflow in Chapter 18 of the limma user's guide. There's no good reason to do more complicated things, especially if you're new to RNA-seq data analysis.ok. Thank you.