Question

Large foldchange values while analyzing rna seq data using limma

0

Entering edit mode

szenitha ▴ 20

@szenitha-10863

Last seen 8.4 years ago

Hello everyone,

I am new to data analysis and having trouble while trying to analyze rna-seq data using limma package. My RNA-seq data has 3 conditions(ifected lung,infected spleen,non infected spleen) and I am doing 2 things. Firstly I perform anova and then I define contrast and try to get the differentially expressed genes in the contrast. The problem is that I am getting very large fold change values. I am not sure what I am doing wrong. Please help.

Code:

design <- model.matrix(~0+State,data=pheno)
design
colnames(design)<-c("influng","infspleen","noninfspleen")

fit <- lmFit(data_normalized[variable_genes,],design)
fit <- eBayes(fit)

#define contrast

cont.matrix <- makeContrasts(infspleen.vs.noninfspleen=infspleen-noninfspleen,
       influng.vs.noninfspleen=influng-noninfspleen,
       influng.vs.infspleen=influng-infspleen,
            levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#ANOVA

anova_result<-topTable(fit2,number=Inf)
top_var_genes<-topTable(fit2,number=1000,sort.by="p") #get 1000 most variable genes

head(topTable(fit2,coef = 3,adjust="fdr",number=Inf) )        #coef= 1 inf spleen vs non inf spleen
                                                                          #coef=2 inf lung vs non inf spleen
                                                                          #coef=3 inf lung vs inf spleen

##################################################################################################

output:

         logFC   AveExpr          t      P.Value    adj.P.Val         B
Ccr5  -23659.522 8989.1708 -156.95385 2.800465e-13 3.100115e-09 -4.333076
Anpep  -1974.974  688.7282 -137.83693 6.728328e-13 3.724130e-09 -4.333102
Htr7   -1058.050  393.6362 -101.96705 5.143171e-12 1.897830e-08 -4.333194
Mmp14  -4817.820 1745.7760  -72.20709 5.274585e-11 1.459741e-07 -4.333395
Socs3 -19469.569 7413.1870  -57.02656 2.589051e-10 5.231674e-07 -4.333639
Rin2   -9375.873 4753.0032  -56.26191 2.835596e-10 5.231674e-07 -4.333657

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             data.table_1.9.6           DESeq2_1.12.3             
 [4] rgl_0.95.1441              corrplot_0.77              RColorBrewer_1.1-2        
 [7] ffpe_1.16.0                TTR_0.23-1                 RUVSeq_1.6.2              
[10] edgeR_3.14.0               limma_3.28.14              EDASeq_2.6.2              
[13] ShortRead_1.30.0           GenomicAlignments_1.8.4    Rsamtools_1.24.0          
[16] Biostrings_2.40.2          XVector_0.12.0             BiocParallel_1.6.2        
[19] RSkittleBrewer_1.1         zebrafishRNASeq_0.106.2    BiocInstaller_1.22.3      
[22] scales_0.4.0               ggplot2_2.1.0              SummarizedExperiment_1.2.3
[25] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.2        
[28] IRanges_2.6.1              S4Vectors_0.10.1           BiocGenerics_0.18.0       
[31] sva_3.20.0                 genefilter_1.54.2          mgcv_1.8-12               
[34] nlme_3.1-128

limma logfoldchange • 1.5k views

ADD COMMENT • link updated 8.7 years ago by Aaron Lun ★ 28k • written 8.7 years ago by szenitha ▴ 20

score 2 · Answer 1 · 2016-08-04

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 7 hours ago

The city by the bay

I would guess that you haven't log-transformed your expression values. limma assumes that the values are log-transformed, see ?lmFit. Also, I don't know what your variable_genes represents, but filtering to keep the most variable genes is not recommended - see Section 7 of the limma user's guide for details.

Edit: Just realized you're working with RNA-seq data; in which case, you should be using voom on your original gene-level counts to get log-transformed intensities for use in limma. If you don't have counts, then your options are limited - see Gordon's answer at A: Differential expression of RNA-seq data using limma and voom().

ADD COMMENT • link 8.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

I am actually normalizing the data using deseq2 and then filtering out genes which do not have a max expression of 10 in atleast one of the conditions. should I be normalizing the data only using limma?

ADD REPLY • link 8.7 years ago szenitha ▴ 20

0

Entering edit mode

Why are you making things difficult for yourself? Just follow the voom-based workflow in Chapter 18 of the limma user's guide. There's no good reason to do more complicated things, especially if you're new to RNA-seq data analysis.