Different results between Limma and DESEQ2 for Differential Expression Analysis
1
0
Entering edit mode
Bine ▴ 50
@bine-23912
Last seen 8 months ago
UK

Dear all,

I am a bit frustrated right now. I wanted to double check my DESEQ2 Differential Expression (RNA SEQ) results in Limma and now I am getting different results. I dont know what I did differently that the results are so different. I am here interested in the difference in Gender: Male vs. Female. Please find below the relevant part of the two different approaches, if you can see any obvious difference I would be very happy:

DESEQ2:

dds0 <- DESeqDataSetFromMatrix(countData = cts,
                               colData = colData,
                               design = ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender)

......

# Second approach for filtering
### Set thresholds
padj.cutoff <- 0.05

lfc.cutoff <- 1.2

Limma:

design = model.matrix( ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender, metadata)

fitDupCor <- lmFit(geneExpr, design)

# Fit Empirical Bayes for moderated t-statistics
fitDupCor <- eBayes( fitDupCor )

...

result <- topTable(fitDupCor, number = 50, adjust = "BH", p.value = 0.05, lfc = 1.2, coef = "GenderM")

I used raw counts for both analysis, maybe that was a mistake?

Thank you all!!!

DESeq2 limma • 2.3k views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 10 hours ago
WEHI, Melbourne, Australia

There are a number of guides explaining how to do analyses in limma. You need to do all the steps, including normalization and filtering, not just the steps you have shown in your question.

I have already advised you on Biostars not to use lfc=1.2.

ADD COMMENT
0
Entering edit mode

Thank you. I did normalisation and filtering. I tried to show only the main analysis here. I cant find a clear answer whether I should also use raw counts for Limma as I did for DESEQ2. Do you have an opinion on that?

Thanks a lot!

ADD REPLY
0
Entering edit mode

I don't really understand your question. What would you be analysing if not the counts?

We can't help you if you don't show the complete analysis you did from start to finish. The whole analysis is just a few lines, so I don't quite understand why you are only showing snippets of the code in your question.

ADD REPLY
0
Entering edit mode

Thank you.

Sorry I think I am a bit confused.

I agree I should be more clear and show the complete analysis.

Input Limma & DESEQ2:

raw counts

Filtering for Limma & DESEQ2:

So what I did for both models is that I followed the GTEx Pipeline for the filtering: A gene was kept if it has a transcript per million (TPM) ≥ 0.1 and a raw read count ≥ 6 in ≥ 20% samples.

Normalisation for Limma and DESEQ2:

Limma:

Following https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf I used logCPM with limma trend ("In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million (logCPM) and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. The precision weights approach is called “voom” and the prior trend approach is called “limma-trend”).

DESEQ:

For DESEQ2 I understand the normalisation is done automatically by the DeSeq() function. It performs the median of ratios method of normalization (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html).

Differential Expression Analysis:

DESEQ2:

dds0 <- DESeqDataSetFromMatrix(countData = cts,
                               colData = colData,
                               design = ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender)

### Set thresholds
padj.cutoff <- 0.05

lfc.cutoff <- 1.2

Limma:

design = model.matrix( ~ batch + AGE.GROUP + Sample.Site + BMI.Tertile + Diagnosis +  Gender, metadata)

 logCPM <- cpm(dge, log=TRUE, prior.count=3)

 fit <- lmFit(logCPM, design)

 fit <- eBayes(fit, trend=TRUE)

result <- topTable(fitDupCor, number = 50, adjust = "BH", p.value = 0.05, lfc = 1.2, coef = "GenderM")

Please just ignore the lfc=1.2 for now, I will change it.

Thank you very much!

ADD REPLY
0
Entering edit mode

You are using two different algorithms. You should expect them to be close, but you can't expect them to be identical.

ADD REPLY
0
Entering edit mode

Thank you for checking. Yes, I agree, but e.g. 2 genes were clearly significant in one algorithm and arent at all in the other.. I think I should only see small differences or? Gordon Smyth @swbarnes2

ADD REPLY
1
Entering edit mode

Well, plot the log CPM and the normalized counts of each, just to eyeball what's happening. Maybe that finding is correct for those different normalizations.

ADD REPLY

Login before adding your answer.

Traffic: 802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6