Question

Differential Expression Analysis with EDASeq, DESeq2, edgeR, RUVSeq

Yang WY • 0

@yang-wy-21745

Last seen 5.5 years ago

China Beijing

Hi,

This is my first post here and I'm new to RNA-Seq data analysis, especially differential expression analysis (DEA).

I have read some surveys about best practices in RNA-Seq analysis. According to them, when working on DEA we should first normalize the raw counts for GC content, batch effects, library size, and other unwanted technical variation (using RUVSeq), and then use DEA algorithms to estimate dispersion, fit the model, and calculate log fold changes and adjusted p-values.

What confuses me:

1. I want to do GC content normalization before the DESeq2 or edgeR pipeline. Here is my code using DESeq2:

 counts_GC <- withinLaneNormalization(raw_count_matrix, "GC", which = "full", offset = TRUE)
 counts_DESeq <- DESeqDataSetFromMatrix(countData = counts_GC@assayData$normalizedCounts,
                                        colData = myfactor, design = ~ myfactor)
 DESeq2_DEA <- DESeq(counts_DESeq, parallel = TRUE)
 res <- results(DESeq2_DEA)

As you can see, I extracted the normalized counts created by withinLaneNormalization() and passed them to the DESeq2 pipeline. I did the same for edgeR.

However, the user's guides say that both DESeq2 and edgeR expect un-normalized counts, and that users who need EDASeq or cqn normalization beforehand should pass it as an offset, like:

EDASeqNormFactors<-exp(-EDASeqoffset)  #DESeq2
disp<-estimateDisp(counts(dataOffset),design, offset=-offst(dataOffset))   #edgeR

Does this mean I can only use the offset to pass EDASeq's results on? Is my earlier code incorrect?

2. When I looked for examples using EDASeq with DESeq2 or edgeR, such as https://support.bioconductor.org/p/108765/ and https://support.bioconductor.org/p/113405/ , both used the offset generated by EDASeq as the normalization factor. I noticed that they no longer used the normalization methods of the classical pipelines, such as median-of-ratios in DESeq2 (run as part of DESeq()) or TMM in edgeR (run by calcNormFactors()). If I use the offset, do median-of-ratios or TMM still apply? I wrote new code below. Is it the only way to normalize for GC content and library size before the DESeq2 or edgeR pipeline?

library(DESeq2)
# data is data from EDAseq example
dataOffset <- withinLaneNormalization(data,"gc",which="full",offset=TRUE)
dataOffset <- betweenLaneNormalization(dataOffset,which="full",offset=TRUE)


dds <- DESeqDataSetFromMatrix(countData = dataOffset@assayData$normalizedCounts,
                              colData = pData(dataOffset),
                              design = ~ conditions)


EDASeqNormFactors <- exp(-1 * offst(dataOffset))
normalizationFactors(dds) <- EDASeqNormFactors
# dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)
res <- res[order(res$pvalue),]
head(res)
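For reference, the arithmetic behind turning a log-scale EDASeq offset into DESeq2 normalization factors can be sketched in base R, without any Bioconductor packages. The offset matrix here is made-up toy data standing in for offst(dataOffset), and the object names are hypothetical. The final rescaling step, which gives each gene's factors a geometric mean of 1 across samples, follows what I understand the DESeq2 vignette to suggest for user-supplied normalization factors; treat it as an assumption rather than a required step.

```r
# Toy log-scale offsets (genes x samples); the values are invented for illustration
# and stand in for what offst(dataOffset) would return.
eda_offset <- matrix(c( 0.10, -0.20,
                        0.00,  0.30,
                       -0.15,  0.05),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# EDASeq offsets carry the opposite sign to what DESeq2 uses, hence the -1.
norm_factors <- exp(-1 * eda_offset)

# Assumed extra step: rescale each gene (row) so that its factors have
# geometric mean 1 across samples.
norm_factors <- norm_factors / exp(rowMeans(log(norm_factors)))

# Per-gene geometric means after rescaling.
geo_means <- exp(rowMeans(log(norm_factors)))
print(round(norm_factors, 3))
```

In a real analysis the resulting matrix would be assigned with normalizationFactors(dds) <- norm_factors, as in the code above.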

3. Following the RUVSeq user's guide, I first find "empirical control genes" with edgeR and then run RUVg(). I noticed that in the guide, the "set" passed to RUVg() has already been normalized:

set <- betweenLaneNormalization(set, which="upper")

and after finding the empirical control genes, it continues:

set2 <- RUVg(set, empirical, k=1)

It uses the normalized "set" as the input to RUVg(). Can I run RUVg() without normalizing "set" first? Would that have a bad impact on the results?

I appreciate your help,

Yang

deseq2 normalization RUVg edger • 1.8k views

updated 5.5 years ago by Michael Love 43k • written 5.5 years ago by Yang WY • 0

score 0 · Answer 1 · 2019-08-30

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

My current recommendation for DESeq2 is to use Salmon with GC bias correction followed by tximport and DESeqDataSetFromTximport.

This pipeline corrects for a number of important biases including sample-specific GC bias and changes to gene length from any potential differential isoform usage (if this occurs).
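The recommended pipeline could look roughly like the sketch below. It assumes Salmon has already been run per sample with GC bias correction enabled (the --gcBias flag) and that each sample has its own output directory containing quant.sf. The function name run_salmon_deseq2, the sample table columns (run, condition), the directory layout and the tx2gene table are all hypothetical placeholders, not part of the original answer.

```r
# Sketch only. Assumes Salmon was run beforehand for every sample, e.g.
#   salmon quant -i <index> -l A -1 <reads_1.fq.gz> -2 <reads_2.fq.gz> --gcBias -o <quant_dir>/<run>
# Requires the Bioconductor packages tximport and DESeq2 when the function is called.
run_salmon_deseq2 <- function(samples, quant_dir, tx2gene) {
  # samples: data.frame with columns `run` (sample IDs) and `condition` (hypothetical names)
  # tx2gene: two-column data.frame mapping transcript IDs to gene IDs
  files <- file.path(quant_dir, samples$run, "quant.sf")
  names(files) <- samples$run
  stopifnot(all(file.exists(files)))

  # Import transcript-level estimates and summarise them to gene level.
  txi <- tximport::tximport(files, type = "salmon", tx2gene = tx2gene)

  # Build the DESeq2 object from the tximport list, then run the standard analysis.
  dds <- DESeq2::DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition)
  dds <- DESeq2::DESeq(dds)
  DESeq2::results(dds)
}
```

A call would then look like res <- run_salmon_deseq2(samples, "quants", tx2gene), with samples and tx2gene prepared by the user.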
