Question

DESeq2: LRT or WALD guidance

1

Entering edit mode

morgan.sobol ▴ 10

@f4107e6a

Last seen 22 months ago

United States

Hello,

I am hoping to get some guidance on whether I should be choosing LRT vs Wald. I have done both, and do not see too much of a difference in the expression of genes/gene sets but I have also doubted if I am using Wald correctly because of setting the contrasts. Background, I have 5 sample types sorted by their fluorescent color (Red, Green, Red+Green, Red+Green+Blue, Control). I want to see differences in stress pathway expression between all of these colors , so my design = ~color.

dds_no = DESeqDataSetFromMatrix(countData=countData_no,
                             colData=colData_no,
                             design=~color)

The way I have always seen it, this is an all vs all comparison. I have struggled to grasp the all vs all comparison using Wald. From what I have seen on numerous other posts, you do this by setting contrasts?

1] "Intercept"                 "color_green_vs_control"    "color_greenred_vs_control"
[4] "color_red_vs_control"      "color_rgb_vs_control"

So does this mean I need to set additional contrasts with the results function, i.e "color_red_vs_green", etc. And if I had previously not done this and went forward with selecting significant genes and creating heatmaps without setting additional contrasts, that I was not truly looking at all vs all situations? My code without other contrasts:

res_no <- results(dds_no)

rld_no <- rlogTransformation(dds_no)

paths<-assay(rld_no)

df_path3 <- cbind(rownames(res_no), data.frame(res_no, row.names=NULL))

topTable3 <- as.data.frame(df_path3)

sigGeneList3 <- subset(topTable3, abs(log2FoldChange)>=1 & pvalue<=0.05)[,1]

topMatrix3 <- paths3[which(rownames(paths3) %in% sigGeneList3),]


topMatrixPATHS3 <- gsva(data.matrix(topMatrix3),
                       stress_Cao_etal_2017,
                       method="gsva",
                       min.sz=1,
                       max.sz=Inf,
                       kcdf="Gaussian",
                       mx.diff=TRUE,
                       verbose=TRUE)

Now I have read LRT can be used to analyze all levels of a factor at once, which sounds great and easy for me. So I tried:

dds_LRT = DESeq(dds_no,test="LRT", reduced=~1)

res_LRT <- results(dds_LRT)

rld_LRT<- rlogTransformation(dds_LRT)

pathsLRT<-assay(rld_LRT)

df_pathLRT <- cbind(rownames(res_LRT), data.frame(res_LRT, row.names=NULL))


topTableLRT <- as.data.frame(df_pathLRT)

sigGeneListLRT <- subset(topTableLRT,  padj<=0.05)[,1]

topMatrixLRT <- pathsLRT[which(rownames(pathsLRT) %in% sigGeneListLRT),]


topMatrixPATHSLRT <- gsva(data.matrix(topMatrixLRT),
                       stress_Cao_etal_2017,
                       method="gsva",
                       min.sz=1,
                       max.sz=Inf,
                       kcdf="Gaussian",
                       mx.diff=TRUE,
                       verbose=TRUE)

But now I am uncertain if LRT is right for me based on this quote from another post. "If you want to test comparisons of pairs of levels of disease, you should be using the Wald test (the standard DESeq() workflow), instead of the LRT. The LRT is for testing all levels of disease at once."

Long story short, should I use Wald or LRT?

Thank you very much in advance! Morgan

DESeq2 RNASeq DifferentialExpression • 10.0k views

ADD COMMENT • link updated 2.9 years ago by chm • 0 • written 3.5 years ago by morgan.sobol ▴ 10

score 2 · Answer 1 · 2021-08-26

2

Entering edit mode

ATpoint ★ 4.7k

@atpoint-13662

Last seen 1 day ago

Germany

The technical part of Wald vs LRT has been asked before, e.g. Difference in Wald and LRT test for two condition RNAseq data. and the LRT is also extensively described in the manual, see http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test

Probably in your case the Wald is preferred as you get shrunken fold changes for each contrast which is useful for rankings (e.g. GSEA input).

I have struggled to grasp the all vs all comparison using Wald

You need the contrasts argument of results for that as described in the vignette. You have 10 possible contrasts here:

unique_combn <- combn(c("green", "greenred", "red", "rgb", "control"), 2)

contrasts <- lapply(1:ncol(unique_combn), function(x){

  c("color", unique_combn[1,x], unique_combn[2,x])

})
names(contrasts) <- unlist(lapply(contrasts, function(x) paste(x[2], x[3], sep = "_vs_")))

contrasts

> contrasts
$green_vs_greenred
[1] "color"    "green"    "greenred"

$green_vs_red
[1] "color" "green" "red"  

$green_vs_rgb
[1] "color" "green" "rgb"  

$green_vs_control
[1] "color"   "green"   "control"

$greenred_vs_red
[1] "color"    "greenred" "red"     

$greenred_vs_rgb
[1] "color"    "greenred" "rgb"     

$greenred_vs_control
[1] "color"    "greenred" "control" 

$red_vs_rgb
[1] "color" "red"   "rgb"  

$red_vs_control
[1] "color"   "red"     "control"

$rgb_vs_control
[1] "color"   "rgb"     "control"

...so feed them individually into results as results(..., contrast=contrasts[["green_vs_greenred"]], ...) to get the results table per contrast. It is then on you how to proceed, e.g. collecting DEGs from all contrasts and putting them into a heatmap to visualize patterns.

ADD COMMENT • link 3.5 years ago ATpoint ★ 4.7k

0

Entering edit mode

so here it doesn't matter if i had already set a reference level I can go on and do the contrast ?

ADD REPLY • link 2.9 years ago chm • 0

2

Entering edit mode

No, contrasts do not care about the order in the design with regard to the reference level.

ADD REPLY • link 2.9 years ago ATpoint ★ 4.7k

0

Entering edit mode

The above code was really helpful for generating lots of pair which i earlier used do do manually.

and then after asking on stack-overflow how to pass all the contrast argument in form of list and run the analysis they showed me this way

contrasts <- combn( c("M0", "M1", "M2", "M3", "M4" ,"M5"), 
                    FUN = function(x) c("FAB", x),
                    m = 2, 
                    simplify = FALSE)


names(contrasts) <- sapply(
  combn( c("M0", "M1", "M2", "M3", "M4" ,"M5"), m = 2, simplify = FALSE),
  paste0, collapse = "_vs_")

contrasts
    resdata <- lapply(contrasts, function(x) results(dds, contrast=x))

Just one more question its more related to R coding which I can't or not able to do. I would like to use the same approach as above .

resOrdered <- res[order(res$padj),]
summary(resOrdered)
resOrdered <- resOrdered[complete.cases(resOrdered),]  #remove any rows with NA
dim(resOrdered)
normalised_count <- merge(as.data.frame(resOrdered), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)

The above code I use in order to add normalized count to my table where I have all these that is listed below in my resdata which is

Symbol    baseMean log2FoldChange     lfcSE      stat       pvalue         padj

To that I want to append the normalized expression for each comparison I have generated . How to do that instead of doing for each comparison individually.

I figured out how to use lapply for getting normalized counts

ADD REPLY • link 2.9 years ago chm • 0