Hi,
I would like to perform temporal patterning using the package Mfuzz,
which assumes the given expression data are fully pre-processed, including data normalization and averaging of replicates.
As @Michael Love explained in the question "expression profiles between DESeq2 and Mfuzz": "the normalized counts in DESeq2 are just provided for visualization, they aren't actually used in the DESeq2 model which uses all samples on the original counts scale and keeps track of size factors on the right-hand side of the equations as shown in the DESeq2 paper".
So, is performing one of DESeq2's transformations and then adjusting for a covariate with limma's removeBatchEffect() on the transformed values, as shown in "How do I extract read counts from DESeq2", still the most effective way to extract normalized counts from a dds object for downstream use?
Thanks!
OK. This specific method wants normalized counts, and since the sample size is n < 30, in the vignette you suggest using rlog. So if my design is
design = ~ replicate + day
is it in my best interest to use
mat <- assay(rld)
mat2 <- removeBatchEffect(mat, rld$replicate)
and then use mat2 as an input for Mfuzz?
If it wants normalized counts, you don't want to use log-transformed, VST or rlog counts.
I guess you should ask the Mfuzz developers how to deal with batch. I can't give too much advice because I don't know what they expect as input.
OK, thanks for the input. I tagged Mfuzz as well, but not every developer is as active as you are ;)
On the topic of downstream analysis and normalized counts: if I wanted to look at the correlation between genes at the expression level, is using one of the transformed counts appropriate?
As always, thank you for your time!
I would look at correlation on the log2 scale. I tend to use vst() for transformation, and if I want to remove batches, I use removeBatchEffect on the matrix in assay(vsd).
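For anyone following along, a minimal sketch of that workflow (dds, vsd and the replicate column are the names used earlier in this thread; blind = FALSE and the Pearson correlation at the end are my own assumptions, not something prescribed here):
library(DESeq2)
library(limma)
# variance-stabilizing transformation of the fitted dds object
vsd <- vst(dds, blind = FALSE)
# remove the replicate (batch) effect on the log2-scale matrix
mat <- assay(vsd)
mat2 <- removeBatchEffect(mat, batch = vsd$replicate)
# gene-gene correlation on the batch-corrected, log2-scale values
gene_cor <- cor(t(mat2), method = "pearson")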
Just to update anyone who might be interested in using the Mfuzz package on normalized DESeq2 counts:
From the developers of Mfuzz: "For RNA-seq, you could input log-transformed normalised cpm and then use the standardise function. If you are looking specifically for peaks, you can spare the log transformation, since it reduces the dynamic range... TPM would also be OK, as TPM and CPM only differ by the inverse length of the transcript; since Mfuzz::standardise() divides through the standard deviation for each gene, this factor is taken out again."
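A rough sketch of that CPM-based route (the counts matrix name and the use of edgeR::cpm() are my assumptions; the developers only specified log-transformed normalised CPM followed by standardise):
library(edgeR)
library(Biobase)
library(Mfuzz)
# log2 counts-per-million from the raw count matrix
logcpm <- cpm(counts, log = TRUE)
# wrap in an ExpressionSet and standardise each gene (mean 0, sd 1)
eset <- ExpressionSet(assayData = logcpm)
eset_s <- standardise(eset)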
Hi rbenel,
Could you tell me how you imported the normalized objects (rld or vsd) into Mfuzz? Just
eset <- new('ExpressionSet', exprs=mat2, phenoData=phenodf)
? But if so, how did you include the group information? Thanks for your time!
I am not sure about the link you posted, but I don't import the vsd object directly. After correcting for batch effects, I save the object as .RData, and then when I work with Mfuzz I create an ExpressionSet as shown below.
# The clustering is based solely on the exprs matrix, and no information is used from the phenoData,
# so it is possible to just create the ExpressionSet using Biobase (see below).
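A minimal sketch of what that step could look like (mat2 is assumed to be the batch-corrected matrix saved above; the cluster number c = 6 is only an illustrative choice):
library(Biobase)
library(Mfuzz)
# build the ExpressionSet from the expression matrix alone
eset <- ExpressionSet(assayData = as.matrix(mat2))
# standardise each gene to mean 0 and standard deviation 1, as Mfuzz expects
eset_s <- standardise(eset)
# estimate the fuzzifier and cluster into, e.g., 6 temporal patterns
m <- mestimate(eset_s)
cl <- mfuzz(eset_s, c = 6, m = m)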