Hi,
I would like to perform temporal patterning using the package Mfuzz,
which assumes the given expression data are fully pre-processed, including data normalization and averaging of replicates.
As @Michael Love explained in the question "expression profiles between DESeq2 and Mfuzz": "the normalized counts in DESeq2 are just provided for visualization, they aren't actually used in the DESeq2 model which uses all samples on the original counts scale and keeps track of size factors on the right-hand side of the equations as shown in the DESeq2 paper".
So, is performing one of DESeq2's transformations and then adjusting for a covariate with limma's removeBatchEffect() on the transformed values, as shown in "How do I extract read counts from DESeq2", still the most effective way to extract normalized counts from a dds object for downstream use?
Thanks!
OK. This specific method wants normalized counts, and since the sample size is n < 30, in the vignette you suggest using rlog. So if my design is
design = ~ replicate + day
is it in my best interest to use
mat <- assay(rld)
mat2 <- removeBatchEffect(mat, rld$replicate)
and then use mat2 as an input for Mfuzz?
If it wants normalized counts, you don't want to use log-transformed, VST or rlog counts.
I guess you should ask the Mfuzz developers how to deal with batch. I can't give too much advice because I don't know what they expect as input.
OK, thanks for the input. I tagged Mfuzz as well, but not every developer is as active as you are ;)
On the topic of downstream analysis and normalized counts: if I wanted to look at the correlation between genes at the expression level, is using one of the transformed counts appropriate?
As always, thank you for your time!
I would look at correlation on the log2 scale. I tend to use vst() for transformation, and if I want to remove batches, I use removeBatchEffect on the matrix in assay(vsd).
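For anyone following along, a minimal sketch of that workflow (dds, vsd and the replicate column are the names used earlier in this thread; blind = FALSE and the Pearson correlation at the end are my own assumptions, not something prescribed here):
library(DESeq2)
library(limma)
# variance-stabilizing transformation of the fitted dds object
vsd <- vst(dds, blind = FALSE)
# remove the replicate (batch) effect on the log2-scale matrix
mat <- assay(vsd)
mat2 <- removeBatchEffect(mat, batch = vsd$replicate)
# gene-gene correlation on the batch-corrected, log2-scale values
gene_cor <- cor(t(mat2), method = "pearson")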
Just to update anyone who might be interested in using the Mfuzz package on normalized DESeq2 counts:
From the developers of Mfuzz: "For RNA-seq, you could input log-transformed normalised cpm and then use the standardise function. If you are looking specifically for peaks, you can spare the log transformation, since it reduces the dynamic range... TPM would also be OK, as TPM and CPM only differ by the inverse length of the transcript; since Mfuzz::standardise() divides through the standard deviation for each gene, this factor is taken out again."
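A rough sketch of that CPM-based route (the counts matrix name and the use of edgeR::cpm() are my assumptions; the developers only specified log-transformed normalised CPM followed by standardise):
library(edgeR)
library(Biobase)
library(Mfuzz)
# log2 counts-per-million from the raw count matrix
logcpm <- cpm(counts, log = TRUE)
# wrap in an ExpressionSet and standardise each gene (mean 0, sd 1)
eset <- ExpressionSet(assayData = logcpm)
eset_s <- standardise(eset)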
Hi rbenel,
Could you tell me how you imported the normalized objects (rld or vsd) into Mfuzz? Just
eset <- new('ExpressionSet', exprs=mat2, phenoData=phenodf)
? But if so, how did you include the group information? Thanks for your time!
I am not sure about the link you posted, but I don't import the vsd object directly. After correcting for batch effects, I save the object as .RData, and then when I work with Mfuzz I create an ExpressionSet as shown below.
# The clustering is based solely on the exprs matrix, and no information is used from the phenoData,
# so it is possible to just create the ExpressionSet using Biobase (see below).
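A minimal sketch of what that step could look like (mat2 is assumed to be the batch-corrected matrix saved above; the cluster number c = 6 is only an illustrative choice):
library(Biobase)
library(Mfuzz)
# build the ExpressionSet from the expression matrix alone
eset <- ExpressionSet(assayData = as.matrix(mat2))
# standardise each gene to mean 0 and standard deviation 1, as Mfuzz expects
eset_s <- standardise(eset)
# estimate the fuzzifier and cluster into, e.g., 6 temporal patterns
m <- mestimate(eset_s)
cl <- mfuzz(eset_s, c = 6, m = m)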