Hello,
I am a Bioinformatics Research Assistant tasked with reconstructing our local sequencing center's RNA-Seq analysis pipeline.
I am following several tutorials for an RNA-Seq pipeline using edgeR and limma. I have 6 samples (3 mock and 3 infected). The sequencing center has already analyzed the data and found that one of our infected samples appeared to be an outlier.
Now, I am re-running the analysis.
I performed normalization:
d <- calcNormFactors(d, method = "TMM")
# use cpm() to get normalized counts; it applies the normalization factors, so the result is TMM-normalized CPM when combined with the call above
head(cpm(d))
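To make the comment above concrete, here is a minimal sketch of what cpm() computes on a DGEList once TMM factors are set. It runs on simulated counts, not my real data: the object names (counts, d_sim, eff_lib, manual_cpm) and the negative binomial parameters are made up for illustration.

```r
library(edgeR)

# Simulated counts (hypothetical data): 200 genes x 6 samples
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("mock", "infected"), each = 3))

d_sim <- DGEList(counts = counts, group = group)
d_sim <- calcNormFactors(d_sim, method = "TMM")

# Effective library size = raw library size x TMM normalization factor
eff_lib <- d_sim$samples$lib.size * d_sim$samples$norm.factors

# Manual CPM: divide each column by its effective library size, scale to 1e6
manual_cpm <- t(t(d_sim$counts) / eff_lib) * 1e6

# Compare with cpm(), which uses normalized library sizes by default
all.equal(manual_cpm, cpm(d_sim), check.attributes = FALSE)
```

The point of the sketch is that calcNormFactors() does not change the counts themselves. It only stores a per-sample factor, and cpm() folds that factor into the library size.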
Next, I plotted MDS (the group variable specifies mock/infected):
library(limma)
ltmm <- cpm(d, log = TRUE)
tmm <- cpm(d)
plotMDS(ltmm, col = as.numeric(d$samples$group))
plotMDS(tmm, col = as.numeric(d$samples$group))
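For anyone who wants to reproduce the comparison without my data, here is a self-contained sketch on simulated counts. The data and the names d_sim, mds_dge, mds_log, and mds_raw are hypothetical. As far as I understand, plotMDS() on a DGEList converts to log2-CPM internally, while plotMDS() on a plain matrix assumes the values are already on a log scale, so passing untransformed CPM lets highly expressed genes dominate the distances.

```r
library(edgeR)  # also attaches limma, which provides plotMDS()

# Simulated counts (hypothetical data): 1000 genes x 6 samples
set.seed(42)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("mock", "infected"), each = 3))

d_sim <- DGEList(counts = counts, group = group)
d_sim <- calcNormFactors(d_sim, method = "TMM")

# MDS on the DGEList itself
mds_dge <- plotMDS(d_sim, col = as.numeric(d_sim$samples$group))

# MDS on explicitly log2-transformed TMM-normalized CPM
mds_log <- plotMDS(cpm(d_sim, log = TRUE),
                   col = as.numeric(d_sim$samples$group))

# MDS on untransformed CPM: the matrix is treated as if it were log-scale
mds_raw <- plotMDS(cpm(d_sim), col = as.numeric(d_sim$samples$group))

# Check whether the first two calls give the same coordinates
all.equal(mds_dge$x, mds_log$x, check.attributes = FALSE)
all.equal(mds_dge$y, mds_log$y, check.attributes = FALSE)
```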
Plotting the log2-transformed TMM counts shows one infected sample far from the others, and the plot is identical to the one I get when I plot "d" directly, which is my DGEList object from above. When I plot the non-log-transformed counts, there is no large gap between samples. The first plot looks like the sequencing center's, but I am unsure which normalized counts to use for downstream DGE, and whether (and why) I should apply the log2 transformation.
Any advice or references on the correct normalization methods would be appreciated.
Thanks, Sara
The first image shows the MDS plot for using log2 transformed TMM counts and the second shows the TMM counts plotted without log2 transform.