Hi there
Please excuse my very basic questions, but I was not able to find appropriate answers using search engines.
I am trying to analyze a small RNA-seq dataset of 3 vs 3 samples to identify differentially expressed genes and do some multivariate statistics. Due to the low sample size I chose to use edgeR, but I am a bit confused. In the edgeR User's Guide (https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) all steps are nicely explained, but the order seems odd to me: it first describes filtering out genes with low read counts, which in my samples removes quite a bit from the respective libraries, and only then describes TMM normalization to account for the RNA composition effect.
Is this really the right order to do it, or am I confusing things?
So first:
library(edgeR)
data_edgeR <- DGEList(counts = data_matrix[2:46079, 3:10], group = group) # create DGEList for further analyses
data_edgeR$samples # look at library sizes before filtering
keep <- rowSums(cpm(data_edgeR) > 1) >= 3 # keep genes with CPM > 1 in at least 3 samples
data_edgeR_filtered <- data_edgeR[keep, , keep.lib.sizes = FALSE] # recompute library sizes after filtering
and then
data_TMM_normalized <- calcNormFactors(data_edgeR_filtered)
Is this correct, or should it be the other way around?
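To put a number on how much the filter removes from each library, something like the following base-R sketch might help. It uses a made-up toy count matrix (not my data) and computes CPM by hand from the raw library sizes, which I assume is close to what cpm() does before any normalization factors are set. The names counts, cpm_manual and fraction_kept are only for illustration.

```r
# Toy count matrix (made-up numbers): 5 genes x 6 samples
counts <- matrix(
  c(500, 450, 520, 480, 510, 495,
      0,   1,   0,   0,   0,   0,
    300, 280, 310,   5,   4,   6,
      0,   0,   2,   0,   0,   0,
    199, 269, 168, 513, 485, 498),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)

# Counts per million, computed by hand from the raw library sizes
# (assumption: no normalization factors applied yet)
lib_before <- colSums(counts)
cpm_manual <- t(t(counts) / lib_before) * 1e6

# Same rule as in my filter: CPM > 1 in at least 3 samples
keep <- rowSums(cpm_manual > 1) >= 3

# Library sizes after filtering and the fraction of reads retained per sample
lib_after <- colSums(counts[keep, , drop = FALSE])
fraction_kept <- lib_after / lib_before
print(round(fraction_kept, 3))
```

On real data, the same ratio could be taken from the lib.size column of data_edgeR_filtered$samples and data_edgeR$samples.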
Many thanks!