I am a beginner in R and still learning how to use edgeR. I want to apply TMM normalization to my dataset, but I have the following questions:
1. The experiment contains 3622 genes in total. Some samples have genes with zero read counts, for which log-fold changes cannot be calculated. After the filtering step below, the DGEList created by readDGE contains 3617 genes, but some of the individual samples have only 3567 expressed genes. The edgeR User's Guide says that genes with zero read counts should be dropped prior to any analysis. How can I perform this task?
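> files <- dir(pattern = "\\.csv$")                     # count files in the working directory
> group <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7)     # one group label per file, in the order dir() returns them
> RG <- readDGE(files, group = group, labels = NULL)
> RG$samples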
              files group lib.size norm.factors
ARef       ARef.csv     1  2911654            1
CO21       CO21.csv     2 11198927            1
CO224     CO224.csv     2 11294624            1
Light1   Light1.csv     3 12454641            1
Light24 Light24.csv     3  8668049            1
NaCl1     NaCl1.csv     4  6550245            1
NaCl24   NaCl24.csv     4 11475584            1
NaNO31   NaNO31.csv     5 10521157            1
NaNO324 NaNO324.csv     5  9045265            1
pH1         pH1.csv     6 11850679            1
pH24       pH24.csv     6  9275761            1
Temp1     Temp1.csv     7 11726524            1
Temp24   Temp24.csv     7  8120990            1
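> keep <- rowSums(cpm(RG) > 1) >= 1              # keep genes with CPM > 1 in at least one sample
> RG <- RG[keep, , keep.lib.sizes = FALSE]       # recompute library sizes from the retained genes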
> dim(RG)
[1] 3617 13
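For question 1, here is a minimal base-R sketch contrasting two filtering rules on a hypothetical toy count matrix (not the data above): dropping only genes with zero reads in every sample, versus the CPM > 1 rule used in the code above. In edgeR itself the first rule would correspond to something like RG[rowSums(RG$counts) > 0, , keep.lib.sizes=FALSE]; CPM is computed by hand here, assuming all normalization factors are 1, so that the example runs without edgeR installed.

```r
# Hypothetical toy count matrix: 5 genes x 3 samples (not the poster's data)
counts <- matrix(c(10, 0, 5, 0, 2000000,    # S1
                   12, 0, 0, 1, 1800000,    # S2
                    8, 0, 3, 0, 2200000),   # S3
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5), c("S1", "S2", "S3")))

# Rule A: drop only genes with zero reads in every sample
keep_nonzero <- rowSums(counts) > 0

# Rule B (the filter used above): CPM > 1 in at least one sample.
# Computed by hand; edgeR's cpm() also multiplies lib.size by norm.factors.
lib_size <- colSums(counts)
cpm_manual <- t(t(counts) / lib_size) * 1e6
keep_cpm <- rowSums(cpm_manual > 1) >= 1

# Apply rule A and recompute library sizes, as keep.lib.sizes = FALSE does
filtered <- counts[keep_nonzero, , drop = FALSE]
new_lib_size <- colSums(filtered)

print(cbind(keep_nonzero, keep_cpm))
print(new_lib_size)
```

In this toy example, gene4 (a single read in a library of about 1.8 million reads, roughly 0.56 CPM) survives the zero-count filter but not the CPM filter, so the CPM rule is stricter than simply dropping all-zero genes. Recomputing library sizes after filtering would also account for the small drop in lib.size between the two RG$samples listings (for example, 2911654 to 2911652 for ARef).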
2. My understanding is that the TMM factor of the reference sample against itself, TMM(r)(r), equals 1. However, when I run calcNormFactors with the first column as the reference sample (refColumn=1), that sample's factor is 1.1318411, as shown below. Is this right, or am I missing a parameter here?
> RG <- calcNormFactors(RG, method = "TMM",
+     refColumn = 1, logratioTrim = 0.3, sumTrim = 0.05, doWeighting = TRUE,
+     Acutoff = -1e10, p = 0.75)    # p is only used by method = "upperquartile"
> RG$samples
              files group lib.size norm.factors
ARef       ARef.csv     1  2911652    1.1318411
CO21       CO21.csv     2 11198918    1.0381266
CO224     CO224.csv     2 11294616    0.8066788
Light1   Light1.csv     3 12454627    0.7167523
Light24 Light24.csv     3  8668044    0.7112044
NaCl1     NaCl1.csv     4  6550232    0.7621304
NaCl24   NaCl24.csv     4 11475570    0.8852015
NaNO31   NaNO31.csv     5 10521142    1.3151826
NaNO324 NaNO324.csv     5  9045259    1.4467104
pH1         pH1.csv     6 11850666    1.0574913
pH24       pH24.csv     6  9275757    0.7936610
Temp1     Temp1.csv     7 11726477    1.2622590
Temp24   Temp24.csv     7  8120977    1.5219508
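For question 2, as far as I understand edgeR's implementation, calcNormFactors first computes each sample's TMM factor against the reference column (so the reference gets exactly 1), and then divides all factors by their geometric mean so that they multiply to 1. The sketch below first checks that against the norm.factors printed above, then illustrates the same two steps with a simplified, unweighted TMM on hypothetical Poisson counts. tmm_factor() is a hypothetical helper written for illustration only; it omits the precision weights that doWeighting=TRUE applies, so it will not reproduce edgeR's numbers exactly.

```r
# Part 1: norm.factors copied from the RG$samples output above
nf <- c(ARef = 1.1318411, CO21 = 1.0381266, CO224 = 0.8066788,
        Light1 = 0.7167523, Light24 = 0.7112044, NaCl1 = 0.7621304,
        NaCl24 = 0.8852015, NaNO31 = 1.3151826, NaNO324 = 1.4467104,
        pH1 = 1.0574913, pH24 = 0.7936610, Temp1 = 1.2622590,
        Temp24 = 1.5219508)
geo_mean <- exp(mean(log(nf)))             # ~1: the reported factors multiply to ~1
raw_relative_to_ref <- nf / nf[["ARef"]]   # undo the rescaling; ARef becomes exactly 1
print(round(geo_mean, 6))
print(round(raw_relative_to_ref, 4))

# Part 2: simplified, UNWEIGHTED TMM (hypothetical helper; edgeR also weights genes)
tmm_factor <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  ok <- obs > 0 & ref > 0                  # skip genes with a zero count in either sample
  p_obs <- obs[ok] / sum(obs)              # proportions use the full library sizes
  p_ref <- ref[ok] / sum(ref)
  m <- log2(p_obs / p_ref)                 # M values: log-ratios
  a <- 0.5 * log2(p_obs * p_ref)           # A values: average log-abundance
  if (max(abs(m)) < 1e-6) return(1)        # a sample compared with itself
  n <- length(m)
  lo_m <- floor(n * logratio_trim) + 1; hi_m <- n + 1 - lo_m
  lo_a <- floor(n * sum_trim) + 1;      hi_a <- n + 1 - lo_a
  keep <- rank(m) >= lo_m & rank(m) <= hi_m &   # trim extreme M values
          rank(a) >= lo_a & rank(a) <= hi_a     # trim extreme A values
  2^mean(m[keep])
}

set.seed(1)
mu <- round(exp(seq(log(20), log(2000), length.out = 200)))   # hypothetical gene means
boost <- c(rep(1, 180), rep(5, 20))                            # 20 most abundant genes 5x higher in s2
counts <- cbind(ref = rpois(200, mu), s2 = rpois(200, mu * boost), s3 = rpois(200, mu))
raw <- apply(counts, 2, tmm_factor, ref = counts[, "ref"])     # reference column gets 1
scaled <- raw / exp(mean(log(raw)))                            # rescaled to multiply to 1
print(rbind(raw = raw, scaled = scaled))
```

Dividing the reported factors by the ARef value recovers factors relative to the reference, with ARef at exactly 1, so a value of 1.1318411 for the reference column is consistent with refColumn=1 having been used, followed by the rescaling step.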
Any help will be appreciated.
Aaron, thanks for your answers. They cleared up a lot of my concerns.