I have a complex RNASeq data with 3 factors, genotype time and treatment type. Top part of the target file looks like,
Sample Geno Time Treatment
Sampl1 KK 2hr Control
Sampl2 CR 2hr Control
Sampl3 KK 2hr Control
Sampl4 CR 2hr Control
Sampl5 KK 2hr Stress1
Sampl6 CR 2hr Stress1
Sampl7 KK 4hr Stress1
Sampl8 CR 4hr Stress1
Sampl9 KK 4hr Stress2
Sampl10 CR 4hr Stress2
Sampl11 KK 4hr Stress2
Sampl12 CR 4hr Stress2
....
I analyzed data with edgeR.Grouped factors and estimated DEGs using makeContrasts between groups,
group<-factor(paste(targets$Geno,targets$Time,targets$Treatment,sep="."))
cbind(targets,Group=group)
y<-DGEList(counts = raw_counts, group = group)
keep <-rowSums(cpm(y)>=2) >=2
y <- y[keep, , keep.lib.sizes=FALSE]
y<-calcNormFactors(y)
design<-model.matrix(~0+group)
colnames(design)<-levels(group)
y<-estimateDisp(y,design)
fit<-glmQLFit(y,design)
Normalized_counts<-cpm(y)
Now I added another batch of data for 6hr and I want to include the batch effect in the analysis. After reading similar posts, I think one possible way would be to add a new 'batch' column to target file and address that in the design matrix. Like,
Sample Geno Time Treatment Batch
Sampl1 KK 2hr Control B1
Sampl2 CR 2hr Control B1
.
Sampl60 CR 2hr Control B2
And include the batch in the model and proceed as shown above,
group<-factor(paste(targets$Geno,targets$Time,targets$Treatment, targets$Batch,sep="."))
design<-model.matrix(~0+group+Batch)
Is that the correct method to adopt? Please note that I will make a pairwise comparison for DE analysis all those comparisons are within a time point. ie, I don't have plan to compare for ex, 2hr control vs 6hr stress.
I understand the design formula, but I don't understand why the group variable should include targets$Batch. If someone could explain me, I will appreciate.