Hello edgeR users,
I'm faced with a problem where I'd like to compare expression across "normal" samples.
There are eight samples - first six have two replicates each (RNA-seq done several months ago) and the seventh and eighth samples have three replicates each (RNA-seq done recently). So I'd like to get the DE genes while correcting for the batch effect.
I tried the pipeline as explained in section 4.6 of the manual to "detect genes dierentially expressed in response to hrcC challenge, while correcting for any dierences between the batches."
In my case there is no mock vs hrcC, but I guess if I can make the design matrix correctly, I can use the ANOVA-like test. Sadly, I don't think my design matrix isn't right. Could the experts please comment?
Below is my code and error messages. Any help is much appreciated.
Thanks,
Manoj
Sample names:
-------------------
ES07-1, ES07-2, ES08-1, ES08-2, NT01-1, NT01-2, NT02-1, NT02-2, NT03-1, NT03-2, NT04-1, NT04-2, ES14_s1, ES14_s2, ES14_s3, PBT1_s1, PBT1_s2, PBT1_s3
Code:
--------
targets <- readTargets("CountDataInfo_allRqd8Smpls.txt") x <- read.delim("AllRqd8Samples_RawReads.txt", row.names=1, stringsAsFactors=FALSE) #no normalization, no batch correction. y <- DGEList(counts=x[,1:18], group=targets$Treat) colnames(y) <- targets$Label dim(y) keep <- rowSums(cpm(y)>1) >= 2; y <- y[keep,] dim(y) y$samples$lib.size <- colSums(y$counts) y$samples y <- calcNormFactors(y) y$samples Treat <- factor(substring(colnames(x),1,5)) Batch <- factor(c("1","1","1","1","1","1","1","1","1","1","1","1","2","2","2","2","2","2")) design <- model.matrix(~Batch+Treat) rownames(design) <- colnames(y) design y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
Output messages:
------------------------
> design
(Intercept) Batch2 TreathESO7 TreathESO8 TreatPBNT1 TreatSCNT1 TreatSCNT2 TreatSCNT3 TreatSCNT4
ES07-1 1 0 1 0 0 0 0 0 0
ES07-2 1 0 1 0 0 0 0 0 0
ES08-1 1 0 0 1 0 0 0 0 0
ES08-2 1 0 0 1 0 0 0 0 0
NT01-1 1 0 0 0 0 1 0 0 0
NT01-2 1 0 0 0 0 1 0 0 0
NT02-1 1 0 0 0 0 0 1 0 0
NT02-2 1 0 0 0 0 0 1 0 0
NT03-1 1 0 0 0 0 0 0 1 0
NT03-2 1 0 0 0 0 0 0 1 0
NT04-1 1 0 0 0 0 0 0 0 1
NT04-2 1 0 0 0 0 0 0 0 1
ES14_s1 1 1 0 0 0 0 0 0 0
ES14_s2 1 1 0 0 0 0 0 0 0
ES14_s3 1 1 0 0 0 0 0 0 0
PBT1_s1 1 1 0 0 1 0 0 0 0
PBT1_s2 1 1 0 0 1 0 0 0 0
PBT1_s3 1 1 0 0 1 0 0 0 0
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$Batch
[1] "contr.treatment"
attr(,"contrasts")$Treat
[1] "contr.treatment"
> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
TreatSCNT4
>
>
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.16.5 edgeR_3.8.6 limma_3.22.7
loaded via a namespace (and not attached):
[1] tools_3.1.1
>