Hi,
Pretty much all the posts I could find cover multivariate datasets so I'm asking the following: how do I compare only one treatment to the baseline.
My data are the results of RNAseq wherein gene expression level was measured. I have 2 treatments, the baseline called APO, and one treatment called SYM (n=3 for both). I wish to compare the expression level between the two groups to find genes that are expressed to a significantly higher degree in the SYM treatment than the baseline. However, I'm having trouble designing my model.matrix to set APO as the intercept and then run the rest of the analysis.
My data looks something like this with the first column is the gene name and the following columns represent expression for all 3 baseline and all 3 treatment samples
APO1 APO2 APO3 Sym1 Sym2 Sym3
Rp.chrX.0016 22 71 64 10 64 81
Rp.chrX.0018 8 44 36 17 15 74
Rp.chrX.0021 93 221 272 89 254 459
Rp.chrX.0024 56 225 228 9 205 226
Rp.chrX.0027 92 476 499 24 522 463
My code is as follow
counts_brain<-read.csv("gene_expression_Brain.csv", row.names = 1)
dB<- DGEList(counts_brain)
dB<-calcNormFactors(dB)
cutoff <- 1
drop <- which(apply(cpm(dB), 1, max) < cutoff)
d <- dB[-drop,]
snames<-colnames(counts_brain)
Treatment <- substr(snames, 1, nchar(snames) - 1)
mm <- model.matrix(~Treatment)
y <- voom(d, mm, plot = T)
fit <- lmFit(y, mm)
head(coef(fit))
contr<-makeContrasts(Intercept - TreatmentSym, levels = colnames(coef(fit)))
contr
tmp<-contrasts.fit(fit,contr)
tmp <- eBayes(tmp)
top.table <- topTable(tmp, sort.by = "P", n = Inf)
head(top.table)
length(which(top.table$adj.P.Val < 0.05))
I obtain the following error message while running it Warning message:
In contrasts.fit(fit, contr) : row names of contrasts don't match col names of coefficients
From what I'm Understanding I'm not supposed to use contrast fit unless I'm comparing 2 treatments which aren't the intercept, but I just can't figure out how to do that comparison without using contrast.fit.
Thank you for your help
Thank you! That worked like a charm. Only issue was that only the top 10 results were shown. to get all the genes or at least a subset of desired length you have to set "n" (I assume the default is 10?). In this case I used n=Inf to test all genes
You automatically test all genes. If you want all the genes at an FDR<0.05, you would do