Help with limma contrast matrix
2
0
Entering edit mode
chris86 ▴ 420
@chris86-8408
Last seen 5.2 years ago
UCL, United Kingdom

Hi

I thought I had contrast matrices and limma worked out, but it appears I don't. The data frame is a series of columns (samples) with rows (genes) and I am splitting this up according on the different groups to compare which are in order.

len1 <- 7
len2 <- 12
len3 <- 5

group <- factor(c(rep(1, times = len1), rep(2, times = len2), rep(3, times = len3)))

design <- model.matrix(~0+group)
matrix <- data.matrix(data6)
colnames(design)[1] <- 's1'
colnames(design)[2] <- 's2'
colnames(design)[3] <- 's3'

contrast.matrix <- makeContrasts('s1-s2','s2-s3','s1-s3', levels = design)

fit <- lmFit(matrix,design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

top2 <- topTable(fit2,coef=1,number=Inf,sort.by="P")
sig <- subset(top2, adj.P.Val<0.05)
nrow(sig)

This gives me 400 differentially expressed genes. All I want to do is compare s1 gene profile with s2. However, when I exclude s3 (remove it from the data frame) and just run the following:

 

group <- factor(c(rep(1, times = len1), rep(2, times = len2)))
design <- model.matrix(~group) # from EdgeR
matrix <- data.matrix(data6)

#

fit <- lmFit(matrix,design)
fit <- eBayes(fit)
top2 <- topTable(fit,coef=2,number=Inf,sort.by="P")
sig <- subset(top2, adj.P.Val<0.05)

 

This gives a 1000 differentially expressed genes.

So the two methods are different, but all I thought I was doing in the first step was comparing 2 gene profiles so why the difference?

Thanks.

limma • 1.5k views
ADD COMMENT
4
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States

When you fit an ANOVA model and then compute contrasts, the variance estimate is based on the sums of squares of error (SSE) from the model fit. This is basically the average of the intra-group variances for all groups in the model, and relies on the assumption that each group has relatively similar variances. In the first analysis you did, your variance estimate was based in part on the s3 group, which is obviously much more variable than s1 and s2. In the second analysis you only used s1 and s2 for the variance estimate, and by removing the noisier group you increased the number of genes found significant.
 

ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 17 hours ago
The city by the bay

The variance is estimated using all samples in the data set. So, in your first approach, you'll be using the information in s3 to estimate the variance for each gene. This means that the expression of the s3 samples (or specifically, the variability thereof) will affect the size of the p-values when testing for DE between s1 and s2, even though you're never actually comparing to s3 itself. Presumably, the samples in the s3 group are more variable, such that their removal from the analysis in the second approach results in a lower variance (or higher prior d.f.) and more DE.

Note that this isn't usually the case - typically, more samples provide more stable variance estimates and greater power to detect DE. I'd check the MDS plot and make sure there aren't any outliers or batch effects within the s3 group. If you're seeing aberrant behaviour, you might consider using the arrayWeights function to down-weight any of the odd samples. If there's nothing obviously wrong, though, I'd be inclined to analyse everything together (i.e., your first approach). For all we know, a larger true variance may be the correct state of affairs, and the second approach is overly liberal.

P.S. I'm also assuming that you've taken the s3 samples out of matrix in your second approach, otherwise nothing should work.

ADD COMMENT
0
Entering edit mode

I had actually stuck on some random data in S3 to check if I understood how the contrast matrix worked so I am not suprised it is very variable. This makes a lot more sense now I know that the variance information is shared between all the samples in the contrast matrix. Thankyou.

ADD REPLY

Login before adding your answer.

Traffic: 1172 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6