Question

limma with or without intercept gives different results

0

Entering edit mode

Sydney • 0

@e2cf211b

Last seen 22 months ago

United States

My data has two variables: treatment (untreated vs treated) and exon number (control, 1 and 9), each with 3 replicates. So 18 samples total but 6 unique groups (untreated-ctrl, treated-ctrl, untreated-1, treated-1, untreated-9, treated-9).

I started by coding everything into one design matrix, then making contrasts for all the comparisons I care about

# No intercept, all data
design <- model.matrix( ~0 + group)
contr.matrix <- makeContrasts(
  Ex1Unt = untreated_1 - untreated-ctrl, 
  Ex9Unt = untreated_9 - untreated-ctrl, 
  Ex1Treat = treated_1 - treated_ctrl, 
  Ex9Treat = treated_9 - treated_ctrl, 
  Ex1 = treated_1 - untreated_1,
  Ex9 = treated_9 - untreated_9,
  levels = colnames(design))

v <- voom(dge, design, plot=TRUE)
fit <- lmFit(v, design)
fit <- contrasts.fit(fit.voom, contrasts=contr.matrix)
fit<- eBayes(fit.voom, trend=TRUE)
summary(decideTests(fit))

       Ex1Unt Ex9Unt ...
Down     2099    968     ...
NotSig  12434  14619     ...
Up       1396    342    ...

My understanding is that since my design matrix is orthogonal, the intercept vs no-intercept approach should give me the same output. So I tried the analysis with an intercept but it gave me very different numbers.

This time I subsetted the data to just the untreated samples (untreated_ctrl, untreated_1, untreated_9)

# With intercept, just Untreated data
design_meanref <- model.matrix(~group)
v <- voom(dge, design_meanref, plot=TRUE)
fit <- lmFit(v,design_meanref)
fit <- eBayes(fit)
summary(decideTests(fit))

       (Intercept) groupingEx1 groupingEx9
Down           493        2473        1311
NotSig        1695       11336       12958
Up           13041        1420         960

My hunch was that this is different because this was just a subset, though I'm not sure why that would be the case. But I tried the first approach again, this time with the same subset.

# No intercept, just untreated data
         Ex1   Ex9
Down    2487  1333
NotSig 11319 12941
Up      1423   955

So two questions here:
1) Why were the numbers so different in my first and second analyses? Why did including all the data change the expression analysis so much even though I was looking at distinct groups in my comparisons? Which is more accurate?
2) Why did my third approach give me very close, but not the exact same result as the second?

Thank you!

limma rnaseq • 1.3k views

ADD COMMENT • link 2.1 years ago Sydney • 0

score 1 · Answer 1 · 2023-03-28

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Subsetting the data will always change the results, as it would do for any linear model or anova analysis, not just for limma. We always recommend that you analyse the whole experiment together.

It's hard to comment otherwise because you the code you show seems incomplete. The first code chunk (No intercept, all data) would not run unless you set the columns of design to be the same as the levels of group. I suspect you've omitted a line of code. The second code chunk (With intercept, just Untreated data) could not give the output shown because Ex1 and Ex2 are not levels of group. The third code chunk (No intercept, just untreated data) doesn't have any code so it isn't clear what you've done exactly.

ADD COMMENT • link 2.1 years ago Gordon Smyth 52k

0

Entering edit mode

Hi Gordon, thanks for the clarification regarding subsetting the data. I re-ran it with the whole data set with and without the intercept. I did rename some of the variables for simplicity/clarity in my original post and I've updated my annotation.

No intercept, all data

group <- reads.dge$samples$group
# Levels: treated_ctrl treated_ex1 treated_ex9 untreated_ctrl untreated_ex1 untreated_ex9
design.no_intercept <- model.matrix(~0+group)
colnames(design.no_intercept) <- gsub("group", "", colnames(design.no_intercept))

contrast.no_intercept <- makeContrasts(
  Ex1Unt = untreated_ex1 - untreated_ctrl, 
  Ex9Unt = untreated_ex9 - untreated_ctrl, 
  Ex1Treat = treated_ex1 - treated_ctrl, 
  Ex9Treat = treated_ex9 - treated_ctrl, 
  Ex1 = treated_ex1 - untreated_ex1,
  Ex9 = treated_ex9 - untreated_ex9,
  levels = colnames(design.no_intercept))
contrast.no_intercept

v.no_intercept <- voom(reads.dge, design.no_intercept, plot=TRUE)
fit.no_intercept <- lmFit(v.no_intercept, design.no_intercept)
fit.no_intercept <- contrasts.fit(fit.no_intercept, contrasts=contrast.no_intercept)
fit.no_intercept <- eBayes(fit.no_intercept, trend=TRUE)
summary(decideTests(fit.no_intercept))

# Output of the Mean-reference model: 
       Ex1Unt Ex9Unt Ex1Treat Ex9Treat   Ex1   Ex9
Down     2099    968     3788     4286   402     2
NotSig  12434  14619     8439     7644 14469 15923
Up       1396    342     3702     3999  1058     4

With intercept, all data

group.relevel <- relevel(group, "untreated_ctrl")
# Levels: untreated_ctrl treated_ctrl treated_ex1 treated_ex9 untreated_ex1 untreated_ex9
design.intercept <- model.matrix(~group.relevel)
colnames(design.intercept) <- gsub("group.relevel", "", colnames(design.intercept))

v.intercept <- voom(reads.dge, design.intercept, plot=TRUE)
fit.intercept <- lmFit(v.intercept,design.intercept)
fit.intercept <- eBayes(fit.intercept)
summary(decideTests(fit.intercept))

# Output of the Means model:
       (Intercept) treated_ctrl treated_ex1 treated_ex9 untreated_ex1 untreated_ex9
Down          1143         3647        1586         748          2090           963
NotSig        1865         8298       13266       14951         12454         14629
Up           12921         3984        1077         230          1385           337

The results comparable here are the first two columns of the first output, and the last two columns of the second output (comparing untreated ex1 and ex9 to untreated_ctrl as the reference). The difference is not that much, but it's not exactly the same. I read from an old thread that this could be the case with non-orthogonal data, which mine is not.

ADD REPLY • link 2.1 years ago Sydney • 0

1

Entering edit mode

You should write functions or loops/lapply constructs to ensure that the exact same chunk of code runs with different params (here that would be the design). By eyeballing I see:

# trend=TRUE/FALSE is a difference here
fit.no_intercept <- eBayes(fit.no_intercept, trend=TRUE)
fit.intercept <- eBayes(fit.intercept)

...as a difference, that is already the case in the toplevel question.

ADD REPLY • link 2.1 years ago ATpoint ★ 4.8k

0

Entering edit mode

Thank you for the extra set of eyes! I was trying a lot of different options, including trend, and I forgot to remove it when copy+pasting since I've been staring at it for so long. I'm getting the same results now.

ADD REPLY • link 2.1 years ago Sydney • 0