Question

limma: blocking factors and factorial design

0

Entering edit mode

jlerga • 0

@jlerga-9341

Last seen 9.3 years ago

Dear Bioconductor list,

Dear Gordon Smyth,

I am studying the functional effect of inversions (that is, a segment of DNA that can be inverted in some individuals) on the human genome. My current goal is, in fact, to discover those genes that are differentially expressed depending on the genotype of the inversion. And for that, I have decided to use limma.

However, I have some questions regarding to how I should apply the covariates, the model design and a factorial design.

*My design:

I have two blocking factors: population (POP1, POP2, POP3) and sex (Female, Male). And the main variable: the inversion genotype, coded as the number of inverted alleles (0,1,2).

I have RNA-seq data from several individuals. The aim of my work is to study how inversion genotype is associated to the gene expression profile of a set of human individuals.

* Questions:

1) What are exactly the differences between these two models when comparing all the genotypes with topTableF...

~0+genotype+sex+population

~genotype+sex+population

...??

First steps:
dge <- DGEList(counts=counts)
A <- rowSums((dge$counts)>5)>=20
dge <- dge[A, , keep.lib.size = FALSE]
dge <- calcNormFactors(dge)

Model:
design_voom<-model.matrix(~0+genotype+sex+population) ## or   ~genotype+sex+population

Comparisons with model 1 (~0+genotype+sex+population):
v <- voom(dge,design_voom,plot=TRUE)
fit <- lmFit(v,design_voom)
contrast.matrix <- makeContrasts(genotype1-genotype0, genotype1-genotype2, genotype0-genotype2, levels=design_voom)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
res<-topTableF(fit2)
head(res)

Or comparisons with model 2 (~genotype+sex+population):

v <- voom(dge,design_voom,plot=TRUE)
fit <- lmFit(v,design_voom)
contrast.matrix <- makeContrasts(genotype0, genotype2, genotype0-genotype2, levels=design_voom)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
res<-topTableF(fit2)
head(res)

The results are similar, but they are not the same. My questions are: I want to obtain those genes that are differentially expressed depending only on the genotype, but I want to take into account the sex and the population. With the models I show above, am I making the comparisons according to my goal?? What are the differences between the two models??

2) I would like to find genes which vary between the three genotypes in any way. And therefore, I use topTableF because fit2$F.p.value combine the three pair-wise comparisons into one F -test.

However, it is highly probable that the major part of the genes we are looking for follow an additive model (the genotype 2 -BB- has a double effect over gene expression than the genotype 1 -AB- with respect to genotype 0 -AA-).

Is there another way/method/approach that I could use to find those genes that are truly differentially expressed? I mean, I can compare the genotypes 0 and 2 (AAvsBB), but the major part of the samples are heterozygous, and I would like to make use of them to have a greater DE power.

3) Finally, suppose that we want to obtain which genes respond to inversion genotype in a specific population; for instance, in population Pop2.

Is there any advantage of using a factorial design, which joins population with genotype instead of removing all the samples with the exception of our population of interest and make a simple analysis?

I mean, it is any advantage of the following approach:

pop.genotype<-paste(design[,2],design[,4],sep=".")  
design<-cbind(design,pop.genotype)

dge <- DGEList(counts=counts)
A <- rowSums((dge$counts)>5)>=20
dge <- dge[A, , keep.lib.size = FALSE]
dge <- calcNormFactors(dge)

design_voom<-model.matrix(~Sex+pop.genotype, design)
colnames(design_voom) <- c("Intercept" ,"SexMale" ,"Pop1.genotype0", "Pop1.genotype2", "Pop2.genotype1" , "Pop2.genotype0" ,"Pop2.genotype2" ,"Pop3.genotype1" ,"Pop3.genotype2")

v <- voom(dge,design_voom,plot=TRUE)
fit <- lmFit(v,design_voom)

contrast.matrix <- makeContrasts(Pop2.genotype1-Pop2.genotype0, Pop2.genotype1-Pop2.genotype2, Pop2.genotype0-Pop2.genotype2,levels=design_voom)
fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)
res3<-topTableF(fit2)
head(res3)

Over removing all samples with the exception of our population of interest and make a simple contrast:

design_voom<-model.matrix(~Sex+genotype, design)

v <- voom(dge,design_voom,plot=TRUE)
fit <- lmFit(v,design_voom)

contrast.matrix <- makeContrasts(genotype0, genotype2,
                                 genotype0-genotype2, levels=design_voom)
fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)
res4<-topTableF(fit2)
head(res4)

Many many thanks in advance!!

Jon Lerga

PhD student

IBB (Biotechnology and Biomedicine Institute)

Comparative and Functional Genomics group

Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès - Barcelona

limma differential gene expression design and contrast matrix covariates • 2.0k views

ADD COMMENT • link 9.3 years ago jlerga • 0