Limma analysis combining twins pairs and unpaired controls
1
0
Entering edit mode
@kalletrontti-17337
Last seen 6.2 years ago

Hi all,

I have RNA-seq data from twins of which one of the pair is sick (n=5) and the other one is healthy (n=5). In addition, I have six healthy controls that are not related to the twins.

I wish to look for differential expression (a) between sick twins compared to their healthy twin pairs, and (b) between sick twins and healthy controls including the healthy twins and unrelated healthy controls.

For part (a) I take that blocking might be most suitable as most twin pairs cluster tightly in an MDS plot. The code I wrote based on limma's manual is the following, please correct if you point out something peculiar:

pairs <- factor(expDesign$Pair)               #Levels "P1".."P5" or "unpaired"
disease <- factor(expDesign$Disease)    #Levels "Sick" and "Healthy"
des <- model.matrix(~pairs+disease)      #Pair info to block

y <- DGEList(counts=data)
keep<- rowSums(cpm(y)>1) >= 6
y <- y[keep, ]
y <- calcNormFactors(y, method="TMM")
v <- voom(y, des, plot=TRUE)

topTable(fit, coef=2, sort.by = "p")

 

For aim (b) I guess limma's duplicateCorrelation() could be appropriate. However I have doubts in my skills writing the code (below) and would appreciate any advice on it. Also all comments for what would be best practise to analyse this kind of data structure are warmly welcome!

dge <- DGEList(counts=data)
keep<- rowSums(cpm(dge)>1) >= 6
dge <- dge[keep, ]
dge$samples$lib.size <- colSums(dge$counts)
dge <- calcNormFactors(dge)

design <- model.matrix(~expDesign$Disease)
v <- voom(dge, design, plot=TRUE)
corfit <- duplicateCorrelation(v, design, block=expDesign$Pair)
corfit$consensus     #Gives 0.2208816
fit <- lmFit(v, design, block=expDesign$Pair, correlation=corfit$consensus)
fit <- eBayes(fit)
topTable(fit, coef=2, sort.by = "p")

 

Thanks a lot!

 

 

limma twin analysis rnaseq • 1.1k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

Your code looks fine; just a few comments:

  • I would consider removing the unpaired samples in your first analysis. For the paired samples, the variance is driven by differences in the disease effect across twins - you don't have to worry about genotypic effects (or other effects that are common between twins) because that's absorbed by the pairs terms. In contrast, the variance in the unpaired samples will be driven by differences in genotype and other individual-specific effects. By analyzing everything together, you are unnecessarily introducing two distinct modes of variation, which violates assumptions of equal variance in the linear model. Admittedly, limma is quite robust to this, but why take the risk?
  • I assume you're missing some code between voom and topTable in your first analysis.
  • In your second analysis, try one round of iteration for voom and duplicateCorrelation (i.e., re-estimating the weights with the consensus correlation, and then re-estimating the correlation with the new weights). This should improve both estimates by ensuring that the weights account for the correlations and vice versa. See Gordon's comments at using duplicateCorrelation with limma+voom for RNA-seq data.
  • I usually like setting robust=TRUE in eBayes to protect against genes with outlier variances.
  • There's no need to recompute the library sizes if you use keep.lib.sizes=FALSE, see ?"[.DGEList".
ADD COMMENT
0
Entering edit mode

Many thanks Aaron. Indeed there's a part of code missing between voom and topTable, copy-paste ...

For the first analysis I forgot to mention that actually the unpaired samples were removed prior analysis, sorry for that.

I'll try the iteration for the second part asap, thanks for the tip!

Kalle

ADD REPLY

Login before adding your answer.

Traffic: 571 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6