Dear all,
I am looking for guidance using edgeR::voomLmFit
to analyze clinical RNA-seq datasets from matched remission & disease samples. Briefly, the study design is:
20 patients, each patient has a remission and disease time point (biospecimen). Each biospecimen is sorted into two cell populations. The (ideal) objectives in layperson terms are to:
- A) Identify transcriptomic changes between disease states and remission shared by all cell types and patients
- B) Identify transcriptomic changes between disease states and remission for individual cell types, shared by all patients
- C) Identify transcriptomic changes between disease and remission for individual patients and cell types
Code:
library(edgeR)
# Prepare metadata
set.seed(1)
metadata <- data.frame(
BIOSPECIMEN=factor(paste0('S',1:20)),
INDIVIDUAL=factor(paste0('Patient',c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4)))),
CELLTYPE=factor(paste0('Cell',c(rep(c(1,2),5)))),
DISEASESTATE=factor(c('Remission','Remission','D1','D1',
'Remission','Remission','D1','D1',
'Remission','Remission','D2','D2',
'Remission','Remission','D3','D3',
'Remission','Remission','D1','D1'),
levels=c('Remission','D1','D2','D3')),
DISEASESTATEBIN=factor(rep(c('Remission','Remission','D','D'),5)),
RIN=rnorm(20,mean=6,sd=1.5),
AGE=unlist(lapply(rnorm(5,mean=45,sd=8),rep,4))
)
# Center and scale RIN score
metadata$RINS <- scale(metadata$RIN)
# Generate random counts
counts <- sapply(1:nrow(metadata),function(i) {
rnbinom(n=1000,size=sample(seq(0.1,1,by=0.1),1),prob=0.1)
})
colnames(counts) <- metadata$BIOSPECIMEN
row.names(counts) <- paste0('gene',1:nrow(counts))
Specific Question 1: Is there a preferred paradigm for voomLmFit for paired RNA-seq between blocking on individual and including individual in the model design (Option 1 vs Option 2 below):
### Analysis A) Option 1
y1 <- DGEList(counts=counts,samples=metadata)
design1 <- model.matrix(~0 + DISEASESTATE + RINS,
data=y1$samples)
keep1 <- filterByExpr(y1,design1)
y1 <- y1[keep1,,keep.lib.sizes=FALSE]
y1 <- normLibSizes(y1)
fit1 <- voomLmFit(y1,design1,block=y1$samples$INDIVIDUAL,sample.weights=TRUE)
contrasts1 <- makeContrasts('DISEASESTATED1 - DISEASESTATERemission',
levels=design1)
fit1 <- contrasts.fit(fit1,contrasts1)
fit1 <- eBayes(fit1)
res1 <- topTable(fit1,sort.by='P',n=Inf)
### Analysis A) Option 2
y2 <- DGEList(counts=counts,samples=metadata)
design2 <- model.matrix(~0 + DISEASESTATE + INDIVIDUAL + RINS,
data=y2$samples)
keep2 <- filterByExpr(y2,design2)
y2 <- y2[keep2,,keep.lib.sizes=FALSE]
y2 <- normLibSizes(y2)
fit2 <- voomLmFit(y2,design2,sample.weights=TRUE)
contrasts2 <- makeContrasts('DISEASESTATED1 - DISEASESTATERemission',
levels=design2)
fit2 <- contrasts.fit(fit2,contrasts2)
fit2 <- eBayes(fit2)
res2 <- topTable(fit2,sort.by='P',n=Inf)
# Comparison
comp <- merge(res1,res2,by='row.names')
plot(comp$logFC.x,comp$logFC.y)
message(cor(comp$logFC.x,comp$logFC.y))
Specific Question 2: For analysis B), is the best approach to create a combined group as below, or should I instead split the counts into individual cell types first? The differences in cell types (as assessed by MDS/PCA) are much greater than differences between patients or disease status:
# Analysis B)
metadata$CELLTYPE_DISEASESTATE <- paste(metadata$CELLTYPE,metadata$DISEASESTATE,sep='_')
y3 <- DGEList(counts=counts,samples=metadata)
design3 <- model.matrix(~0 + CELLTYPE_DISEASESTATE + INDIVIDUAL + RINS,
data=y3$samples)
keep3 <- filterByExpr(y3,design3)
y3 <- y3[keep3,,keep.lib.sizes=FALSE]
y3 <- normLibSizes(y3)
fit3 <- voomLmFit(y3,design3,sample.weights=TRUE)
contrasts3 <- makeContrasts('CELLTYPE_DISEASESTATECell1_D1 - CELLTYPE_DISEASESTATECell1_Remission',
levels=design3)
fit3 <- contrasts.fit(fit3,contrasts3)
fit3 <- eBayes(fit3)
res3 <- topTable(fit3,sort.by='P',n=Inf)
General questions:
- Is it appropriate to account for differences in RNA integrity (RIN) score using the centered/scaled RIN values in the model design as shown above?
- It seems reasonable to default to
sample.weights=TRUE
for clinical samples. Are there downsides to this approach?
This is my first post. Please forgive me for any issues with this post, and thank you in advance for any help.