Multi-factorial design for patient-replicates from different time points using duplicate correlation LIMMA Voom
1
0
Entering edit mode
alva.james • 0
@alvajames-6967
Last seen 6.4 years ago
Germany

Hello All,

I am trying to implement LIMMA room duplicate correlation strategy for my RNA-seq samples using blocking setup. I need to consider following points into account.

1. My samples are biologically paired and dependent also not all patients has its paired sample.

for instance, this is how my design table looks like,

Patient Condition Time
PE01 PH ID
PE01 PH REL
AE01 Non ID
PE05 Non REL

 I have 82 samples with more than half of patients has its paired sample and few of them doesn't. According to Limma from user guide, I have adapted to the following workflow. But I have a question,   how limma voom fits into the workflow for a paired experimental design that uses duplicateCorrelation() to handle blocking on the same subject, how is the appropriate experimental design (taking subject blocking into account) fed to voom?

My design looks like,

combined_fac <- factor(paste( design_table$Condition))

design <- model.matrix(~0 + combined_fac)

rownames(design)<-colnames(data)

colnames(design)<-c("Non","PH")

## Then followed with estimating correlation, blocking is input with design and then the Contract for finding DE genes

y <- DGEList(data)
y <- calcNormFactors(y)
v <- voom(y, design)
corfit <- duplicateCorrelation(v, design, block =design_table$Patient

)
v <- voom(y, design, block =design_table$Patient, correlation =
corfit$consensus)
fit <- lmFit(v, design, block = design_table$Patient, correlation =
corfit$consensus)
cont_matrix <- makeContrasts( PHvsNon = PH-Non,  levels=design)

fit2 <- contrasts.fit(fit, cont_matrix)
fit2 <- eBayes(fit2)
result_double <- topTable(fit2, number = nrow(data), sort.by ="none")


I wanted to make the understanding the workflow is right with for my experiment.? or what is the appropriate way to run voom such that it takes all of the necessary design information into its account? 

Thanks

limma limma voom duplicatecorrelation • 2.5k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 13 hours ago
The city by the bay

Time seems to be an important factor, so perhaps you should be doing:

combined_fac <- factor(paste0(design_table$Condition, ".", design_table$Time))

Even if you're only interested in DE between conditions, you need to include time in the design to avoid inflating the variance. It is simple to modify the contrasts to test for DE between conditions in the presence of time:

con <- makeContrasts( (PH.ID + PH.REL)/2 - (Non.ID + Non.REL)/2, levels=design)

In any case, your design matrix doesn't have a Phlike column, so I don't know what that is or where it comes from. I also don't know what model$Block is in the call to duplicateCorrelation, perhaps you mean design_tab$Patient.

Otherwise, the code looks fine.

ADD COMMENT
0
Entering edit mode

Thanks Aaron for your time, yes Phlike is PH as it is in my design_table, I just corrected that in my primary question, also model$block. So yes I need the DE genes between PH and Non. Their data is my count and the column names of samples don't have "." so I don't understand why combined_fac should be defined as you define? Also regarding time, is it because statistically, the variance inflation matters in the analysis? Aslo as you see few patients doesnt have either REL or ID as pairs. In this case is it important  to take the time level into account

ADD REPLY
0
Entering edit mode

Does gene expression change between levels of Time? Presumably whoever designed the experiment thought that it does, otherwise they wouldn't have bothered collecting samples for the two time points. If there is DE between time points, you need to consider the Time factor in your design matrix. Otherwise, DE between time points will manifest as an increase in the estimated variance, which will reduce your power to detect DE between conditions.

ADD REPLY
0
Entering edit mode

Does gene expression change between levels of Time? No, we first started with that (to look into the difference between 2-time levels), but we didn't see any changes between time level but the subtype is driving the difference between samples.

ADD REPLY
0
Entering edit mode

Honestly speaking, I don't see the harm in putting Time into the model. It's only one factor, so you hardly lose any residual d.f., and it does protect you against potential variance inflation. I don't know how you initially concluded that time had no effect; but if you used the same approach as you did in your original post, you obviously won't observe any DE because the variance inflation due to Condition is likely to be substantial.

ADD REPLY
0
Entering edit mode

 I don't know how you initially concluded that time had no effect; --We initially started to find the DE genes (using Deseq2) between ID and REL of all samples. And the DE genes were very heterogeneous in their expression pattern. And it was very difficult to call it as distinguish expression pattern between ID and REL. But the between subtype is showing a very distinguish pattern compared to ID versus REL. Yes I can incorporate the Time into model. Thank you

ADD REPLY
0
Entering edit mode

Sorry, I am getting error here, at makeContrasts : Error: unexpected symbol in " makeContrasts( ( combined_facPhlike ."

 

ADD REPLY
0
Entering edit mode

Why is there a dot by itself? The dot in my post is part of the column name of design.

ADD REPLY
0
Entering edit mode

In my design there is also a dot, as  combined_facPhlike . REL in the design matrix

ADD REPLY
0
Entering edit mode

Ah, you should use paste0 rather than paste when making combined_fac.

ADD REPLY
0
Entering edit mode

Yeah thanks it works :)

ADD REPLY

Login before adding your answer.

Traffic: 850 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6