Limma trend test with pre/post design and technical replicates
1
0
Entering edit mode
Jenny G • 0
@jenny-g-16528
Last seen 3.3 years ago
Germany

Hi all,

I have 4 dose groups, technical duplicates, and pre- and post-treatment data. I'm interested in detecting a linear trend across dose groups, adjusting for pre-treatment variability. I'm not certain how to set this up. Data is like this:

patients <- data.frame(Patient = rep(c(1,2,3,4,5,6,7,8), each=4), Treatment = factor(rep(c("Placebo", "Dose1", "Dose2", "Dose3"), each=4), levels = c("Placebo", "Dose1", "Dose2", "Dose3"), ordered=TRUE), Visit = factor(rep(c("Baseline","Baseline","Visit1","Visit1"), 8)))

Is this sufficient?

design <- model.matrix(~ Visit + Treatment, data=eSet)

corfit <- duplicateCorrelation(eSet,design,block=patients$patient)

fit <- lmFit(eSet, design,block=patients$patient,correlation=corfit$consensus, 
             robust=TRUE)

fit2 <- eBayes(fit)

topTable(fit = fit2, coef = "Treatment.L" ) 

Or does there  need to be a Treatment * Visit interaction, then look at the Treatment.L * Visit1 effect? 

 

limma-trend limma • 1.5k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 13 hours ago
The city by the bay

Firstly, your treatment variable is a factor, not a continuous covariate, so there's no trend to be fitted here. I don't even know how you managed to get a Treatment.L coefficient. Instead, you can test the effect of each treatment against another in a pairwise fashion, or use an ANOVA to test for any effect of treatment.

Secondly, it seems that you're mixing technical replicates (same patient and visit) with some other form of replication (same patient, different visit). This will probably distort the estimate of the consensus correlation because the variability between technical replicates is probably much lower than that between separate visits, even after blocking on Visit. I would suggest averaging your technical replicates, see avearrays.

Thirdly, your discussion of the interaction effect suggests that "baseline" is actually "untreated", which wasn't obvious from your description of the experiment. If this is true, it would seem that all patients should share the same "baseline" term as the effect of the different treatments should not yet have manifested. In summary, for every sample labelled as "baseline", you would replace its treatment with "untreated", and then only use the modified Treatment in your design matrix.

Finally, you set robust=TRUE in lmFit. Are you sure you know what you're doing here? I would suggest setting robust=TRUE in eBayes instead, see A: p-values from robust linear model in LIMMA.

ADD COMMENT
0
Entering edit mode

Hi Aaron,

Thanks for your time and your help. My apologies for not explaining the experiment well, I've tried to clarify below. I have a few points/questions, if you have the time to help me.

1. Treatment is an ordered factor, so limma reports linear, quadratic and cubic effects (same as lm() does, e.g. setting up contr.poly()). I don't want pairwise comparisons, I'm specifically looking for a linear (dose-response) trend across groups. This is important to the experiment, and was specified by the team as the result of interest. 

2. I do have technical replicates (same visit, patient), and 2 different visits for each patient. I wasn't sure how to specifiy this in the model - I'll use avearrays as you suggest, thanks.

3. All patients are untreated at baseline, correct. If I understand you correctly, then I recode the treatment group variable to 'untreated' for all of the baseline samples. I could then specify the comparisons of interest (e.g. Placebo vs dose groups) from the treatment variable, and I would not need the 'visit' variable at all. I can see this, but then I lose the trend test across the dose groups. Can I set it up as follows, using orthogonal contrasts?

contrast.matrix <- makeContrasts(-3*Treatment.Placebo, -1*Treatment.Dose1, 1 * Treatment.Dose2, 3* Treatment.Dose3, levels=design)

Best wishes,

Jen

ADD REPLY
1
Entering edit mode

1. Ah, missed that in your patients. Yes, okay. I don't deal with ordered factors much, but it seems that some care may be required in interpretation when the quadratic and cubic terms are significant. For example:

X <- factor(rep(LETTERS[1:5], each=3), ordered=TRUE)
Y <- rep(1:5, each=3) - rep((-2):2, each=3)^2
Y <- jitter(Y)
summary(lm(Y ~ X)) # X.L is significantly positive...
plot(X, Y) # but it's hard to say that there is an increasing trend.

3. I would be inclined to formulate the problem in terms of the within-patient difference from the baseline. That is, for each patient, compute the log-fold difference of their visit 1 sample from their baseline. This represents the effect of the dosage, which I presume is the feature that's really of interest. I'll assume you did step 2 above:

patients <- data.frame(
    Patient = factor(rep(c(1,2,3,4,5,6,7,8), each=2)),
    Treatment = factor(rep(c("Placebo", "Dose1", "Dose2", "Dose3"), each=2), 
        levels = c("Placebo", "Dose1", "Dose2", "Dose3"), ordered=TRUE), 
    Visit = factor(rep(c("Baseline","Visit1"), 8)))

With some coaxing, you can formulate the following design matrix:

poly.part <- model.matrix(~Treatment, patients)
poly.part <- poly.part[,-1] # remove intercept for the time being
poly.part[patients$Visit=="Baseline"] <- 0
design.final <- model.matrix(~patients$Patient + poly.part)

... where the last three coefficients represent the linear, quadratic, cubic terms on the within-patient differences. You can alternatively put Patient in duplicateCorrelation instead of in the design matrix.

ADD REPLY
0
Entering edit mode

Thanks Aaron, I think I follow. A trend across within-patient changes sounds like what I want. I've done it this way and the results make sense. I've calculated percent change from baseline and plotted the hits, and they look good (e.g. effect sizes look right).

Thank you for your help!

ADD REPLY
0
Entering edit mode

Hi Aaron,

Thanks again for your help. I'm struggling now with interpreting and explaining the FCs, with this design. I would like to present a table of results with p-values and FCs, then give a clear explanation of the analysis, and explain the results for one example (say, the top hit). I'd also like to present plots for the top hits - the preferred plots are spaghetti plots and/or boxplots of the percent-change-from-baseline. Relating all of these things to each other, correctly presented and explained, is tricky. 

Our data is on a log2 scale. We are then looking at the contrasts specified above, which is, as you say, testing for a linear effect on the within-patient differences. So for each patient we have differences between 2 log-transformed data points, then we average by group and test for trend across 3 groups. Except I don't think the model is doing *exactly* that, is it? I'm struggling to understand and explain that part. So what is the logFC (and the FC) that limma produces, in this scenario? It's not really the log of the ratio of the averages of the change-from-baseline, is it? Some of these changes are negative, so how does that come into a FC? 

I want to explain it something like this, but I don't think its quite right:

"This analysis is based on within-patient change-from-baseline values. The means of these changes are compared for linear increase or decrease across dose groups. The FC value for each gene is the average of the ratios of the changes-from-baseline, between successive groups."

Any help with understanding and explaining the FCs would be very much appreciated. Thank you!

ADD REPLY
0
Entering edit mode

Don't think in terms of fold changes. Think in terms of log-fold changes. This allows you to consider positive and negative values without any problems.

Now, from the model that I described earlier, you're getting one value per patient - the log-fold change of visit 1 over baseline. You are then fitting the ~Treatment model to these log-fold changes. So a high-level summary would be that you're looking for changes in the treatment effect with increasing dosage. (Note, these are changes in the effect, not the effects themselves; the former represents a "difference of differences".)

The real problem is with the interpretation of the logFC reported by topTable. This nominally represents the "gradient" of the linear component of the trend  with respect to increasing dosage "levels". However, I'm not sure that this corresponds directly to any value of scientific interest. Interpretation of any gradient requires you to know the x-axis units - for example, increase in expression per day, or per mg of drug - but this is not given in the model. Significant quadratic or cubic components can also interfere with interpretation of the linear component, as I mentioned earlier.

If you're having trouble with interpreting these components, perhaps you should just use an ordinary factor with levels in the right order, as suggested here.

ADD REPLY
0
Entering edit mode

There's a fair bit to consider here, I'll think it over. My concern is that I want to give some sort of measure of effect size, with a coherent description. Not so easy in this case, I think! I'll consider the ordinary factor approach too. I could compare each dose group to control, then filter out any hits where the effect sizes don't increase across doses. Thanks very much, I really appreciate your time.

ADD REPLY

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6