Question

Differential Expression Genes as Independent Variables

abadgerw • 0

@5088ef59

Last seen 1 day ago

United Kingdom

I have a dataset in which I am interested in the differential expression of genes in a single body fluid and its relationship with a histochemical outcome that is a repeated measure. I would prefer not to collapse this repeated measure into one variable for input into limma, because missing data would bias such a composite variable.

Therefore, I was wondering whether there is a statistical issue with modeling the genes as independent variables rather than as the dependent variable, so that my repeated measure could serve as the outcome. That way I could run a mixed model using dream in the variancePartition package, or use duplicateCorrelation in limma.

limma DifferentialExpression variancePartition dream • 967 views

ADD COMMENT • link updated 8 weeks ago by James W. MacDonald 67k • written 9 weeks ago by abadgerw • 0

score 0 · Answer 1 · 2024-11-29

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Sorry, your question doesn't make sense to me. There are tens of thousands of genes and you can't run a mixed model analysis with one dependent variable and tens of thousands of independent variables. That obviously makes no statistical sense. Neither limma nor dream is at all applicable to such an analysis.

I am not even clear what you mean by a "repeated measure". You don't seem to be using the term in the same sense as in anova theory. It cannot be true that your outcome variable is repeated but the expression values are not. If you want to relate histochemical outcome to expression, then either both must be repeated or neither.

ADD COMMENT • link 9 weeks ago Gordon Smyth 52k

The repeated measure is a quantitative metric of post-mortem pathology assessed across multiple sections. Not all patients have the same number of sections assessed, owing to availability and other factors. The genes/proteins are measured once, in the serum. The goal is to identify serum biomarkers of this pathological hallmark. Given that missing values are present in the pathological data, I was concerned about generating a composite score to use as an independent variable.
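To make the data structure concrete, here is a minimal sketch of the "long" layout such a design implies: one row per assessed section, with the patient's single serum value repeated on each of that patient's rows. The patient IDs, scores, and protein values are invented for illustration and are not from the actual study.

```python
# Hypothetical toy data: patient IDs, pathology scores, and protein values
# are invented purely to illustrate the layout.
pathology_by_patient = {
    "P1": [2.1, 2.4, 1.9],  # three sections assessed
    "P2": [3.0],            # only one section available
    "P3": [0.8, 1.1],       # two sections assessed
}
serum_protein = {"P1": 5.2, "P2": 7.9, "P3": 4.1}  # one serum value per patient


def to_long_format(pathology, protein):
    """Return one row per assessed section:
    (patient, section_index, pathology_score, protein_value)."""
    rows = []
    for patient, scores in pathology.items():
        for section, score in enumerate(scores, start=1):
            rows.append((patient, section, score, protein[patient]))
    return rows


long_rows = to_long_format(pathology_by_patient, serum_protein)
for row in long_rows:
    print(row)
```

In this layout unavailable sections are simply absent rows rather than values to be imputed or averaged away, and the protein value is constant within each patient.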

Therefore, my question was about how to address this. I wondered whether the genes/proteins could instead be tested one by one, each as the independent variable in its own mixed model, with the resulting p-values adjusted by FDR. Any insight into why this would not make statistical sense would help me understand. Other suggestions or options are much appreciated. Apologies for the naivety.

ADD REPLY • link 9 weeks ago abadgerw • 0

Such an analysis cannot be done in limma. Sorry, I cannot tell you how to do it or even whether it is possible.

ADD REPLY • link 9 weeks ago Gordon Smyth 52k

Also, the reason why gene abundance is the dependent variable and is iterated over is that there are far more genes than there are samples in the majority of data sets. So, it is not possible to fit a linear model with all genes as covariates.
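A small sketch of why this fails: a design matrix with more covariate columns than sample rows cannot have full column rank, so the coefficients are not uniquely estimable. The 3-by-5 matrix below is a made-up toy example, and `matrix_rank` is a helper written here for illustration, not a library function.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical design matrix: 3 samples (rows) by 5 genes (columns).
design = [
    [1, 4, 2, 7, 3],
    [2, 1, 5, 0, 6],
    [3, 3, 1, 2, 8],
]
n_samples, n_genes = len(design), len(design[0])
rank = matrix_rank(design)
print(f"rank = {rank}, samples = {n_samples}, genes = {n_genes}")
```

The rank can never exceed the number of samples, so with five gene covariates and three samples at least two coefficients are left undetermined. The same argument applies, far more severely, with tens of thousands of genes.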

ADD REPLY • link 8 weeks ago Dario Strbenac ★ 1.5k

Thanks. The proposal was not to include all proteins as covariates in one model, but to fit a separate model for each protein, with that protein as the independent variable, and then correct the p-values from all models for multiple testing.
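The multiple-testing step of that proposal could look like the following minimal sketch of the Benjamini-Hochberg adjustment. It assumes one p-value per protein has already been obtained from the per-protein model fits, which are not shown; the protein names and p-values are hypothetical. The sketch covers only the adjustment and says nothing about whether the per-protein mixed models themselves are appropriate, which the thread leaves open.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per protein, standing in for per-model results.
p_by_protein = {"protA": 0.001, "protB": 0.04, "protC": 0.03, "protD": 0.5}
names = list(p_by_protein)
adj = benjamini_hochberg([p_by_protein[n] for n in names])
for name, value in zip(names, adj):
    print(name, round(value, 4))
```

Walking from the largest p-value downward keeps the adjusted values monotone in the raw p-values, which is the same convention R's `p.adjust(method = "BH")` follows.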

ADD REPLY • link 8 weeks ago abadgerw • 0

An alternative to that is to use the glmnet package to fit a regularized regression using all proteins at once.

ADD REPLY • link 8 weeks ago James W. MacDonald 67k

Thanks! I was thinking about this but do not believe glmnet allows for regularized mixed models unless I am mistaken?

ADD REPLY • link 8 weeks ago abadgerw • 0

glmnet does not allow mixed models. It also serves a different purpose from what your aims seem to be: it is purely for prediction rather than for testing hypotheses.

ADD REPLY • link 8 weeks ago Gordon Smyth 52k

Both true, so a poor suggestion on my part.

I am also unclear as to how one would use a single observation (the gene expression measures) as a predictor for repeated measures. Presumably the repeated measures of the histochemical outcome change over time (or why did you measure it repeatedly?), and trying to infer something about that process by regressing on a static value seems pointless. Ideally there would be repeated measures of the gene expression, and the goal would be to find changes in gene expression that vary as the histochemical outcome varies.

ADD REPLY • link 8 weeks ago James W. MacDonald 67k