Differential Expression Genes as Independent Variables
abadgerw • 0
@5088ef59
United Kingdom

I have a dataset in which I want to examine differential expression of genes in a single body fluid and its relationship with a histochemical outcome that is a repeated measure. I would prefer not to collapse this repeated measure into one variable as input to limma, because missing data would bias a composite variable.

Therefore, I was wondering whether there is a statistical issue with modeling the genes as independent variables rather than as the dependent variable, so that my repeated measure could serve as the outcome. That way I could run a mixed model using dream in the variancePartition package, or use duplicateCorrelation in limma.

limma DifferentialExpression variancePartition dream • 514 views
@gordon-smyth
WEHI, Melbourne, Australia

Sorry, your question doesn't make sense to me. There are tens of thousands of genes and you can't run a mixed model analysis with one dependent variable and tens of thousands of independent variables. That obviously makes no statistical sense. Neither limma nor dream is at all applicable to such an analysis.

I am not even clear what you mean by a "repeated measure". You don't seem to be using the term in the same sense as in ANOVA theory. It cannot be true that your outcome variable is repeated but the expression values are not. If you want to relate histochemical outcome to expression, then either both must be repeated or neither.


The repeated data are a quantitative metric of post-mortem pathology measured across multiple sections. Not all patients have the same number of sections assessed, due to availability, etc. The genes/proteins are measured once, in the serum. The goal is to identify serum biomarkers of this pathological hallmark. Given that missing values are present in the pathological data, I was concerned about generating a composite score to use as an independent variable.

Therefore, my question was how to address this, and I wondered whether the genes/proteins could be tested one by one, each as the independent variable in a mixed model, with the p-values adjusted by FDR. Any insight into why this would not make statistical sense would help me understand. Other suggestions/options are much appreciated. Apologies for the naivety.


Such an analysis cannot be done in limma. Sorry, I cannot tell you how to do it or even whether it is possible.


Also, the reason why gene abundance is the dependent variable and is iterated over is that there are far more genes than there are samples in the majority of data sets. So, it is not possible to fit a linear model with all genes as covariates.
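
To make the "more genes than samples" point concrete, here is a toy sketch in plain Python. The numbers are invented for illustration and are not from any real dataset. With 2 samples and 3 genes as covariates, two different coefficient vectors reproduce exactly the same fitted values, so ordinary least squares has no unique solution.

```python
# Toy design matrix: 2 samples (rows) x 3 genes (columns), so p > n.
# All values are made up purely for illustration.
X = [[1, 2, 3],
     [4, 5, 6]]

def predict(X, beta):
    """Linear predictor X @ beta using plain lists."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

# One candidate coefficient vector.
beta_a = [1, 0, 0]

# [1, -2, 1] lies in the null space of X (each row dotted with it gives 0),
# so adding any multiple of it changes the coefficients but not the fit.
null_direction = [1, -2, 1]
beta_b = [b + 10 * d for b, d in zip(beta_a, null_direction)]

fitted_a = predict(X, beta_a)
fitted_b = predict(X, beta_b)
print(beta_a, fitted_a)
print(beta_b, fitted_b)
```

Because both coefficient vectors fit the data identically, the data alone cannot choose between them. The same happens whenever the number of covariates exceeds the number of samples.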


Thanks. The proposal was not to include all proteins as covariates in one model but to iterate with one protein serving as the independent variable in each model and then correcting the p-values from all models.
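
As a rough sketch of the correction step only, the snippet below applies a Benjamini-Hochberg adjustment to a list of p-values in plain Python. It assumes one p-value per protein has already been obtained from the per-protein models; the mixed-model fitting itself is not shown, and the p-values used here are hypothetical placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical p-values, one per protein-specific model.
protein_pvalues = {"protein_A": 0.01, "protein_B": 0.04,
                   "protein_C": 0.03, "protein_D": 0.20}

names = list(protein_pvalues)
adj = benjamini_hochberg([protein_pvalues[n] for n in names])
for name, q in zip(names, adj):
    print(name, round(q, 4))
```

In R, the same adjustment is available as `p.adjust(p, method = "BH")`.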


An alternative to that is to use the glmnet package to fit a regularized regression using all proteins at once.
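
To give a feel for what a regularized regression does, here is a minimal lasso fitted by coordinate descent in plain Python. This is not glmnet and not its API, only an illustration of the L1-penalized idea it implements. The sketch assumes centered predictors, no intercept, and a single outcome value per patient (how to handle the repeated sections is not settled in this thread); `lasso_cd`, `lam` and the toy data are all hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Prediction for sample i using every feature except j.
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta

# Toy data: 4 patients x 6 proteins (more proteins than patients), made up.
X = [[ 1.0, -0.5,  0.2,  0.0,  0.3, -1.0],
     [-1.0,  0.5, -0.2,  1.0, -0.3,  0.0],
     [ 0.5,  1.0,  0.4, -1.0,  0.1,  1.0],
     [-0.5, -1.0, -0.4,  0.0, -0.1,  0.0]]
y = [2.0, -2.0, 1.0, -1.0]

beta = lasso_cd(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("non-zero coefficients at columns:", selected)
```

The penalty `lam` controls how many coefficients are shrunk exactly to zero. In glmnet this is normally chosen by cross-validation (`cv.glmnet`) rather than fixed by hand.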
