Dear Bioconductor community,
I have a question about using age as a covariate. As has been suggested multiple times, I tried to categorize age in order to account for it. However, because my sample size is rather small (3 groups, n=8, n=5, n=6), it is hard to find a good way to cut the ages. I initially tried 4 categories, which turned out to be very unbalanced across the experimental conditions, so I then tried cutting with 3 breaks. You can find the resulting frequencies below:
3 breaks:
4 breaks:
As you can see, in both cases there is a severe imbalance between the age categories and the experimental conditions.
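A quick way to check this kind of imbalance is to bin the ages and cross-tabulate the bins against the conditions. The sketch below is plain Python that mimics R's `cut()` with right-closed intervals; only the group sizes (8, 5, 6) come from this post, while the ages, the group labels A/B/C and the break points are made up for illustration.

```python
from bisect import bisect_left
from collections import Counter


def cut(values, breaks):
    """Map each value to a bin index using sorted interior break points.
    Bins are right-closed, like R's cut() default (right = TRUE):
    (-inf, b1], (b1, b2], ..., (bk, inf)."""
    return [bisect_left(breaks, v) for v in values]


def crosstab(groups, bins, n_bins):
    """Count samples in each (group, age bin) cell."""
    counts = Counter(zip(groups, bins))
    return {g: [counts[(g, b)] for b in range(n_bins)] for g in sorted(set(groups))}


# Hypothetical data: only the group sizes (8, 5, 6) are taken from the post.
groups = ["A"] * 8 + ["B"] * 5 + ["C"] * 6
ages = [62, 65, 67, 70, 72, 74, 77, 80,
        58, 61, 63, 66, 69,
        71, 75, 78, 81, 84, 86]

breaks = [66, 75]  # hypothetical interior breaks -> 3 age categories
bins = cut(ages, breaks)
table = crosstab(groups, bins, n_bins=len(breaks) + 1)

for group, row in table.items():
    print(group, row)
```

With these made-up ages, group B has no samples in the oldest bin and group C none in the youngest. Empty or near-empty cells like these mean the age category is partly confounded with the condition, which is the same problem visible in the tables above.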
So now I really don't know what to do. I see several options: use age as a categorical covariate (though I still don't know how many breaks would be reasonable); use age as a continuous covariate (which is not recommended); don't account for age at all (which might be OK, since we are investigating a late-onset disease and all individuals are past the critical age); or don't account for age and use SVA instead (I'm not sure about this one: when I do that, I get a significant surrogate variable whose correlation coefficient with age is -0.45).
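One concrete cost to weigh between the first two options is the number of coefficients each one adds to the design. The sketch below builds treatment-coded design matrices in plain Python, in the spirit of R's `model.matrix(~ condition + age)`; the helper `design_matrix`, the ages and the 3-category split are all hypothetical, and only the total sample size (19) and the 3 conditions come from the post.

```python
def design_matrix(conditions, age, age_is_categorical):
    """Treatment-coded design matrix: intercept, one dummy per non-reference
    condition, then either a single numeric age column or one dummy per
    non-reference age category."""
    cond_levels = sorted(set(conditions))
    age_levels = sorted(set(age)) if age_is_categorical else []
    cols = ["(Intercept)"] + [f"condition{c}" for c in cond_levels[1:]]
    cols += [f"age_cat{a}" for a in age_levels[1:]] if age_is_categorical else ["age"]
    rows = []
    for c, a in zip(conditions, age):
        row = [1] + [int(c == lvl) for lvl in cond_levels[1:]]
        row += [int(a == lvl) for lvl in age_levels[1:]] if age_is_categorical else [a]
        rows.append(row)
    return cols, rows


# Hypothetical ages; group sizes 8, 5, 6 as in the post.
groups = ["A"] * 8 + ["B"] * 5 + ["C"] * 6
ages = [62, 65, 67, 70, 72, 74, 77, 80, 58, 61, 63, 66, 69,
        71, 75, 78, 81, 84, 86]
age_cat = [0 if a <= 66 else 1 if a <= 75 else 2 for a in ages]  # 3 categories

cols_cont, X_cont = design_matrix(groups, ages, age_is_categorical=False)
cols_cat, X_cat = design_matrix(groups, age_cat, age_is_categorical=True)
n = len(groups)

print("continuous: ", cols_cont, "residual df =", n - len(cols_cont))
print("categorical:", cols_cat, "residual df =", n - len(cols_cat))
```

Continuous age costs one coefficient, whereas k age categories cost k - 1, so with 19 samples every extra break removes another residual degree of freedom. The continuous version, in turn, assumes the effect of age follows the model's log-scale form across the whole age range.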
Below you can find the distribution of ages (or rather, years of birth) across the experimental conditions (y axis).
I would really appreciate your help.
Thanks a lot
-Matt
Thank you very much for your valuable answer. I'll fix the age / year-of-birth issue.
However, my limited statistical knowledge doesn't let me follow your remark about the multiplicative effect of a continuous age covariate. Could you briefly elaborate on this? Thank you.
Check the vignette section on the DESeq2 statistical model (it is also described in the first section of the Results in the DESeq2 paper).
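For reference, a condensed form of that model, using the DESeq2 paper's notation: K_ij is the count for gene i in sample j, s_j the size factor, alpha_i the gene-wise dispersion, x_jr the design matrix entries and beta_ir the coefficients. The last line is the consequence for two samples j and j' that differ only in age, by Delta years.

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad \mu_{ij} = s_j\, q_{ij},\\
\log_2 q_{ij} &= \sum_r x_{jr}\, \beta_{ir},\\
\frac{q_{ij'}}{q_{ij}} &= 2^{\beta_{i,\mathrm{age}}\,\Delta},
\qquad \Delta = x_{j',\mathrm{age}} - x_{j,\mathrm{age}}.
\end{aligned}
```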
Suppose one column of the design matrix x gives the age. Multiplying that column by its coefficient beta, and adding the other columns of x times their respective betas, gives the log2 of the expected expression. This implies that increases in age have a multiplicative effect on expression: each additional year multiplies expected expression by the same factor, 2^beta, rather than adding a fixed amount.
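A small numeric sketch of this point in Python. It assumes a single-gene model with age as the only covariate and made-up coefficients on the log2 scale (intercept 5.0, beta_age 0.05); `expected_expression` is a hypothetical helper, not a DESeq2 function, and size factors and dispersion are left out.

```python
import math


def expected_expression(intercept, beta_age, age):
    """Expected normalized expression q under log2(q) = intercept + beta_age * age
    (hypothetical one-covariate model, size factors and dispersion ignored)."""
    return 2 ** (intercept + beta_age * age)


intercept = 5.0  # hypothetical log2 expression at age 0 (pure extrapolation)
beta_age = 0.05  # hypothetical log2 fold change per year of age

q60, q61, q70, q71 = (expected_expression(intercept, beta_age, a) for a in (60, 61, 70, 71))

# Fold change for one extra year: identical at 60 and at 70, equal to 2 ** beta_age.
ratio_60 = q61 / q60
ratio_70 = q71 / q70
# Absolute change for one extra year: grows with the expression level.
diff_60 = q61 - q60
diff_70 = q71 - q70

print("ratios:", round(ratio_60, 4), round(ratio_70, 4), math.isclose(ratio_60, ratio_70))
print("differences:", round(diff_60, 2), round(diff_70, 2))
```

The one-year ratio is about 1.035 at both ages, while the absolute difference is larger at 70 than at 60. In other words, the model assumes a constant fold change per year, which is a strong assumption if the true age effect is not log-linear over the sampled range.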
Ok, got it. Thanks a lot for your time!