I previously made a post asking about how to model my data using limma (limma: Genes Associated with Continuous Predictor in Subsets of Samples). In brief, I have samples from ~130 people, approximately half male and half female. For most people, there is a sample from tissue 1 and a sample for tissue 2. One primary focus of our experiment is to find genes associated with fat distribution. Because males and female differ in regards to this fatness trait, and so too do the two tissues, I thought it appropriate to model gene expression in each of the four tissues separately (male-tissue 1, female-tissue 1, male-tissue 2, female-tissue 2). I have also corrected for some covariates: age, sequencing batch, overall fatness. I also accounted for sample pairings using duplicateCorrelation. The code is shown below.
design.AGR.fmi <- model.matrix(~0 +
sex.depot + # A factor combining sex.tissue
sex.depot:Android.Gynoid.Ratio +
machine +
age.at.Biopsy +
Fatmass.Index,
data = targets.multiomics)
corfit.agr.fmi <- duplicateCorrelation(log2.cpm.filtered.norm, design.AGR.fmi, block = targets.multiomics$Individual.ID)
counts_voom.agr.fmi <- voomWithQualityWeights(myDGEList.filtered.norm, design.AGR.fmi)
corfit.agr.fmi <- duplicateCorrelation(counts_voom.agr.fmi,
design.AGR.fmi,
block=targets.multiomics$OBB.ID,
corfit.agr.fmi$consensus)
counts_voom.agr.fmi <- voomWithQualityWeights(myDGEList.filtered.norm,
design.AGR.fmi,
block=targets.multiomics$Individual.ID,
correlation=corfit.agr.fmi$consensus)
fit.agr.fmi <- lmFit(counts_voom.agr.fmi, design.AGR.fmi)
ebFit.agr.fmi <- eBayes(fit.agr.fmi, robust = TRUE)
results.agr.fmi <- decideTests(ebFit.agr.fmi,
method="separate",
adjust.method="BH",
p.value=pval_threshold,
lfc=0)
The above has been working well. However, the age distribution of the females covered likely pre-, peri-, and post-menopausal ages. We tested FSH levels for the women and have now confidently labelled each women as either pre-, peri-, or post-menopausal. However, when I include menopausal status (as a factor) as a covariate, I receive a warning letting me know the matrix is no longer full rank. This is the case whether males are all labelled as "male" or by their individual ID code. What especially confused me is that if I include the FSH measurement as a continuous value, the matrix remains full rank. I'm not exactly sure how to deal with this as the correction needs to be done using menopausal status, which is somehow loses full rank.
Greatly appreciate some wisdom here! I am a little out of my depth with this project I feel.
Thanks in advance