Hi,
I am trying to find out which genes are associated with the change in metabolite levels following a diet. I have paired samples (before and after the diet). I previously asked about possible adjustments for the paired design (https://support.bioconductor.org/p/134040/#134068), so I just have a few follow-up questions.
Here is an example of how my data are organized:
Sample  ID  Timepoint  Metabolite  Log_Metabolite  Sex  Age   BMI
1a      1   1          10.3        3.36            1    56.5  38.1
1b      1   2          20.1        4.33            1    57.7  27.2
2a      2   1          11.0        3.46            2    21.0  44.0
2b      2   2          28.7        4.84            2    22.2  25.8
3a      3   1          12.4        3.63            1    30.0  33.1
3b      3   2          65.8        6.04            1    31.3  30.0
4a      4   1          112.0       6.81            1    67.0  31.5
4b      4   2          100.7       6.65            1    68.0  29.7
5a      5   1          36.2        5.18            1    53.5  36.8
5b      5   2          89.1        6.48            1    54.5  32.9
6a      6   1          12.9        3.69            2    25.7  40.4
6b      6   2          29.0        4.86            2    26.7  37.6
7a      7   1          15.1        3.92            2    44.8  35.7
7b      7   2          98.2        6.62            2    45.9  23.1
8a      8   1          8.0         3.00            1    25.4  29.9
8b      8   2          11.6        3.53            1    26.6  24.8
1) Is it OK to adjust a paired analysis for RNA quality (RIN)? My supervisor insists on including some technical factors in the model (at least RIN):
design <- model.matrix( ~ ID + RIN + Timepoint)
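For concreteness, this is roughly how I would build that design for the 16 samples in the table above. It is only a sketch: it assumes ID and Timepoint are coded as factors, the data frame name `targets` is just for illustration, and the RIN values are made-up placeholders, not my real measurements.

```r
# Sample annotation for the 8 pairs in the table above.
# ID and Timepoint are stored as factors so that model.matrix()
# treats them as categorical rather than numeric.
targets <- data.frame(
  ID        = factor(rep(1:8, each = 2)),
  Timepoint = factor(rep(1:2, times = 8))
)

# Placeholder RIN values for illustration only (NOT real measurements).
set.seed(1)
targets$RIN <- round(runif(nrow(targets), min = 6, max = 9), 1)

# Paired design: subject effects, RIN as a covariate, then the diet effect.
design <- model.matrix(~ ID + RIN + Timepoint, data = targets)
colnames(design)

# Sanity check: the design is full rank only if RIN varies within pairs
# in a way that is not perfectly confounded with Timepoint.
design_rank <- qr(design)$rank
design_rank == ncol(design)
```

With this coding the last column (Timepoint2) would be the after-versus-before effect adjusted for RIN, if I understand the parameterization correctly.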
2) Gordon commented previously:
"Even if you included Metabolite in a completely different non-paired analysis, the Metabolite concentration would need to be a log-scale. Taking differences of unlogged Metabolite concentrates (as you have to get the change variable) is not a meaningful thing to do."
I understand that this is a statistics question rather than an edgeR question, but I'm trying to understand why that is the case. The statistician at my university told me that in lm I should use my dependent variables (such as glucose and insulin levels, BMI, or the Metabolite above) in their "raw" form, meaning untransformed.
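To make the two options concrete for myself, here is a small sketch that computes the change both ways from the Metabolite column of the example table. The variable names are only for illustration, and I am assuming the Log_Metabolite column is log2 (that is what the table values appear to be).

```r
# Metabolite values copied from the example table (IDs 1 to 8).
before <- c(10.3, 11.0, 12.4, 112.0, 36.2, 12.9, 15.1, 8.0)   # Timepoint 1
after  <- c(20.1, 28.7, 65.8, 100.7, 89.1, 29.0, 98.2, 11.6)  # Timepoint 2

# Change on the raw scale: absolute difference in the original units.
raw_change <- after - before

# Change on the log scale: a difference of logs is the log of the ratio,
# i.e. a log2 fold change.
log2_change <- log2(after) - log2(before)

change_table <- data.frame(
  ID          = 1:8,
  before      = before,
  after       = after,
  raw_change  = raw_change,
  fold        = round(after / before, 2),
  log2_change = round(log2_change, 2)
)
change_table
```

For example, subjects 1 and 4 have raw changes of similar size (+9.8 and -11.3), but subject 1 roughly doubles while subject 4 changes by only about 10%. Is this difference between absolute and relative change the reason the log scale is preferred?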
Thank you very much for your help!