Hello and a very happy new year to the whole community! I would like to ask a specific question about how to construct the design matrix for a differential protein expression analysis in a DDA proteomics experiment. For clarity and reproducibility, I have constructed an artificial table that reproduces the relevant phenotypic information:
```
phenotypic_data
# A tibble: 8 × 5
  Sample_Name SampleID Compartment  Disease Smoking_status
  <chr>          <dbl> <chr>        <chr>   <chr>
1 Sample_1           1 CompartmentA GroupA  Smoker
2 Sample_2           1 CompartmentB GroupA  Smoker
3 Sample_3           2 CompartmentA GroupA  Non-Smoker
4 Sample_4           2 CompartmentB GroupA  Non-Smoker
5 Sample_5           3 CompartmentA GroupA  Ex-smoker
6 Sample_6           3 CompartmentB GroupA  Ex-smoker
7 Sample_7           4 CompartmentB GroupB  Smoker
8 Sample_8           5 CompartmentB GroupB  Non-Smoker
```
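For anyone who wants to run the snippets in this thread, here is a minimal base-R sketch that rebuilds the same table and lists which patients have a sample in both compartments. It assumes a plain `data.frame` instead of a tibble (this should not matter for `model.matrix()`), and the names `pairing` and `paired_ids` are only illustrative.

```r
# Rebuild the example phenotype table as a plain data.frame
# (the original is a tibble; model.matrix() treats both the same way here)
phenotypic_data <- data.frame(
  Sample_Name    = paste0("Sample_", 1:8),
  SampleID       = c(1, 1, 2, 2, 3, 3, 4, 5),
  Compartment    = c("CompartmentA", "CompartmentB", "CompartmentA", "CompartmentB",
                     "CompartmentA", "CompartmentB", "CompartmentB", "CompartmentB"),
  Disease        = c(rep("GroupA", 6), "GroupB", "GroupB"),
  Smoking_status = c("Smoker", "Smoker", "Non-Smoker", "Non-Smoker",
                     "Ex-smoker", "Ex-smoker", "Smoker", "Non-Smoker"),
  stringsAsFactors = FALSE
)

# Cross-tabulate patients against compartments: a patient is "matched"
# when it has at least one sample in both CompartmentA and CompartmentB
pairing <- table(phenotypic_data$SampleID, phenotypic_data$Compartment)
print(pairing)

paired_ids <- rownames(pairing)[rowSums(pairing > 0) == 2]
print(paired_ids)  # "1" "2" "3"
```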
In this table, Compartment is a categorical variable indicating which of two distinct tissue compartments a sample comes from. Disease is likewise a binary categorical variable indicating which of two diseases the sample belongs to, and Smoking_status records the smoking status of each individual. Most importantly, the SampleID column identifies the patient: three patients (SampleID 1 to 3) have matched samples, that is, one CompartmentA and one CompartmentB sample from the same individual. Our main goal is to compare CompartmentA vs CompartmentB within the GroupA disease condition. As the table shows, all patients with matched samples belong to GroupA, while the two remaining individuals (SampleID 4 and 5, both GroupB) contribute only a single CompartmentB sample each. Given that we are not really interested in these last two samples, what would be the best way to construct the design matrix? For example:
```r
# One level per Compartment x Disease combination
combo_condition <- factor(paste(phenotypic_data$Compartment, phenotypic_data$Disease, sep = "_"))
# Patient identifier, used to block on the matched samples
pairs <- factor(phenotypic_data$SampleID)
design <- model.matrix(~0 + combo_condition + pairs)
# then build a contrast matrix for the comparison of interest: CompartmentA_GroupA vs CompartmentB_GroupA
```
Would it then be appropriate to proceed with makeContrasts and specify our comparison of interest?
Or would the SampleID factor be a problem, given that two individuals (the last two rows) do not have matched samples? Alternatively, we could remove these samples entirely, but could that lead to a loss of biological information?
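Both questions can be probed on the toy table with a quick base-R sketch that uses only `model.matrix()`, `qr()` and `lm.fit()` (no limma calls). It checks the rank of the proposed design, then compares it with a hypothetical GroupA-only design `~ pairs_A + compartment_A`. The simulated intensities `y` are made-up values for a single illustrative protein, not real data, and all helper names are illustrative.

```r
phenotypic_data <- data.frame(
  SampleID       = c(1, 1, 2, 2, 3, 3, 4, 5),
  Compartment    = c("CompartmentA", "CompartmentB", "CompartmentA", "CompartmentB",
                     "CompartmentA", "CompartmentB", "CompartmentB", "CompartmentB"),
  Disease        = c(rep("GroupA", 6), "GroupB", "GroupB"),
  stringsAsFactors = FALSE
)

# Proposed design: one column per Compartment x Disease cell plus a patient block
combo_condition <- factor(paste(phenotypic_data$Compartment, phenotypic_data$Disease, sep = "_"))
pairs <- factor(phenotypic_data$SampleID)
design <- model.matrix(~0 + combo_condition + pairs)
ncol(design)     # 7 columns
qr(design)$rank  # rank 6, so one coefficient is not estimable
# Patients 4 and 5 each get their own block column, and together those two
# columns reproduce the CompartmentB_GroupB column exactly
aliased <- all(design[, "combo_conditionCompartmentB_GroupB"] ==
                 design[, "pairs4"] + design[, "pairs5"])

# Alternative (hypothetical): keep only the matched GroupA samples
groupA <- subset(phenotypic_data, Disease == "GroupA")
pairs_A <- factor(groupA$SampleID)
compartment_A <- factor(groupA$Compartment)
design_A <- model.matrix(~pairs_A + compartment_A)

# Residual degrees of freedom = samples minus estimable coefficients
df_full <- nrow(design) - qr(design)$rank        # 8 - 6 = 2
df_A    <- nrow(design_A) - qr(design_A)$rank    # 6 - 4 = 2

# One simulated protein to compare the CompartmentB - CompartmentA estimate
set.seed(1)
y <- rnorm(8)
coef_full <- lm.fit(design, y)$coefficients      # aliased coefficient comes back NA
est_full <- coef_full["combo_conditionCompartmentB_GroupA"] -
  coef_full["combo_conditionCompartmentA_GroupA"]
coef_A <- lm.fit(design_A, y[phenotypic_data$Disease == "GroupA"])$coefficients
est_A <- coef_A["compartment_ACompartmentB"]
c(df_full = df_full, df_A = df_A, est_full = unname(est_full), est_A = unname(est_A))
```

On this toy layout, each unpaired GroupB sample is fitted exactly by its own block coefficient, so in a fixed-block design the two extra samples appear to add no residual degrees of freedom and leave the CompartmentB vs CompartmentA estimate unchanged.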
Finally, if we also wanted to take smoking status into account, should it be included as a confounder in the design matrix? Or, because this is essentially a paired analysis, is there no rationale for including it?
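Whether Smoking_status is even estimable alongside the patient block can be checked the same way. The following base-R sketch works on the GroupA samples of the toy table only; the design names `design_paired` and `design_smoking` are hypothetical.

```r
phenotypic_data <- data.frame(
  SampleID       = c(1, 1, 2, 2, 3, 3, 4, 5),
  Compartment    = c("CompartmentA", "CompartmentB", "CompartmentA", "CompartmentB",
                     "CompartmentA", "CompartmentB", "CompartmentB", "CompartmentB"),
  Disease        = c(rep("GroupA", 6), "GroupB", "GroupB"),
  Smoking_status = c("Smoker", "Smoker", "Non-Smoker", "Non-Smoker",
                     "Ex-smoker", "Ex-smoker", "Smoker", "Non-Smoker"),
  stringsAsFactors = FALSE
)

groupA <- subset(phenotypic_data, Disease == "GroupA")
pairs_A <- factor(groupA$SampleID)
compartment_A <- factor(groupA$Compartment)
smoking_A <- factor(groupA$Smoking_status)

# Smoking status is a patient-level attribute: one value per SampleID
levels_per_patient <- tapply(groupA$Smoking_status, groupA$SampleID,
                             function(s) length(unique(s)))

# Paired design with and without the smoking covariate
design_paired  <- model.matrix(~pairs_A + compartment_A)
design_smoking <- model.matrix(~pairs_A + smoking_A + compartment_A)
c(ncol(design_paired),  qr(design_paired)$rank)   # 4 columns, rank 4
c(ncol(design_smoking), qr(design_smoking)$rank)  # 6 columns, still rank 4
```

Here the smoking columns are linear combinations of the intercept and patient columns, so adding them does not change the fitted paired model. This does not address whether the compartment difference itself varies with smoking status; that would likely need a Compartment-by-smoking interaction and more patients per smoking group than this example has.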
Thanks a gazillion,
Efstathios