Question

Use of limma R package with targeted proteomics for DE analysis

svlachavas ▴ 840

@svlachavas-7225

Germany/Heidelberg/German Cancer Resear…

Hello everyone! I have a question about using the limma R package for DE analysis of proteomics data from multiplex immunoassay technologies. My experiment uses blood samples profiled with targeted Olink assay panels. My main goal is to investigate whether there are any protein expression differences between two profiled groups. The complete experiment was run on 3 separate plates (not completely randomized, but including individuals from both groups), yielding 3 distinct outputs in 3 different Excel files:

colnames(prot_dat)
 [1] "SampleID"                    "Index"                       "OlinkID"                    
 [4] "UniProt"                     "Assay"                       "MissingFreq"                
 [7] "Panel"                       "Panel_Version"               "PlateID"                    
[10] "QC_Warning"                  "PlateLQL"                    "PlateLOD"                  
[13] "LLOQ"                        "ULOQ"                        "Quantified_value"          
[16] "Unit"                        "Assay_Warning"               "Normalization"              
[19] "QC Deviation Inc Ctrl"       "QC Deviation Det Ctrl"       "Olink NPX Signature Version"

In addition, the column "Normalization" has the value "Calibrator Normalized" in all 3 Excel files. After thorough investigation, I found that these numeric values represent the absolute quantification of the panel proteins, which depends on the included calibrator samples. This process brings all samples onto the same scale (pg/ml), and hence (according to the general guidelines) no additional "bridging" is required to make these runs directly comparable to each other. In parallel, a direct comparison of the total protein expression distributions showed no significant difference between the 3 plates, so I proceeded to merge all plates directly into one concatenated dataset. On this premise, my main question is the following:

range(prot_dat$Quantified_value, na.rm = TRUE) 
[1] 0.00122 15801.02184

In the column "Quantified_value", the values range from close to 0 up to the thousands (see the range above). I also applied a simple log2(x + 0.5) transformation, as well as an additional median-centering step. But as the density plots below show, there is not much difference between log2 alone and log2 with additional median centering:

[Figure: density plots of the raw (calibrator-normalized) data, the log2-transformed data, and the log2-transformed and median-centered data]
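For reference, the three versions of the data compared in the density plots could be computed roughly as follows. This is only a minimal sketch on simulated values: the toy `prot_dat` reuses the column names listed above, and the choice of per-sample (column-wise) median centering is an assumption, since the post does not say whether centering was done per sample or per protein.

```r
# Toy long-format data mimicking the columns shown above (values are simulated).
set.seed(1)
prot_dat <- expand.grid(
  SampleID = paste0("S", 1:6),
  Assay    = paste0("P", 1:4),
  stringsAsFactors = FALSE
)
prot_dat$Quantified_value <- rlnorm(nrow(prot_dat), meanlog = 3, sdlog = 2)

# Reshape to a proteins x samples matrix, the layout limma expects.
conc <- with(prot_dat, tapply(Quantified_value, list(Assay, SampleID), mean))

# Option 1: plain log2 (valid here because every value is strictly positive).
log2_plain <- log2(conc)

# Option 2: log2 with the 0.5 offset described in the question.
log2_offset <- log2(conc + 0.5)

# Option 3: plain log2 followed by per-sample median centering (assumed direction).
sample_medians <- apply(log2_plain, 2, median, na.rm = TRUE)
log2_centered  <- sweep(log2_plain, 2, sample_medians)

# After centering, every sample (column) has a median of zero.
round(apply(log2_centered, 2, median), 10)
```

Any of the three matrices could then be passed to `plot(density(...))` to reproduce a comparison like the one in the figure.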

For differential expression with limma, and based on the description above, would the log2 transformation alone suffice? In addition, although there are only a limited number of features in total (~89), might the limma-trend pipeline still be beneficial for a basic two-group comparison, potentially including some categorical confounders?

Thanks a gazillion,

Efstathios

DEanalysis limma olinktargetedproteomics • 984 views

ADD COMMENT • link 9 weeks ago • updated 7 weeks ago svlachavas ▴ 840

score 1 · Answer 1 · 2025-02-21

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

It is usual to analyse log normalized values from Olink in limma. Just a straight log transformation should be sufficient. We find that trend=TRUE doesn't help much with Olink data.
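As an illustration of the kind of analysis described here, a two-group comparison with a categorical covariate might look roughly like the sketch below. Everything about the data is an assumption made for the example: the matrix `y` is simulated log2 data (89 proteins, 24 samples), and the `group` and `plate` annotations are hypothetical. It requires the limma package and uses the default `eBayes()` settings, that is, without `trend = TRUE`.

```r
library(limma)

set.seed(42)
n_prot <- 89
n_samp <- 24

# Simulated log2 concentrations: proteins in rows, samples in columns.
y <- matrix(
  rnorm(n_prot * n_samp, mean = 5, sd = 1),
  nrow = n_prot,
  dimnames = list(paste0("P", seq_len(n_prot)), paste0("S", seq_len(n_samp)))
)

# Hypothetical sample annotation: comparison group plus plate as a covariate.
targets <- data.frame(
  group = factor(rep(c("A", "B"), times = n_samp / 2)),
  plate = factor(rep(paste0("plate", 1:3), each = n_samp / 3))
)

# Spike a true difference into the first protein so there is something to find.
y[1, targets$group == "B"] <- y[1, targets$group == "B"] + 3

# Adjust for plate, then test the group effect.
design <- model.matrix(~ plate + group, data = targets)
fit <- lmFit(y, design)
fit <- eBayes(fit)  # trend = FALSE is the default

tt <- topTable(fit, coef = "groupB", number = Inf)
head(tt, 5)
```

Replacing the `eBayes()` call with `eBayes(fit, trend = TRUE)` gives the limma-trend variant, so the two sets of results can be compared directly on the same fit.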

ADD COMMENT • link 8 weeks ago Gordon Smyth 52k

Dear Gordon, thanks a million for your reply (I just saw it, as I did not get any notification by email)! Regarding the log2 transformation, in your experience, would adding an "arbitrary" constant of ~0.5 to account for any putative "0" values also be fine?

ADD REPLY • link 8 weeks ago svlachavas ▴ 840

I don't know what the "quantified value" is in your output so I can't advise you on it. It is not part of Olink output that I have seen. Olink generally outputs an "NPX" value, which is a normalized value already on the log-scale. According to Olink, you should be using that for any downstream analyses, including in limma.

ADD REPLY • link 7 weeks ago Gordon Smyth 52k

Dear Gordon, thank you for getting back to me! Actually, the output is not "NPX", which is indeed the most frequent output for many panels. As mentioned, these are absolute quantification values in pg/ml (based on internal calibrator normalization, so that the values are comparable across the different plates), hence they are not log2-transformed like NPX. The range of the values is as shown above, so there are no "actual" 0 values. That is why I was wondering whether adding a constant would, depending on the range, significantly alter the values, especially those that are negative after the log2 transform, and whether it would be "less biased" to just apply a simple log2 transformation without adding any constant.
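To make the concern concrete, the effect of the 0.5 offset can be checked on a few values spanning the reported range. This is a small illustrative sketch: the first and last values are the minimum and maximum from the `range()` output in the question, and the two intermediate values are arbitrary picks for comparison.

```r
# Minimum and maximum from the reported range, plus two arbitrary mid-range values.
x <- c(0.00122, 1, 100, 15801.02184)

plain  <- log2(x)        # log2 without any offset
offset <- log2(x + 0.5)  # log2 after adding the constant 0.5

# Side-by-side comparison; 'shift' is how far the offset moves each value.
data.frame(
  pg_ml       = x,
  log2_plain  = round(plain, 2),
  log2_offset = round(offset, 2),
  shift       = round(offset - plain, 2)
)
```

For the smallest value the plain log2 is roughly -9.7, while with the offset it is about -1, so the constant compresses the low end of the scale by several log2 units and leaves the high end essentially unchanged.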

ADD REPLY • link 7 weeks ago svlachavas ▴ 840