I am currently processing a mass spectrometry data set with the Bioconductor package “Differential Enrichment analysis of Proteomics data” (DEP).
The data set consists of 8 samples divided into two conditions, 4 animals per group.
I followed the steps in the vignette of said package and I am at the imputation step after normalization.
To select an imputation method, one must first determine what type of missingness is occurring: missing at random (MAR) or missing not at random (MNAR). In our case, NAs appear systematically depending on the condition, rather than randomly across samples.
For data that are missing not at random, imputation should be done with a left-censored method. However, the intensity distribution of my proteins with missing values is bimodal (see the link below for the figure), and I am not sure whether the methods suggested in the DEP vignette are suitable for my situation. Judging from the examples in the vignette, these methods seem to rely on a unimodal distribution.
Could someone help me figure out which method to apply to my data? I am undecided between the quantile regression-based left-censored function (“QRILC”) and random draws from a left-shifted distribution (centered on the 1% quantile, with standard deviation equal to the median of the feature standard deviations).
Here is the figure showing the distribution of intensities for the data with missing values.
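One rough way to check whether the NAs really follow the condition is to count them per protein within each group. The sketch below is plain Python rather than R/DEP code, and the function name, the toy matrix, the condition labels, and the use of `None` for NA are all illustrative assumptions, not anything from the package.

```python
def condition_specific_missing(matrix, conditions):
    """Return indices of features (rows) that are missing (None) in every
    replicate of one condition while fully observed in at least one other."""
    labels = sorted(set(conditions))
    group_size = {c: conditions.count(c) for c in labels}
    flagged = []
    for i, row in enumerate(matrix):
        # Count NAs for this feature within each condition.
        n_missing = {c: 0 for c in labels}
        for value, cond in zip(row, conditions):
            if value is None:
                n_missing[cond] += 1
        all_missing = any(n_missing[c] == group_size[c] for c in labels)
        none_missing = any(n_missing[c] == 0 for c in labels)
        if all_missing and none_missing:
            flagged.append(i)
    return flagged


# Hypothetical toy data: 3 proteins x 8 samples (4 per condition), log-scale.
conditions = ["ctrl"] * 4 + ["treat"] * 4
toy = [
    [20.1, 19.8, 20.3, 20.0, None, None, None, None],  # absent in "treat" only
    [22.0, None, 21.7, 22.1, 21.9, 22.3, None, 22.0],  # scattered NAs
    [25.2, 25.0, 25.1, 24.9, 25.3, 25.1, 25.0, 25.2],  # complete
]
print(condition_specific_missing(toy, conditions))  # prints [0]
```

Proteins flagged this way are the ones whose missingness is tied to the condition; NAs scattered across both groups, as in the second row, look more like random missingness.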
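To make the second option concrete, here is a minimal sketch of drawing imputed values from a left-shifted normal distribution. It is a plain-Python illustration and not the DEP implementation. It assumes log-scale intensities, `None` for NA, the 1% quantile taken per sample over the observed values, and a single standard deviation equal to the median of the per-feature standard deviations; the function names are made up for this example.

```python
import random
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def impute_left_shifted(matrix, q=0.01, seed=0):
    """Replace None entries with draws from N(center_j, sd).

    matrix: list of rows (features), each a list of per-sample values.
    center_j: q-th quantile of the observed values in sample j (assumption).
    sd: median of the per-feature standard deviations.
    Requires at least one observed value per sample and at least one
    feature with two or more observed values.
    """
    rng = random.Random(seed)

    # Standard deviation of each feature, from its observed values only.
    feature_sds = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if len(observed) >= 2:
            feature_sds.append(statistics.stdev(observed))
    sd = statistics.median(feature_sds)

    # Center of the imputation distribution for each sample (column).
    n_samples = len(matrix[0])
    centers = []
    for j in range(n_samples):
        column = [row[j] for row in matrix if row[j] is not None]
        centers.append(quantile(column, q))

    # Observed values are kept; only NAs are replaced by random draws.
    return [
        [v if v is not None else rng.gauss(centers[j], sd)
         for j, v in enumerate(row)]
        for row in matrix
    ]
```

Because the draws are centered near the low end of each sample's observed intensities, the imputed values mimic proteins below the detection limit, which is the rationale for using this kind of method on data missing not at random.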
https://drive.google.com/open?id=1k2UiXld0yLd8LBvKR9F0PjHWiXOGNI7a
Thanks in advance
I completely agree with Laurent.
Thank you very much, Laurent. I will go with the second method, as it is the one whose workings I understand better.