I want to perform missing value imputation on TMT-based quantitative proteomics data. I plan to use mixed imputation, applying two different methods (MCAR/MNAR) to two different groups within the same dataset. Should I perform the imputation on raw or log-transformed peptide intensity data?
I don't think it matters, as long as you wouldn't use a zero imputation for MNAR.
However, as you use TMT tags, one would expect your missing values to be the result of absent peptides rather than of features missed by the MS, because the samples were combined. If it is a typical shotgun experiment, one wouldn't expect many missing values; some features can still have many missing values, and these should probably be filtered out completely.
Thanks Laurent for your reply. In my dataset, one condition is expected to have more missing peptides than the other for biological reasons. The filtering strategy I am using is as follows:
Condition A (5 replicates) is expected to have more peptides than Condition B (5 replicates).
-> Filter out all peptides that are completely missing from both Condition A and Condition B.
-> Keep peptides that are present in at least 3 replicates of Condition A.
-> No such restriction on Condition B.
-> Apply a MAR method to Condition A and an MNAR method to Condition B (as peptides there are expected to be biologically missing).
-> Alternatively, for each peptide, assign the average intensity across the Condition A replicates to its missing values in Condition A, and assign the minimum intensity across the Condition B replicates to its missing values in Condition B.
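For illustration, the filtering rule and the simple mean/minimum alternative outlined above could be sketched roughly as follows. This is only a toy sketch in plain Python, not a proteomics package workflow: the names `filter_peptides`, `impute` and the `toy` data are hypothetical, missing values are encoded as `None`, and the first five values of each peptide are assumed to be Condition A. The outline does not say what to do when a peptide is missing in every Condition B replicate, so the fallback used here (the lowest observed intensity in the filtered dataset) is purely an assumption.

```python
from statistics import mean

N_A = 5  # assumption: first 5 values are Condition A, the rest Condition B


def observed(values):
    """Return only the non-missing intensities."""
    return [v for v in values if v is not None]


def filter_peptides(data, n_a=N_A, min_present_a=3):
    """Keep peptides seen in at least `min_present_a` Condition A replicates.

    This also drops peptides missing from both conditions, since those
    have zero observations in Condition A.
    """
    return {
        pep: vals
        for pep, vals in data.items()
        if len(observed(vals[:n_a])) >= min_present_a
    }


def impute(data, n_a=N_A):
    """Fill A with the per-peptide mean and B with the per-peptide minimum.

    Expects already filtered data, so every peptide has values in A.
    """
    # Assumed fallback for peptides with no observation at all in B.
    global_min = min(v for vals in data.values() for v in observed(vals))
    result = {}
    for pep, vals in data.items():
        a, b = vals[:n_a], vals[n_a:]
        a_fill = mean(observed(a))
        b_obs = observed(b)
        b_fill = min(b_obs) if b_obs else global_min
        result[pep] = (
            [a_fill if v is None else v for v in a]
            + [b_fill if v is None else v for v in b]
        )
    return result


# Hypothetical toy intensities (None = missing).
toy = {
    "pep1": [10.0, 12.0, None, 11.0, 13.0, 4.0, None, 6.0, None, 5.0],
    "pep2": [8.0, None, None, None, 9.0, 3.0, 3.5, None, None, None],
    "pep3": [20.0, 22.0, 21.0, None, None] + [None] * 5,
    "pep4": [None] * 10,
}

filtered = filter_peptides(toy)  # pep2 and pep4 are dropped
imputed = impute(filtered)
```

Here `pep2` is removed because it is observed in only two Condition A replicates, and `pep4` because it is never observed. A proper MAR or MNAR method would replace the two fill rules inside `impute`.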
I usually remove missing values in all my analyses, but in this specific dataset, due to the nature of the biology, I have to keep them. I would therefore highly appreciate your feedback on the above outline, as this is the first time I am using imputation in an analysis.
Yes, that seems reasonable. I am unsure about using the peptide average rather than another suitable MAR method (as this will artificially minimise the variability for that peptide and the statistical tests might then be too optimistic), but I guess by trying and inspecting results, you will see.
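To make the concern about variability concrete, here is a small toy calculation in plain Python. The numbers are made up: three observed replicates of one peptide, with two missing replicates filled with the peptide mean.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observed intensities for one peptide in Condition A.
observed_a = [10.0, 12.0, 14.0]

# Two missing replicates filled with the mean of the observed ones.
imputed_a = observed_a + [mean(observed_a)] * 2

sd_before = stdev(observed_a)  # sample standard deviation, 3 values
sd_after = stdev(imputed_a)    # sample standard deviation, 5 values

# Standard error of the mean, as a test statistic would use it.
se_before = sd_before / sqrt(len(observed_a))
se_after = sd_after / sqrt(len(imputed_a))

print(f"SD before: {sd_before:.3f}, after: {sd_after:.3f}")
print(f"SE before: {se_before:.3f}, after: {se_after:.3f}")
```

The imputed values sit exactly on the mean, so they add no spread but do increase the apparent sample size. Both the standard deviation and the standard error shrink although no new information was measured, which is why a downstream test can look more significant than it should.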
Thanks very much for your comments.