Hi all,
I am developing an analysis of protein microarray data and have found that limma performs very well in terms of identifying positive controls in the assay. Unfortunately, there are a large number of known, non-specific interactors that have been identified in earlier experiments. For the pipeline I would like to down-weight, but not exclude, these proteins from the analysis. I have an idea, but I'm not sure if it's a good one:
The thought is to add a value to the intensity of a probe that depends on its average across past experiments. This would essentially maintain the variance while reducing the fold changes between groups for that probe. It kind of places a higher burden of proof on that interaction for it to be real.
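To make the effect concrete, here is a minimal sketch in plain Python (not limma). It assumes the value is added on the raw intensity scale before taking log2; a constant added on the log scale to every array would leave the difference between groups unchanged. The function name, the intensities, and the offset of 1000 are all made up for illustration.

```python
import math

def log2_fold_change(treated, control, offset=0.0):
    """Difference in mean log2 intensity after adding `offset` to every raw value.

    Hypothetical helper for illustration only; not part of limma.
    """
    def mean_log2(values):
        return sum(math.log2(v + offset) for v in values) / len(values)
    return mean_log2(treated) - mean_log2(control)

# Made-up raw intensities for a single probe (assumption, not real data).
control = [400.0, 500.0, 450.0]
treated = [1600.0, 2000.0, 1800.0]

lfc_raw = log2_fold_change(treated, control)
# The offset stands in for a penalty derived from past experiments.
lfc_penalised = log2_fold_change(treated, control, offset=1000.0)

print(f"log2 fold change without offset: {lfc_raw:.2f}")
print(f"log2 fold change with offset:    {lfc_penalised:.2f}")
```

The penalised value is smaller than the raw one, so the probe needs a larger true difference to stand out.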
The problem is that I won't be seeing true fold changes anymore when I look at my output and that seems pretty bad. Does anyone out there have any other solutions to this sort of problem or can think of a way to moderate the t values by incorporating old data? Or is there some feature of limma/eBayes that could handle probe weights when calculating p-values?
Any thoughts would be appreciated!
Thanks, Andrew
It is unclear to me what you mean by a "non-specific interactor". Interacting with what? Non-specific to what? Are you saying that there are proteins on your array that always come up as DE even if they have no relationship to the treatment conditions?
Sorry for my delay in responding. The array is a functional protein array and interactions with the "prey" (proteins printed on the array) are detected by an antibody to the "bait" (what was put on the array). A non-specific interaction would be caused by "stickiness" of the prey. In general, we would expect this to be roughly independent of treatment conditions, which as I believe you're hinting, should prevent this from coming up as DE in the analysis. Unfortunately, I am told by the people doing the assay that it doesn't work as cleanly as this and there are cases where the sticky proteins are somewhat unstable and do not react identically across arrays, so might appear to be specific interactions when they are merely acting badly on a particular array.
I guess one way to say this might be that sticky proteins have a higher than expected variance as compared to other proteins at the same signal intensity. Not sure if this makes sense to people with more experience in this area of statistics, but it's how I'm thinking about it at the moment.
Based on this idea, another way to handle this situation might be to use all available prior arrays when calling lmFit, by adding a "background" condition to the design matrix, which is only there to provide more information when calculating and shrinking variances. Then we would get the benefit of hindsight but still only calculate the contrasts we care about based on the current experiment's arrays. Is this reasonable?
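As a rough illustration of the "background" idea, the sketch below uses plain Python to compute an ordinary t-statistic for one probe, with the residual variance pooled over every group in the model. It is not limma's implementation and leaves out the empirical Bayes moderation that eBayes applies. The helper name and all log2 intensities are made up.

```python
import math

def pooled_t(groups, a, b):
    """Ordinary t-statistic for mean(groups[a]) - mean(groups[b]).

    The residual variance is pooled over every group in `groups`, so extra
    groups add residual degrees of freedom without changing the estimated
    difference. Hypothetical helper for illustration only.
    """
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    rss = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df = sum(len(v) for v in groups.values()) - len(groups)
    s2 = rss / df  # pooled residual variance
    diff = means[a] - means[b]
    se = math.sqrt(s2 * (1 / len(groups[a]) + 1 / len(groups[b])))
    return diff, df, diff / se

# Made-up log2 intensities for one "sticky" probe (assumption, not real data).
current = {"treated": [9.1, 9.4, 9.0], "control": [8.2, 8.0, 8.3]}
# Prior arrays enter as one extra group that takes part in no contrast.
with_background = dict(current, background=[8.9, 7.6, 9.8, 8.1, 9.5, 7.9])

diff_cur, df_cur, t_cur = pooled_t(current, "treated", "control")
diff_bg, df_bg, t_bg = pooled_t(with_background, "treated", "control")

print(f"current arrays only: diff={diff_cur:.2f}, df={df_cur}, t={t_cur:.2f}")
print(f"with background:     diff={diff_bg:.2f}, df={df_bg}, t={t_bg:.2f}")
```

In this toy case the estimated difference is the same in both fits, while the noisy background arrays inflate the pooled variance and shrink the t-statistic. That is the "higher burden of proof" effect for a probe that behaved erratically in the past.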
Thanks!
Yes, you understood what I was hinting at. Adding background arrays can be useful in some circumstances but is probably unnecessary. I'll write a short answer.