First, I apologize if these are dumb questions... I am new to R and Microarray data. That being said, I would appreciate any helpful feedback and/or links.
From what I've read here, it is unusual to remove flagged data-points in limma and I couldn't find any post showing how to do it within the limma package (read.maimages does have a weight function, but it's not for Agilent data). That being said, I'm doing a rotation with someone who prefers such methodology. Thus, I wrote a script that re-formatted the Agilent data with poorly performing probes removed. This means my .txt files now have different lengths, which I believe is causing an error with read.maimages:
Error in RG[[a]][, i] <- obj[, columns[[a]]] : number of items to replace is not a multiple of replacement length
But I would like to take advantage of limma's normalization processes. Is this possible I could reformat the data in another way so those probes are removed? Did I miss something, and eliminating such probe values is actually possible in limma? Is there another package, which may be able to do what I am seeking?
Again, sorry if this seems like a stupid question or if removing flagged values is considered "bad practice." But would really appreciate any feedback on overcoming this issue.
Thank you
"You simply read the microarray data into limma in the normal way, then set all the spots that are flagged to have zero weights Then all flagged spots will automatically be removed from any limma downstream analysis."
I was playing around with some code that I found in another thread ( Filtering genes based on Agilent flags ). But after reading what I could find on wt.fun() and the limma documentation... I am a bit loss on how to create my own function.
Attempted Code:
Again, I am appreciative of your help.
And the problem is what? Did this code not work for you? In what way did it not work?
PS. Please be careful with the buttons on this website, which confuse many users. I've moved your new question to be a comment, because you posted it as an independent answer instead of as a comment on my answer. In other words, don't answer your own questions.
I broke the code down in order to understand it. I am sure much of the confusion stems from me being new to R. But I don't understand (1) what values are being passed to that function, (2) how weights are being assigned, and (3) how that information is being fed-back into read.maimages.
For instance, lets say I "gIsWellAboveBG" values, with a 1, to have a weight of 1. And values of 0 to have a weight of 0. Is limma intelligent enough where it is able to parse-out that information and automatically deduce the weights?
Again, I apologize if these questions seem simple. Furthermore, I realized you posted an alternative way to obtain specific readings and that certainly seems like a promising route. But if there is some documentation you can point me to, I would appreciate that.
Thank you for your time.
In other words, the code already worked perfectly for you, but you just don't believe it for some reason.
You code is unnecessarily complicated, as simply
would give the same result. There is no need for limma to deduce anything. What is returned by the above function is the weight.
More importantly though, as I've said elsewhere, there is no need for you to be using wt.fun at all. In particular, you should not be using wt.fun with gIsWellAboveBG.
I don't find your questions simple, but you do have some misconceptions, and I think you are making microarray analysis more complicated by trying to micro-manage it. limma is a sophisticated program. It will do a good job of analysing your Agilent data, quickly and easily if you run a standard analysis. You are not expected to read the code but instead just follow the documentation.
If you are new to R, new to microarrays and (perhaps) new to statistics as well, trying to invent your own complicated analysis right from the start isn't necessarily productive. What about trying a simple vanilla analysis first? If highly unusual finesses really are needed for your data, then you can easily add them afterwards.
As you probably inferred from my first post, I understand that I am not following standard practices. And I hope this isn't interpreted as some brash disregard of thoughtful and tested practices. Long story short, I am bioinformatics grad student that is trying my best to analyze Agilent one-channel data while following the wishes of my PI. As so many guides point to limma when analyzing such data... limma has been my focused (and also the source of some frustration, as I am not quite understanding what is going underneath the hood).
What about trying a simple vanilla analysis first?
That would likely be prudent and I would love to find examples of basic Agilent analysis without depending on limma so much (or at all). I am curious about what you are thinking about when you say "vanilla analysis."
Again, I appreciate the feedback.
limma is the "vanilla analysis" == "standard practices".
It's hard to see what can be frustrating you about limma because you don't seem to have actually tried it yet. You haven't mentioned anything that you tried and didn't work, apart from reading the incorrect data files after you had hacked them, and those couldn't possibly have been read by any software.
But if you want try something else that's up to you.
Can I ask you what you specifically mean by a "flagged" spot? Which Agilent columns specifically were you planning to treat as quality flags?
I would like to conduct normalization with probe values that have a "gIsWellAboveBG" value of 1 and ignore/extract probes with a value of 0.
What you're saying doesn't make sense, at least not as you've said it. I suspect that you are conflating two concepts that are really separate: spot quality flags and probe-filtering.
Quality flags apply to individual spots. You can use limma to down-weight low quality spots but, so far, nothing you've said suggests that you have any low quality spots. You haven't mentioned any Agilent flag that corresponds to spot quality.
Probe-filtering on the other hand applies to a probe across a whole series of arrays. If you filter out a probe, then will you will filter out all n spots for that probe, where n is the number of arrays.
gIsWellAboveBG takes on values for each spot, not for each probe. For any probe, you will get a whole string of values for gIsWellAboveBG, one for each microarray. If you want to use gIsWellAboveBG for probe filtering, then you have to compute some summary measure of the whole set of n values for that probe. I've already told you how to do that, in my answer below.
I think you are correct and I probably was misinterpreting what was being asked. If I remove/keep individual probes based on gIsWellAboveBG... I am creating a bunch of data-sets that can't be compared.
Based on your comment, I suspect he wanted me to remove probes that are universally "poor" in all the arrays. I am not sure if that is a more acceptable practice but will confirm my suspicious with him.
I appreciate the feedback. Thank you.