Hello Bioconductor people,
I am analyzing a microarray gene expression dataset generated with the Illumina HumanHT-12 v4 platform, which contains several technical and biological replicates. I first load the raw image data into GenomeStudio and calculate the respective group and sample matrices. I am interested in the probe-level measurements. I extract the raw data and then use Bioconductor's lumi and limma packages for pre-processing and differential expression analysis, respectively.
My question is whether I should i) average the technical replicates in GenomeStudio, use the Group_Probe_Profile as input to lumi, and proceed to log-transformation and normalization with this table, OR ii) load the Sample_Probe_Profile in lumi, proceed to log-transformation and normalization, and then average the technical replicates using avearrays() from limma.
In brief: should I average the technical replicates in GenomeStudio or in R? What do you usually do?
Thank you very much!
Eleni
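For readers comparing the two options, here is a minimal sketch of what option ii) looks like on the R side. It uses simulated log2 values rather than real Sample_Probe_Profile data, and the object names (E, sample_id, E_avg) are made up for illustration. It only shows how avearrays() averages columns that share a label; it does not say whether averaging is the right choice for a given design.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 5 probes x 6 arrays (hypothetical values,
# standing in for data that has already been transformed and normalized)
E <- matrix(rnorm(30, mean = 8), nrow = 5,
            dimnames = list(paste0("probe", 1:5), paste0("array", 1:6)))

# Hypothetical labels: arrays sharing a label are technical replicates
sample_id <- c("S1", "S1", "S2", "S2", "S3", "S3")

# Average the columns that share a sample label
E_avg <- avearrays(E, ID = sample_id)
dim(E_avg)
```

Because the input is already on the log2 scale, the average is taken on log values, which is one practical difference from averaging raw intensities before any transformation.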
Dear Gordon,
Thank you very much for your prompt and to-the-point response. I have not tried using neqc() for preprocessing. What are the advantages of using it? I will rethink whether I need to average at all, thank you so much for the comment!
neqc() is fast and easy and makes good use of control probes. It gives excellent noise control like vst does, but doesn't attenuate the signal so much. See:
http://nar.oxfordjournals.org/content/38/22/e204
for a comparison of the different Illumina preprocessing methods.
Thank you, Gordon, for the information. It seems interesting. I have been following the procedure that the lumi user guide (from Bioconductor) suggests, and neqc() is not mentioned there. To make things clear in my mind, do you suggest that I:
1) Load the Sample_Probe_Profile that Genome Studio exports
2) Perform neqc() instead of log2 or vst, and
3) Proceed to normalization (i.e. quantile)?
Thank you very much,
Eleni
Just follow the case study in Section 17.3 of the limma User's Guide. neqc() already does normalization, so there is no need for step 3. Ideally you will have the control_probe_profile file as well, but neqc() can work without it.
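As a rough sketch of the workflow described in this reply (read the GenomeStudio exports, run neqc(), keep the expressed probes), something along the following lines could be used. The helper name preprocess_ilmn, its arguments, the detection cutoff of 0.05, the minimum of 3 arrays and the file names in the usage comment are assumptions for illustration; the limma User's Guide case study remains the reference.

```r
library(limma)

# Hypothetical helper: read GenomeStudio exports, run neqc(), keep expressed probes
preprocess_ilmn <- function(probe_file, ctrl_file = NULL,
                            p_cut = 0.05, min_arrays = 3) {
  # Read intensities plus the detection p-values; the control probe
  # profile is optional (ctrl_file = NULL reads the probe profile only)
  x <- read.ilmn(files = probe_file, ctrlfiles = ctrl_file,
                 other.columns = "Detection")
  # neqc() background-corrects, quantile-normalizes and log2-transforms,
  # so no separate normalization step follows
  y <- neqc(x)
  # Keep probes detected (p < p_cut) on at least min_arrays arrays
  expressed <- rowSums(y$other$Detection < p_cut) >= min_arrays
  y[expressed, ]
}

# Usage with hypothetical file names:
# y <- preprocess_ilmn("Sample_Probe_Profile.txt", "Control_Probe_Profile.txt")
```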
Great, thank you very much! I don't have the control_probe_profile file but, as you state, I can do without it.
Best wishes,
Eleni
Since you are running Genome Studio yourself, you must be able to export the control probe profiles.
Nevertheless, if you can't figure out how to do that, neqc() will infer what the control profiles must have been from the detection p-values.
Thank you very much!
Hmm... I realized I need to pool this dataset with an older one. So I think I will need to pool first and then normalize, right? Just to make sure: I extract the expressed y matrix from each dataset, extract the $E component, and bind the two tables together, then apply neqc() on the combined table. Is this correct? I am sorry if this is straightforward, I just wanted to be sure.
Thank you very much,
Eleni
No, you can't simply extract an $E component from a dataset and then use neqc(), because the $E component doesn't contain any information about control probes.
Ideally you would read in all the combined data again, both new and old data, and preprocess it all together from scratch.
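One way to read that advice is to pass both exports to read.ilmn() in a single call, so that neqc() sees the raw intensities and control probes of all arrays together. The sketch below assumes hypothetical file names, and it assumes both exports come from the same platform with the same probes in the same order; that should be checked before relying on it.

```r
library(limma)

# Hypothetical file names for the new and old GenomeStudio exports
probe_files <- c("new_Sample_Probe_Profile.txt", "old_Sample_Probe_Profile.txt")
ctrl_files  <- c("new_Control_Probe_Profile.txt", "old_Control_Probe_Profile.txt")

# Only run when the (hypothetical) files are actually present
if (all(file.exists(c(probe_files, ctrl_files)))) {
  # read.ilmn() takes character vectors of file names and combines the arrays
  x <- read.ilmn(files = probe_files, ctrlfiles = ctrl_files,
                 other.columns = "Detection")
  # Preprocess all arrays together, starting from the raw values
  y <- neqc(x)
}
```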
Thank you very much, Gordon! Good thing I asked!
Dear Gordon,
I am sorry to come back to this again, but something is not clear in my mind. After applying neqc() and keeping only the truly expressed probes, we have the normalized and expressed data in the $E component, right? Can I use only this $E component to create the design matrix and fit a model?
Thank you very much
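As a hedged illustration of this last question: in limma the design matrix is built from the sample annotation rather than from the expression values, and lmFit() accepts either the whole EList or a plain matrix of log2 values. The sketch below uses simulated data in place of real neqc() output, and the group labels and object names are invented for the example.

```r
library(limma)

set.seed(1)
# Simulated stand-in for normalized, filtered neqc() output:
# 100 probes x 6 arrays of log2 values (hypothetical data)
E <- matrix(rnorm(600, mean = 8), nrow = 100,
            dimnames = list(paste0("probe", 1:100), paste0("array", 1:6)))
y <- new("EList", list(E = E))

# The design matrix comes from the sample grouping, not from y$E
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"))
design <- model.matrix(~ group)

# Fit using the EList directly
fit <- eBayes(lmFit(y, design))
# Fit using only the matrix of log2 values
fit_matrix <- eBayes(lmFit(y$E, design))

topTable(fit, coef = "grouptreated", number = 5)
```

With no array weights involved, both fits give the same coefficients; passing the full object mainly keeps any probe annotation attached to the results.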