Hello,
I am using limma to analyze single-channel Agilent microarray data from NCBI GEO (GSE57296), starting from the raw data.
In the avereps() step, I noticed that one can use either
yave <- avereps(y0, ID = y0$genes[, "SystematicName"]) # "SystematicName"
OR
yave <- avereps(y0, ID = y0$genes[, "ProbeName"]) # "ProbeName"
But the results are different, since one SystematicName may correspond to many ProbeNames, like the following one:
> G[SystematicName == "NR_003038"]
Row Col ControlType ProbeName SystematicName
1: 13 12 0 A_19_P00316659 NR_003038
2: 13 53 0 A_19_P00317984 NR_003038
3: 106 26 0 A_19_P00319019 NR_003038
4: 132 161 0 A_19_P00322944 NR_003038
5: 133 78 0 A_19_P00321546 NR_003038
6: 139 38 0 A_19_P00316419 NR_003038
7: 152 79 0 A_19_P00322702 NR_003038
8: 155 8 0 A_19_P00322754 NR_003038
9: 161 13 0 A_19_P00317178 NR_003038
10: 224 48 0 A_19_P00321511 NR_003038
11: 231 52 0 A_23_P361085 NR_003038
12: 245 37 0 A_19_P00316541 NR_003038
13: 270 4 0 A_19_P00319095 NR_003038
14: 301 13 0 A_19_P00316701 NR_003038
15: 319 28 0 A_19_P00317473 NR_003038
16: 331 7 0 A_19_P00320094 NR_003038
17: 347 162 0 A_19_P00322666 NR_003038
18: 384 73 0 A_19_P00317743 NR_003038
So which one should be used in avereps(): ProbeName or SystematicName?
I noticed that the limma User's Guide uses "SystematicName", but http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma uses "ProbeName".
I think "SystematicName" is better, but I am not sure. Can you help me clear this up?
Thanks.
Kevin
Thanks James. But in what case should one take the average of all duplicate probes? If I understand correctly, in most situations one wants the differentially expressed genes (DEG), not the probes, right? Even if one gets the DE probes, one still has to map those probes to genes, so it is more direct to use 'SystematicName' to get the DEG, right? Thanks!
Well, the duplicate probes are measuring the exact same thing, and thus are true technical replicates. The set of probes that are intended to measure the transcript(s) from a particular gene are less so, and may in fact be intended to measure different splice variants.
There is very little to be gained from repeated measurements of the same thing, so you could argue that averaging the duplicate probes is a reasonable thing to do. You could also argue that the different probes (that may not be identical, and might measure different transcripts) are just measuring the amount of transcript from each gene, and if you don't care about the differences in the transcripts being measured (which may not be that different anyway), then it's reasonable to collapse those measurements to a single mean value.
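To make the two options concrete, here is a minimal sketch on a toy matrix. The values, the probe names (P1 to P3) and the gene names (G1, G2) are all made up for illustration; only avereps() itself comes from limma.

```r
library(limma)

# Toy log-expression matrix: 4 spots x 2 arrays (all values are made up).
# Column 1 holds 1, 3, 8, 5 and column 2 holds 2, 4, 9, 6.
E <- matrix(c(1, 3, 8, 5,
              2, 4, 9, 6),
            nrow = 4,
            dimnames = list(NULL, c("array1", "array2")))

# Hypothetical annotation: P1 is spotted twice (a true duplicate probe),
# and P1 and P2 are different probes for the same gene G1.
genes <- data.frame(
  ProbeName      = c("P1", "P1", "P2", "P3"),
  SystematicName = c("G1", "G1", "G1", "G2"),
  stringsAsFactors = FALSE
)

# Option 1: average only the identical (duplicate) probes.
by_probe <- avereps(E, ID = genes$ProbeName)

# Option 2: average every spot annotated to the same gene/transcript.
by_gene <- avereps(E, ID = genes$SystematicName)

print(by_probe)  # one row per distinct probe (3 rows)
print(by_gene)   # one row per distinct SystematicName (2 rows)
```

With option 1 the rows for P1 and P2 stay separate, so any difference between the two probes for G1 is still visible downstream; with option 2 that difference is averaged away.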
Part of analyzing data involves making these sorts of decisions, and being able to explain what you did and why you did it. I can give you hypothetical arguments as to why one might want to do this or that, but in the end it's your analysis, and you will have to be responsible for what you did, and you will have to explain (to someone) what you did and why.
Or can one first use "ProbeName" to average the duplicated probes and then use "SystematicName" to focus on the genes/transcripts? That is a two-step procedure. Is it reasonable to do this?
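For reference, the two-step procedure described above could be sketched as follows. This is only an illustration on made-up data (the probe names, gene names and values are hypothetical), not a recommendation from the limma User's Guide.

```r
library(limma)

# Toy log-expression matrix: 4 spots x 2 arrays (all values are made up).
E <- matrix(c(1, 3, 8, 5,
              2, 4, 9, 6),
            nrow = 4,
            dimnames = list(NULL, c("array1", "array2")))

# Hypothetical annotation: P1 is spotted twice; P1 and P2 both target G1.
genes <- data.frame(
  ProbeName      = c("P1", "P1", "P2", "P3"),
  SystematicName = c("G1", "G1", "G1", "G2"),
  stringsAsFactors = FALSE
)

# Step 1: average duplicate spots of the same probe.
step1 <- avereps(E, ID = genes$ProbeName)

# Look up the gene for each remaining (now unique) probe.
probe_gene <- genes$SystematicName[match(rownames(step1), genes$ProbeName)]

# Step 2: average the per-probe values within each gene.
two_step <- avereps(step1, ID = probe_gene)

# For comparison: averaging all spots per gene in a single step.
one_step <- avereps(E, ID = genes$SystematicName)

print(two_step)
print(one_step)
```

The two results differ whenever probes for the same gene are spotted an unequal number of times: the two-step version gives each distinct probe equal weight, while the one-step version gives each spot equal weight, so heavily replicated probes dominate the gene-level mean.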