Question

avereps(): 'ProbeName' or 'SystematicName' for Agilent one-channel microarray?

kevin.m.hao ▴ 10

@kevinmhao-12183

Hello,

I am using limma to analyze one-channel Agilent microarray data from NCBI GEO (GSE57296), starting from the raw data.

In the avereps() step, I notice that one can use either

yave <- avereps(y0, ID = y0$genes[, "SystematicName"]) # "SystematicName"

OR

yave <- avereps(y0, ID = y0$genes[, "ProbeName"]) # "ProbeName"

But the results are different, since one SystematicName can correspond to many ProbeNames, as in the following example:

> G[SystematicName == "NR_003038"]
    Row Col ControlType      ProbeName SystematicName
 1:  13  12           0 A_19_P00316659      NR_003038
 2:  13  53           0 A_19_P00317984      NR_003038
 3: 106  26           0 A_19_P00319019      NR_003038
 4: 132 161           0 A_19_P00322944      NR_003038
 5: 133  78           0 A_19_P00321546      NR_003038
 6: 139  38           0 A_19_P00316419      NR_003038
 7: 152  79           0 A_19_P00322702      NR_003038
 8: 155   8           0 A_19_P00322754      NR_003038
 9: 161  13           0 A_19_P00317178      NR_003038
10: 224  48           0 A_19_P00321511      NR_003038
11: 231  52           0   A_23_P361085      NR_003038
12: 245  37           0 A_19_P00316541      NR_003038
13: 270   4           0 A_19_P00319095      NR_003038
14: 301  13           0 A_19_P00316701      NR_003038
15: 319  28           0 A_19_P00317473      NR_003038
16: 331   7           0 A_19_P00320094      NR_003038
17: 347 162           0 A_19_P00322666      NR_003038
18: 384  73           0 A_19_P00317743      NR_003038

So which one should be used in avereps(): ProbeName or SystematicName?

I noticed that the limma User's Guide uses "SystematicName", but http://matticklab.com/index.php?title=Single_channel_analysis_of_Agilent_microarray_data_with_Limma uses "ProbeName".

I think "SystematicName" is better, but I am not sure. Can you help me clear this up?
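To see how differently the two ID columns would group the rows before choosing one, it may help to count rows per ID. This is only a sketch on a made-up annotation table: the `genes` data frame and its probe and transcript names are hypothetical stand-ins for `y0$genes`.

```r
# Toy stand-in for y0$genes; all names here are invented for illustration.
genes <- data.frame(
  ProbeName      = c("P1", "P1", "P2", "P3", "P4", "P4"),
  SystematicName = c("T1", "T1", "T1", "T1", "T2", "T2"),
  stringsAsFactors = FALSE
)

# Number of spots (rows) per probe and per transcript
spots_per_probe      <- table(genes$ProbeName)
spots_per_transcript <- table(genes$SystematicName)

# Number of distinct probes annotated to each transcript
probes_per_transcript <- tapply(
  genes$ProbeName, genes$SystematicName,
  function(p) length(unique(p))
)

# Rows that would remain after averaging on each ID column
n_after_probe_avg      <- length(spots_per_probe)
n_after_transcript_avg <- length(spots_per_transcript)
```

A transcript with several distinct probes (like T1 here) is where the two choices of ID give different results.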

Thanks.

Kevin

limma microarray • 3.3k views

ADD COMMENT • link updated 8.1 years ago by Gordon Smyth 52k • written 8.1 years ago by kevin.m.hao ▴ 10

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Personally, I do not usually average replicate probes. You should only do so if you have a particular reason for needing to do so. Otherwise the limma analysis will work perfectly well without averaging.

You haven't explained any reason why you want or need to average replicate probes, so the default position is not to do it.
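As a rough illustration of what an analysis without averaging could look like, here is a sketch on simulated data. The expression values, the two-group design, and the probe and transcript names are all assumptions made up for the example, not taken from GSE57296.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 100 spots x 6 arrays (3 control, 3 treated).
E <- matrix(rnorm(600, mean = 8, sd = 0.3), nrow = 100)
E[1:5, 4:6] <- E[1:5, 4:6] + 3          # make the first five spots DE

# Hypothetical annotation: two probes per transcript
genes <- data.frame(
  ProbeName      = paste0("P", 1:100),
  SystematicName = paste0("T", rep(1:50, each = 2)),
  stringsAsFactors = FALSE
)
y <- new("EList", list(E = E, genes = genes))

group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

# Probe-level fit: every row is tested, nothing is averaged
fit <- eBayes(lmFit(y, design))
tt  <- topTable(fit, coef = 2, number = Inf)
```

Each row of `tt` is a probe-level result that still carries its ProbeName and SystematicName annotation.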

ADD COMMENT • link 8.1 years ago Gordon Smyth 52k

Hi Gordon,

If one does not average replicate probes, then limma identifies the differentially expressed probes (DEPs), right?

After obtaining the DEPs, how do I convert them to differentially expressed genes (DEGs)?

Thanks.

ADD REPLY • link 8.1 years ago kevin.m.hao ▴ 10

If any of the probes associated with a gene is DE, then the gene is DE. There's no need to do any transformation. Why is that a problem for you?
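In code, the "any probe DE means the gene is DE" rule amounts to reading the gene annotation off the significant rows. A minimal sketch, assuming a probe-level results table shaped like topTable output; the table contents and the 0.05 cutoff are hypothetical.

```r
# Toy stand-in for topTable(fit, coef = 2, number = Inf); values are invented.
tt <- data.frame(
  ProbeName      = c("P1", "P2", "P3", "P4", "P5"),
  SystematicName = c("T1", "T1", "T2", "T3", "T3"),
  adj.P.Val      = c(0.001, 0.40, 0.30, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# A gene is called DE if at least one of its probes is DE
is_de_probe <- tt$adj.P.Val < 0.05       # assumed FDR cutoff
de_genes    <- unique(tt$SystematicName[is_de_probe])

# Optional: keep the most significant probe per gene for reporting
best <- tt[order(tt$adj.P.Val), ]
best <- best[!duplicated(best$SystematicName), ]
```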

ADD REPLY • link 6.3 years ago Gordon Smyth 52k

Dear Gordon

I have a problem "similar" to Kevin's. I would like to average the (non-duplicate) probes for each gene using the avereps function for a dual-color Agilent array.

However, I have probes that map to multiple genes (identically duplicated). In the case of Kevin it would look like this.

   Row Col ControlType      ProbeName SystematicName
1:  13  12           0 A_19_P00316659 NR_003038, NR_003039, NR_003040
2:  13  53           0 A_19_P00317984 NR_003039
3: 106  26           0 A_19_P00319019 NR_003038, NR_003042

I would like to keep these multi-mapping probes, because if I restrict the analysis to uniquely mapping probes, then some (identically duplicated) genes are thrown out of the analysis.

So how should I use the avereps function to also include multimapped probes when averaging (e.g. use probe 1 & 2 for NR_003039 and probe 1 & 3 for NR_003038)?

If I do "MA_average<-avereps(MA_normalized, ID=MA_normalized$genes$SystematicName)" then it ignores the comma separator and treats e.g. "NR_003038, NR_003039, NR_003040" as one ID...
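One possible workaround, sketched here on a plain matrix of made-up log-ratios rather than a real MAList, is to split the comma-separated IDs and repeat each row once per transcript before averaging. The values in `M` are invented, and the approach is an assumption for illustration only. Note that the same measurement then contributes to several averaged rows, so those rows are not independent.

```r
library(limma)

# Toy log-ratios: 3 probes x 2 arrays (values invented).
M <- matrix(c(1, 3, 5,
              2, 4, 6),
            nrow = 3,
            dimnames = list(NULL, c("array1", "array2")))
ids <- c("NR_003038, NR_003039, NR_003040",
         "NR_003039",
         "NR_003038, NR_003042")

# Split each annotation string on commas and trim whitespace
id_list <- lapply(strsplit(ids, ","), trimws)

# Repeat each row once for every transcript it maps to
row_idx <- rep(seq_along(id_list), lengths(id_list))
M_long  <- M[row_idx, , drop = FALSE]
id_long <- unlist(id_list)

# Average rows sharing the same single transcript ID
M_avg <- avereps(M_long, ID = id_long)
```

Here probes 1 and 3 are averaged for NR_003038 and probes 1 and 2 for NR_003039, as asked above.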

Thank you for helping out!

ADD REPLY • link 6.3 years ago wd ▴ 30

This question is different because of the multi-mapping probes, so you should ask a new question on this forum instead of appending a comment to an old question.

When you post your own question, please explain why your platform has multi-mapping probes and what analysis you plan to do that requires you to have gene level results. Why can't you follow the same advice I gave to Kevin, which is to do the probe-level analysis and just take a gene to be DE if any of the probes belonging to the gene are DE?

ADD REPLY • link 6.3 years ago Gordon Smyth 52k

Dear Gordon

Thank you for your time and reply. I have now posted a new question on the Bioconductor forum.

You can find it here: avereps with probes mapping to multiple genes for an Agilent dual color microarray.

ADD REPLY • link 6.3 years ago wd ▴ 30

score 4 · Accepted Answer · 2017-03-13

James W. MacDonald 68k

@james-w-macdonald-5106

United States

The probe name is just that - the internal name given to a particular probe by the manufacturer. The systematic name is the transcript that a given probe is intended to measure. If you simply want to take the average of all duplicate probes on the array, then you should use the probe name. If you want to 'collapse' the information to individual transcripts/genes, then you should use the systematic name.

Which is 'better' depends on what you are trying to do.
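To make the difference concrete, here is a small sketch on a toy EList. The expression values and the probe and transcript names are invented; only the avereps calls mirror the ones in the question.

```r
library(limma)

# Toy data: 6 spots x 2 arrays (values invented).
E <- cbind(array1 = c(10, 12, 8, 6, 4, 4),
           array2 = c(20, 22, 18, 16, 14, 14))
genes <- data.frame(
  ProbeName      = c("P1", "P1", "P2", "P3", "P4", "P4"),
  SystematicName = c("T1", "T1", "T1", "T1", "T2", "T2"),
  stringsAsFactors = FALSE
)
y0 <- new("EList", list(E = E, genes = genes))

# Average identical probes only: one row per ProbeName
y_probe <- avereps(y0, ID = y0$genes$ProbeName)

# Collapse to transcripts: one row per SystematicName
y_transcript <- avereps(y0, ID = y0$genes$SystematicName)
```

In this toy case `y_probe` keeps four rows (P1 to P4), while `y_transcript` keeps two (T1 and T2), with T1 averaging over three different probes.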

ADD COMMENT • link 8.1 years ago James W. MacDonald 68k

Thanks James. But in what case should one take the average of all duplicate probes? If I understand correctly, in most situations one wants the differentially expressed genes (DEGs), not the probes, right? Even if one gets the DE probes, they still have to be mapped to genes, so it is more direct to use 'SystematicName' to get the DEGs, right? Thanks!

ADD REPLY • link 8.1 years ago kevin.m.hao ▴ 10

1

Well, the duplicate probes are measuring the exact same thing, and thus are true technical replicates. The set of probes that are intended to measure the transcript(s) from a particular gene are less so, and may in fact be intended to measure different splice variants.

There is very little to be gained from repeated measurements of the same thing, so you could argue that averaging the duplicate probes is a reasonable thing to do. You could also argue that the different probes (that may not be identical, and might measure different transcripts) are just measuring the amount of transcript from each gene, and if you don't care about the differences in the transcripts being measured (which may not be that different anyway), then it's reasonable to collapse those measurements to a single mean value.

Part of analyzing data involves making these sorts of decisions, and being able to explain what you did and why you did it. I can give you hypothetical arguments as to why one might want to do this or that, but in the end it's your analysis, and you will have to be responsible for what you did, and you will have to explain (to someone) what you did and why.

ADD REPLY • link 8.1 years ago James W. MacDonald 68k

Or can one first use "ProbeName" to average the duplicated probes and then use "SystematicName" to focus on the genes/transcripts? That is a two-step procedure. Is it reasonable to do this?
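For what it is worth, the two-step procedure can be sketched as two successive avereps calls. The toy data below are invented. The sketch only shows that the two-step result can differ numerically from a single averaging by SystematicName when probes have unequal numbers of replicate spots; it does not say which is preferable.

```r
library(limma)

# Toy data: probe P1 is spotted twice, P2 and P3 once, all on transcript T1.
E <- cbind(array1 = c(10, 12, 8, 6, 4, 4),
           array2 = c(20, 22, 18, 16, 14, 14))
genes <- data.frame(
  ProbeName      = c("P1", "P1", "P2", "P3", "P4", "P4"),
  SystematicName = c("T1", "T1", "T1", "T1", "T2", "T2"),
  stringsAsFactors = FALSE
)
y0 <- new("EList", list(E = E, genes = genes))

# Step 1: average replicate spots of the same probe
y_probe <- avereps(y0, ID = y0$genes$ProbeName)

# Step 2: average the distinct probes of each transcript
y_two_step <- avereps(y_probe, ID = y_probe$genes$SystematicName)

# For comparison: average all spots of each transcript in one step
y_one_step <- avereps(y0, ID = y0$genes$SystematicName)
```

In the two-step version each distinct probe gets equal weight within its transcript, whereas the one-step version gives each spot equal weight, so a probe spotted more often counts more.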

ADD REPLY • link 8.1 years ago kevin.m.hao ▴ 10