Questions about gene identifiers and probesets regulation


Chunyan Liu ▴ 40


Dear all,

I'm doing gene expression comparisons between two groups of subjects using Affymetrix single-channel hgu133plus2 microarray chips, and I have two questions.

1) Relationship among manufacturer ID, Entrez ID, GenBank ID and gene SYMBOL: Is there any one-to-one mapping?

I noticed that the hgu133plus2 environment gives annotations through Entrez ID. Is this always the case? It seems to me that one Entrez ID corresponds to multiple manufacturer IDs (probeset names), but is this also the case between manufacturer ID and GenBank ID? Is it true that one Entrez ID maps to one gene symbol?

2) Probesets: After using limma to compare the two groups, I get lists of up- and down-regulated probesets (1,000 up-regulated and 2,000 down-regulated). When I translate these into unique gene symbols, I find 200 gene symbols that appear in both lists. Is this plausible? Interpretable?

Thank you very much for any input.

Chunyan Liu
Cincinnati Children's Hospital

Microarray hgu133plus2 limma

updated 17.1 years ago by James W. MacDonald • written 17.1 years ago by Chunyan Liu


Ana Conesa ▴ 130


Dear List,

I am trying to find a function/way to get the coordinates of given elements of an array. What I mean is the following:

- Let X be a 3D array
- I find the ordering of the elements in X by ord <- order(X) (this is a vector)
- Now I want to get the x,y,z coordinates of each element of ord

Can anyone help me?

Thank you
Ana
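One possible way to do what this post asks for, as a minimal sketch in base R: `order()` treats the array as a plain vector and returns linear indices, and `arrayInd()` converts those back into array coordinates. The toy array and the column names `x`, `y`, `z` are assumptions made for illustration.

```r
# Toy 3D array; the values are made up for illustration.
set.seed(1)
X <- array(sample(24), dim = c(2, 3, 4))

# order() sees X as a vector, so ord holds linear (column-major) indices.
ord <- order(X)

# arrayInd() turns each linear index into one row of array coordinates.
coords <- arrayInd(ord, dim(X))
colnames(coords) <- c("x", "y", "z")

# First few coordinates, from the smallest element of X upwards.
head(coords)
```

Row i of `coords` gives the position of the i-th smallest element, so `X[coords]` returns the values of X in increasing order.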

written 17.1 years ago by Ana Conesa


James W. MacDonald 67k


United States

Hi Chunyan,

Chunyan Liu wrote:
> Dear all,
>
> I'm doing gene expression comparisons between two groups of subjects
> using affymetrix single-channel hgu133plus2 microarray chips and I have
> two questions.
>
> 1) Relationship among manufacturer ID, EntrezID, GenBank ID and gene
> SYMBOL: Is there any one-to-one mapping?
>
> I noticed that the hgu133plus2 environment gives annotations through
> Entrez ID. Is this always the case? It seems to me that one EntrezID
> corresponds to multiple manufacturer IDs (probe name), but is this the
> case between manufacturer ID and GenBank ID? Is it true that one
> EntrezID maps to one gene symbol?

I'm not sure if there is a one-to-one mapping from probeset ID to GenBank ID, but there certainly isn't a one-to-one mapping of GenBank ID to gene symbol (as GenBank IDs map things at the transcript level), so I am not sure that would help. I think there is a one-to-one mapping from Entrez Gene to symbol, but I am not 100% sure about that.

> 2) Probesets: Another question is after using limma, I get a list of
> up- and down-regulated probeset when comparing two groups (1,000 up and
> 2,000 down regulated probesets). When I translate these into unique gene
> symbols, I find 200 gene symbols that appear in both lists. Is this
> plausible? Interpretable?

Ah, now that is the problem, isn't it? Another problem is the case where 10 probesets are supposed to interrogate a particular gene and one is significant, but the other nine are not. In that case, is the gene differentially expressed or not?

What you have to understand is that Affy designed the probesets for this chip based on UniGene build 133, which was the best information at the time, but which is really outdated now (we are on build 203 currently). Even when they designed the chip, there were three levels of probesets: those with an _at suffix, which indicates that the probes all BLAST exclusively to the transcript in question; those with an _s_at suffix (or _a_at, I forget what they used for the 133), which indicates that some of the probes bind to related transcripts (whatever 'related' means); and those with an _x_at suffix, which indicates that some probes bind to completely unrelated transcripts. So even when the chip was designed, some of the probesets were not nearly as reliable as others. If you take the probe sequences and BLAST them today, you can find _at probesets with probes that bind to unrelated sequences, so time has not always been kind to the probe mappings.

What can you do about this problem? There are a couple of things you can do, but any 'fix' has its own problems.

First, you can use the remapped CDFs that are made available by the MBNI at the University of Michigan (via BioC). These remapped CDFs discard the original probesets and only use those probes that are known to map to unique sequences in the genome (based on the current UniGene build), and then map to transcripts or genes based on Entrez Gene, GenBank, UniGene, Ensembl, etc. The upside to these CDFs is that you will have only one probeset per transcript/gene, so it will be impossible to have a gene symbol appearing in both the up- and down-regulated groups. In addition, the assumptions of, say, RMA or GCRMA (or any probe-level models in affyPLM) will again hold true; in other words, the intensity of a given probe will be due only to the level of the transcript it is supposed to measure plus the probe-specific binding. The downside of these CDFs is that the number of probes per probeset will vary from something like 3 to 150, so the standard error of your estimate will also vary widely. If you simply take the expression values for these probesets and analyze them using limma, you will be ignoring this extra level of error (which you can safely ignore with the 'stock' Affy CDFs, since most of those probesets have 11 probes each).

Second, you can just use the 'stock' Affy CDFs and apply some ad hoc method to decide which of the probesets to believe. You can simply choose to believe only the _at probesets. Or you can decide to BLAST (or BLAT, which is much faster and AFAICT nearly as accurate) each of the disagreeing probesets to see which one appears to actually measure the gene transcript in question. The upside here is that you don't have the extra level of variability introduced by the MBNI CDFs, but the downside is the amount of extra work it will entail.

HTH,

Jim

> Thank you very much for any input.
>
> Chunyan Liu
> Cincinnati Children's Hospital

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
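The second, ad hoc approach described in this reply can be sketched in base R. This is only an illustration under stated assumptions: the data frame `res`, its column names, and the probeset IDs and gene symbols in it are all hypothetical, standing in for a table built from limma output plus an annotation lookup.

```r
# Hypothetical results table: probeset ID, gene symbol, direction of change.
res <- data.frame(
  probeset  = c("1000_at", "1001_s_at", "1002_x_at", "1003_at", "1004_at"),
  symbol    = c("GENE1",   "GENE1",     "GENE2",     "GENE2",   "GENE3"),
  direction = c("up",      "down",      "up",        "up",      "down"),
  stringsAsFactors = FALSE
)

# Gene symbols that appear in both the up and the down list.
up          <- unique(res$symbol[res$direction == "up"])
down        <- unique(res$symbol[res$direction == "down"])
conflicting <- intersect(up, down)

# Classify each probeset by its suffix; anything that is not
# _x_at, _s_at or _a_at is treated as a plain _at probeset.
suffix <- ifelse(grepl("_x_at$", res$probeset), "_x_at",
          ifelse(grepl("_s_at$", res$probeset), "_s_at",
          ifelse(grepl("_a_at$", res$probeset), "_a_at", "_at")))

# Ad hoc rule from the reply: believe only the plain _at probesets.
trusted <- res[suffix == "_at", ]

# Re-check for conflicts among the trusted probesets only.
conflicting_after <- intersect(
  unique(trusted$symbol[trusted$direction == "up"]),
  unique(trusted$symbol[trusted$direction == "down"])
)
```

In this toy table, GENE1 is the only conflicting symbol, and the conflict disappears once the _s_at probeset is set aside. With real data some conflicts may remain among _at probesets, which is where checking the probe sequences with BLAST or BLAT comes in.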

written 17.1 years ago by James W. MacDonald
