Does the strand of a microarray probe matter?

Cei Abreu-Goodger

Mexico

Hello all,

Related issues have arisen before, where a probe on a particular array platform was annotated to a gene on the opposite strand. But I was just asked whether this even matters, or whether it should simply be considered a case of bad probe design.

Do the protocols for different manufacturers' arrays always produce amplified product from both strands of the transcript being measured? I could imagine that protocols that amplify from poly-A tails would tend to produce an antisense-biased amplification product (older Affy arrays?), whereas those based on random priming could produce products from both strands (so the actual strand that is on the array becomes meaningless).

Does anyone know what is the case, in particular, for Illumina BeadArrays?

Many thanks,
Cei

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
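To make the question concrete, here is a minimal Python sketch of the two cases Cei describes. It assumes that a probe captures any labelled target strand containing the probe's reverse complement, and it models two hypothetical protocols: one whose product is antisense only (as Cei suggests for poly-A-primed protocols) and one whose product contains both strands (as Cei suggests for random priming). The protocol names and toy sequences are illustrative, not taken from any manufacturer's documentation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-letter sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def target_strands(transcript: str, protocol: str) -> list[str]:
    """Labelled target strands produced from one sense transcript (hypothetical protocols).

    'antisense_only': strand-specific product, e.g. an antisense-biased poly-A-primed protocol
    'both':           product containing both strands, e.g. a randomly primed double-stranded product
    """
    antisense = reverse_complement(transcript)
    if protocol == "antisense_only":
        return [antisense]
    if protocol == "both":
        return [transcript, antisense]
    raise ValueError(f"unknown protocol: {protocol}")


def probe_detects(probe: str, transcript: str, protocol: str) -> bool:
    # Antiparallel pairing: the probe binds a target that contains its reverse complement.
    site = reverse_complement(probe)
    return any(site in target for target in target_strands(transcript, protocol))


transcript = "ATGCAAAGTCCTTAGG"          # toy sense transcript
sense_probe = "CAAAGTCC"                 # same sequence as part of the transcript
antisense_probe = reverse_complement(sense_probe)
for protocol in ("antisense_only", "both"):
    print(protocol, probe_detects(sense_probe, transcript, protocol),
          probe_detects(antisense_probe, transcript, protocol))
```

Under these assumptions the sense-sequence probe is detected with either protocol, while its reverse complement only lights up when the product contains both strands; that is the sense in which the protocol decides whether probe strand matters.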

Steve Lianoglou

United States

Hi Cei,

On Nov 19, 2008, at 3:51 AM, Cei Abreu-Goodger wrote:
> Does someone know what is the case in particular for Illumina
> Beadarrays?

I've never worked on the bench side of a microarray experiment, but for gene expression arrays I was under the impression that most protocols:

(i) extract the RNA from cell lysate using the poly-A tails as targets
(ii) reverse transcribe it to cDNA and amplify the cDNA with random primers
(iii) hybridize the amplified cDNA to the array

If that's the case, I don't think the strand of the probe should be an issue.

I'd be interested, of course, to hear other people's thoughts on this too (this info should be easily available from the manufacturer's site or the Methods section of many papers, but let's see if the lazy-web can help :-).

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos
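Steve's three steps can be sketched as a toy pipeline to see exactly where the strand question is decided. This is a minimal illustration, assuming a fixed poly-A cut-off (the `min_tail` parameter is hypothetical) and treating "amplify with random primers" as producing a double-stranded product, which is Steve's assumption rather than a documented protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def polya_select(rnas: list[str], min_tail: int = 8) -> list[str]:
    # Step (i): keep only molecules ending in a poly-A tail (min_tail is a hypothetical cut-off).
    return [r for r in rnas if r.endswith("A" * min_tail)]


def reverse_transcribe(rnas: list[str]) -> list[str]:
    # Step (ii), first part: first-strand cDNA is the reverse complement of each RNA.
    return [reverse_complement(r) for r in rnas]


def amplify(cdnas: list[str], double_stranded: bool) -> list[str]:
    # Step (ii), second part: if the product is double-stranded (Steve's random-priming
    # assumption), both orientations reach the array; otherwise only the first strand does.
    products = list(cdnas)
    if double_stranded:
        products += [reverse_complement(c) for c in cdnas]
    return products


def hybridizes(probe: str, targets: list[str]) -> bool:
    # Step (iii): the probe binds a target that contains its reverse complement.
    site = reverse_complement(probe)
    return any(site in t for t in targets)


rnas = ["GATTACAGGCTT" + "A" * 10, "CCGTTAGCAT"]  # toy pool; the second RNA has no poly-A tail
single = amplify(reverse_transcribe(polya_select(rnas)), double_stranded=False)
both = amplify(reverse_transcribe(polya_select(rnas)), double_stranded=True)
print(hybridizes("GATTACAG", single), hybridizes("CTGTAATC", single))  # prints: True False
print(hybridizes("GATTACAG", both), hybridizes("CTGTAATC", both))      # prints: True True
```

If step (ii) really yields both strands, probes of either orientation hybridize and the strand of the probe stops mattering; if only the first strand survives, it matters again.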

MARIA STALTERI

Hi Cei,

The Illumina BeadArrays use a variety of assays.

The whole-genome gene expression arrays, such as the HumanWG-6 and HumanRef-8, use a 3'-based assay (IVT, or in vitro transcription, assay) very similar to the one used with the older 3' Affymetrix arrays.

Some of the older or custom gene expression arrays that use Illumina's universal BeadChips use the DASL assay, which combines poly(T) priming and random priming.

However, if we are talking about gene expression arrays, both types of protocol start with an RNA sample, so they will only amplify transcripts that are expressed, whichever strand they are transcribed from.

The problem with probes on the wrong strand is that you will not be able to detect expression of the gene in question if the probe is on the opposite strand. If a probe that is on the wrong strand gives a signal that indicates expression, then what is really being expressed is possibly an antisense transcript, which may have some regulatory function.

Best wishes,
Maria
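Maria's point that a wrong-strand probe cannot detect the annotated gene comes down to simple strand bookkeeping. The sketch below is a hypothetical helper, not part of any annotation package; it assumes by default that the labelled target is antisense cRNA (typical of 3' IVT-style assays, though the thread does not spell this out), and exposes that as a parameter because it is protocol-dependent.

```python
FLIP = {"+": "-", "-": "+"}


def reported_strand(probe_strand: str, target_is_antisense: bool = True) -> str:
    """Genomic strand whose transcripts a probe can report.

    probe_strand: the genomic strand whose sequence the probe matches.
    target_is_antisense: True if the labelled target is antisense to the mRNA (assumed default);
    False for a hypothetical protocol that labels sense-strand targets.
    """
    # Antisense targets pair with sense-sequence probes, so the probe reports its own strand;
    # sense targets pair with antisense probes, so the probe reports the opposite strand.
    return probe_strand if target_is_antisense else FLIP[probe_strand]


def interpret(probe_strand: str, gene_strand: str, target_is_antisense: bool = True) -> str:
    if reported_strand(probe_strand, target_is_antisense) == gene_strand:
        return "measures the annotated gene"
    return "cannot measure the annotated gene; any signal comes from an opposite-strand transcript"


print(interpret("+", "+"))
print(interpret("+", "-"))
```

The second case is Maria's scenario: a signal from such a probe points to an antisense transcript at that locus, not to the annotated gene.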

Charles Danko

Hi,

>If a probe that is on the wrong strand gives a signal
>that indicates expression, then what is really being expressed is possibly
>an antisense transcript which may have some regulatory function.

At least for the Affymetrix arrays, RT-PCR confirms that many of the probes antisense to the gene of interest target antisense RNA transcripts (see this article: http://www.biomedcentral.com/1471-2164/8/200). Even on the (slightly) newer human U133A/2.0 arrays, many probes are targeted to the wrong strand. Given the wide use of these Affymetrix arrays, and the wide availability of their data in GEO and other repositories, this is potentially a HUGE resource waiting for someone with a little extra time.
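As a possible first step toward the re-analysis Charles suggests, one could list the loci covered by probes in both orientations, so that sense and antisense signals can be compared within the same public dataset. This is a hypothetical sketch: it assumes you already know, for each probe, which gene it overlaps and whether it reports that gene's own strand (for example from a strand-aware mapping), and the probe and gene IDs below are made up.

```python
from collections import defaultdict


def sense_antisense_pairs(probes):
    """Group probes by locus and keep loci covered in both orientations.

    probes: iterable of (probe_id, gene, reports_gene_strand) tuples, where
    reports_gene_strand is True if the probe detects transcripts from the gene's own strand.
    """
    by_gene = defaultdict(lambda: {"sense": [], "antisense": []})
    for probe_id, gene, reports_gene_strand in probes:
        key = "sense" if reports_gene_strand else "antisense"
        by_gene[gene][key].append(probe_id)
    # Only loci with at least one probe of each kind allow a sense/antisense comparison.
    return {gene: d for gene, d in by_gene.items() if d["sense"] and d["antisense"]}


toy = [("p1", "GENE_A", True), ("p2", "GENE_A", False), ("p3", "GENE_B", True)]
print(sense_antisense_pairs(toy))  # {'GENE_A': {'sense': ['p1'], 'antisense': ['p2']}}
```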

Nick Henriquez

Dear Cei, Steve,

There are two versions of the correct answer, depending on whether we are talking about an expression array or a CGH/SNP-type array.

If we are using an EXPRESSION array:

1) It does not matter on which strand the gene resides.
2) It is not a matter of bad probe design. It is either a negative control or a misnomer derived from genome annotation.

For ANY probe to hybridise, it has to be the reverse complement (RC) of the cDNA, and therefore the DNA homologue of the original RNA sequence. (I'll let you work that one out for yourself.)

If the probe WERE encoded on "the opposite strand", your labelled target would not hybridise, as it would be the reverse complement of the actual sequence.

The annotation "opposite strand" stems from the convention that we call one strand the "coding strand" and the other the non-coding or "opposite" strand. By definition, then, a gene cannot be encoded by the "opposite" strand.

However, what often happens when sequencing genomes is that we find several genes encoded on one strand (which we then call the coding strand) and, somewhat later, also one or more genes on the "opposite" strand. This annotation is (wrongly, in my opinion) retained when genomes are assembled, and thus becomes part of the annotation of the probes.

So an opposite-strand probe is at best a kind of negative control and at worst a misnomer annotation retained when the genome was assembled. Mostly we now try to use terms like + and -, but even that has the drawback that we generally associate + with coding and - with non-coding. As we all know, BOTH strands encode functional RNAs of various kinds, including those coding for proteins.

If we are talking about DNA targets, e.g. a SNP array:

1) It does not matter on which strand a gene resides; any overlap is a matter of coincidence, since "genes" are rare events on the genome.
2) It is not a matter of bad probe design. Usually it simply does not matter, and this is a sequence that was used historically without knowledge of the gene (which was often discovered later). Sometimes the sequence on the coding strand may have a problem with background or sequence similarity. To get around this, one can try to use the RC (i.e. the "opposite strand" sequence), which is often different enough. Of course, if more than two similar sequences exist the problem remains, as we can use this trick only once.

Hope this helps,

Nick

N.V. Henriquez, Senior Research Associate
Dept. of Neurodegenerative Diseases
Institute of Neurology, UCL
Queen Square House rm 124
Queen Square
London WC1N 3BG
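Nick's SNP-array trick (use the reverse complement when the forward sequence is too similar to something else) can be sketched as follows. The similarity measure, the fewest mismatches against any same-length window of the other sequences, is a deliberately simple stand-in chosen for illustration; real probe design uses proper hybridization and alignment criteria. It also compares only against the other sequences as written (one strand), mirroring Nick's description rather than modelling double-stranded genomic targets. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def min_mismatches(candidate: str, others: list[str]) -> int:
    """Fewest mismatches between candidate and any same-length window of the other sequences."""
    k = len(candidate)
    best = k
    for other in others:
        for i in range(len(other) - k + 1):
            window = other[i:i + k]
            best = min(best, sum(a != b for a, b in zip(candidate, window)))
    return best


def pick_orientation(seq: str, others: list[str]) -> str:
    """Keep seq unless its reverse complement is more distinct from the other sequences."""
    rc = reverse_complement(seq)
    return rc if min_mismatches(rc, others) > min_mismatches(seq, others) else seq


# The forward sequence occurs verbatim in the other sequence, so the RC is chosen.
print(pick_orientation("AAACCCGT", ["TTAAACCCGTTT"]))  # ACGGGTTT
```

As Nick notes, this only helps once: if the reverse complement is also too similar to something else, swapping orientation does not solve the problem.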

Cei Abreu-Goodger

Hi Nick, and others,

Apologies for not making my question clearer, but I guess there have been some interesting answers anyway. I was in fact thinking of expression arrays, and my main interest was from the standpoint of probe annotation.

It now seems pretty clear that there are many regions in the genome that encode transcripts on both strands. If a probe is designed to such a region, the expression microarray will be measuring both transcripts, and you will essentially have a "perfectly" cross-hybridizing probe.

Now, annotation-wise, what should we do? Ignore such probes? At least flag them up? The problem is that many Bioconductor annotation packages only allow a single gene to be assigned to each probe. So, in many cases you may be led to believe that your experiment has measured differential expression for a particular gene (with its set of GO terms, KEGG pathways, etc.) when in fact the changing gene was the one on the other strand.

These "problems" tend to show up on the list occasionally, for example when people find that different databases (Ensembl/Biomart, NCBI, the manufacturer or a BioC annotation package) list different genes for the same probe. Not all, but many, of these differences have been due to overlapping transcripts. In fact, Ensembl recently patched their probe mapping pipeline to be "strand-aware". If you think that this would affect a tiny portion of probes, think again: the proportion of Affymetrix probes affected on the human and mouse genomes was around 10%:
http://osdir.com/ml/science.biology.ensembl.devel/2008-06/msg00052.html

Also, from talking to some of the NuID/Illumina mapping people, it seems that they simply don't consider the strand of the probe. But they do calculate a "uniqueness" score to avoid probes that map to multiple genes.

In the end, I would ideally prefer "cross-hybridizing" probes (of whatever sort) to be annotated in a way that lets them be identified. But I have no idea how much of a nightmare that would be for the developers of the current annotation packages...

Many thanks,
Cei
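One way to do the flagging Cei asks for is to report every overlapping gene and keep same-strand and opposite-strand hits separate, instead of collapsing each probe to a single gene. A minimal sketch, assuming 1-based inclusive coordinates and that each probe's `strand` has already been converted to the strand of transcripts it can report; the `Feature` type and the output keys are hypothetical, not a Bioconductor or Ensembl data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def annotate_probe(probe: Feature, genes: list[Feature], strand_aware: bool = True) -> dict:
    hits = [g for g in genes if overlaps(probe, g)]
    same = [g.name for g in hits if g.strand == probe.strand]
    opposite = [g.name for g in hits if g.strand != probe.strand]
    # A strand-aware mapping only assigns same-strand genes; a strand-blind one assigns all hits.
    assigned = same if strand_aware else same + opposite
    # Flag anything that is not a clean one-gene assignment, or that sits on a two-strand locus.
    return {"genes": assigned, "opposite_strand": opposite,
            "flag": len(assigned) != 1 or bool(opposite)}


probe = Feature("p1", "chr1", 100, 150, "+")
genes = [Feature("GENE_A", "chr1", 50, 300, "+"),
         Feature("GENE_B", "chr1", 120, 400, "-"),
         Feature("GENE_C", "chr2", 100, 150, "+")]
print(annotate_probe(probe, genes))
print(annotate_probe(probe, genes, strand_aware=False))
```

Keeping the opposite-strand hits in a separate field, rather than discarding them, is what lets a user see that a probe sits on a two-strand locus.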

Sean Davis

On Thu, Nov 20, 2008 at 3:48 PM, Cei Abreu-Goodger <cei@ebi.ac.uk> wrote:

> It now does seem pretty clear that there are many regions in the genome
> that encode transcripts on both strands. If a probe is designed to such a
> region, the expression microarrays will be measuring both transcripts, and
> you will essentially have a "perfectly" cross-hybridizing probe.

Not really. It depends on the protocol being used. For Illumina, the product that goes on the array is strand-specific. That is not true of all array platforms.

> Now, annotation-wise, what should we do? Ignore such probes? At least flag
> them up? The problem is, many Bioconductor annotation packages only allow a
> single gene to be assigned to each probe. So, in many cases you may be led
> to believe that your experiment has measured differential expression for a
> particular gene (with its set of GO terms, KEGG pathways, etc) when in fact
> the changing gene was the one on the other strand.

I don't think this comes up very often, but for any given gene there is always the possibility of another explanation for the observed differential expression. That is why, for a given gene, it is important to validate using a different technology. Globally (as in sets of genes), it hopefully won't be too much of a factor.

> In the end, I would ideally prefer "cross-hybridizing" probes (of whatever
> sort) to be annotated in a way that they could be identified. But I have no
> idea of how much a nightmare that would be for the developers of the current
> annotation packages...

There is no attempt to map probes in the Bioconductor annotation packages (at least those maintained by the core). The annotation from which the packages are derived generally comes directly from the manufacturers. Herve Pages just posted some code to the list that will allow you to align your own probes to the genome or, more probably, to a transcript database of your choice. Then you can use your own definitions for probes. I used to do this on a large scale for all the arrays we used, but I have backed away from it because the answers one gets are very similar for the vast majority of probes.

Sean
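Sean's suggestion (align the probes yourself, then use your own probe definitions) can be illustrated with exact string matching against a small transcript dictionary. This is only a toy stand-in for the alignment code Herve Pages posted, which is not reproduced here; it assumes exact matches, DNA-letter transcripts written 5' to 3', and antisense labelled targets by default, and all IDs are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def map_probes(probes: dict[str, str], transcripts: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Exact-match every probe against every transcript, in both orientations."""
    hits = {}
    for pid, pseq in probes.items():
        rc = reverse_complement(pseq)
        found = []
        for tid, tseq in transcripts.items():
            if pseq in tseq:
                found.append((tid, "same"))     # probe sequence occurs in the transcript itself
            if rc in tseq:
                found.append((tid, "reverse"))  # probe's reverse complement occurs in the transcript
        hits[pid] = found
    return hits


def probe_definitions(hits: dict[str, list[tuple[str, str]]],
                      target_is_antisense: bool = True) -> dict[str, list[str]]:
    """Keep only the transcripts each probe can report under the assumed labelling protocol."""
    # Antisense targets pair with probes whose sequence matches the transcript ("same");
    # sense targets pair with probes matching the transcript's reverse complement ("reverse").
    wanted = "same" if target_is_antisense else "reverse"
    return {pid: [tid for tid, orient in found if orient == wanted] for pid, found in hits.items()}


probes = {"p1": "GATTACAG"}
transcripts = {"T1": "CCGATTACAGTT", "T2": "AACTGTAATCGG"}
hits = map_probes(probes, transcripts)
print(hits)                     # {'p1': [('T1', 'same'), ('T2', 'reverse')]}
print(probe_definitions(hits))  # {'p1': ['T1']}
```

Keeping both orientations in the raw hits and applying the protocol assumption only when building the definitions makes the strand decision explicit and easy to change.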

Nick Henriquez

And sorry for perhaps not making this absolutely clear, so to be completely certain there is no misunderstanding about it:

Regardless of annotation, even if a piece of DNA encodes a gene on both strands, only ONE of these will hybridise to your probe. The reverse complement is NOT a perfect match, except in vanishingly rare cases, e.g. palindromic sequences such as restriction enzyme sites. These are usually excluded from probe sets because of their ambiguity/cross-hybridising potential. RC sequences are completely different and do not cross-hybridise with cDNA. Take any sequence (actgctgacag becomes ctgtcagcagt) and you will see that, and why, this is the case.

Given that we know the sequence of the probe, we can always tell from which strand the hybridising cDNA is derived. So there is no doubt whatsoever which gene was involved/altered in expression. If geneX is on the "opposite strand", geneX was NOT the gene whose expression was altered; geneX is not detected by the probe in question. This annotation incontrovertibly proves that geneX is not measured by this probe. Therefore it was geneY, encoded by the relevant strand of DNA. You may have to figure out what geneY is, depending on the quality of the annotation, but there are sufficient secondary databases to do that. You may even discover a "new gene".

If 10% of genes may be affected, that means 10% of the genes in your dataset. Usually we're not talking about thousands, so it's fairly easy to check, e.g. by looking for "encoded by" in the annotation. If you use Affy chips, their Expression Console provides an Excel/OpenOffice-compatible output which will allow this, even if some of the annotated information might be lost within R/BioC. As long as the "strand identity" annotation is retained, you will always see from the BioC output whether geneX was in fact measured or not. Perhaps code can be adjusted to ignore "other strand" annotations altogether. I don't write code, but it seems a relatively easy command to me, whatever the correct syntax: "probes with "other strand" in the description=FALSE".

Best,
Nick

From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean Davis
Sent: 20 November 2008 22:51
To: Cei Abreu-Goodger
Cc: n.henriquez@ion.ucl.ac.uk; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Does the strand of a microarray probe matter?
Does someone know what is the case in particular for Illumina Beadarrays? I've never worked on the bench-side of a microarray experiment, but for gene expression arrays I was under the impression that most protocols: (i) extract the the RNA from cell lysate using their poly-A tails as targets (ii) reverse transcribe to cDNA and amplify the cDNA w/ random primers. (iii) hybridize amplified cDNA to the array If that's the case, I don't think that the strand of the probe should be an issue. I'd be interested, of course, to hear other people's thoughts on this, too (while this info should be easily available from the manufacturer's site, or the Methods section of many papers, let's see if the lazy-web can help :-). -steve -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University http://cbio.mskcc.org/~lianos <http: cbio.mskcc.org="" %7elianos=""> _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered offi...{{dropped:16}}

written 16.1 years ago by Nick Henriquez ▴ 20
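Nick's suggested filter, dropping probes whose description mentions the "other strand", could look something like the following in base R. This is a minimal sketch: the table `annot` and its columns `probe_id` and `description` are hypothetical stand-ins, not the layout of any particular vendor file or Bioconductor package, and the matched phrase would have to be adapted to whatever wording the real annotation uses.

```r
# Hypothetical annotation table; column names and descriptions are made up
# for illustration and do not come from a real vendor file.
annot <- data.frame(
  probe_id    = c("p1", "p2", "p3", "p4"),
  description = c("geneA",
                  "geneB, other strand",
                  "geneC",
                  "Other Strand: geneD"),
  stringsAsFactors = FALSE
)

# Flag descriptions that mention "other strand", ignoring case.
annot$other_strand <- grepl("other strand", annot$description,
                            ignore.case = TRUE)

# Keep the flag as a column instead of deleting rows, so the excluded
# probes can still be inspected, then subset to the remaining probes.
kept <- annot[!annot$other_strand, ]
kept$probe_id   # "p1" "p3"
```

Flagging rather than silently dropping keeps Cei's concern in view: an "other strand" probe may still be measuring a real transcript, just not the one named in the annotation.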

On Nov 21, 2008, at 19:30, Nick Henriquez wrote:

> And sorry for perhaps not making this absolutely clear, so to be completely certain there is no misunderstanding about this:
>
> Regardless of annotation, even if a piece of DNA encodes a gene on both strands, only ONE of these will hybridise to your probe. The reverse complement is NOT a perfect match, except in vanishingly rare cases, i.e. palindromic sequences such as restriction enzyme sites. These are usually excluded from probe sets due to ambiguity/cross-hybridising potential. RC sequences are completely different and do not cross-hybridise with cDNA. Take any sequence (actgctgacag becomes ctgtcagcagt) and you will see that, and why, this is the case.
>
> Given that we know the sequence of the probe, we can always tell from which strand the hybridising cDNA is derived. So there is no doubt whatsoever which gene was involved/altered in expression. If geneX is on the "opposite strand", geneX was NOT the gene whose expression was altered; geneX is not detected by the probe in question. This annotation incontrovertibly proves that geneX is not measured by this probe. Therefore it was geneY, encoded by the relevant strand of DNA. You may have to figure out what geneY is, depending on the quality of the annotation, but there are sufficient secondary databases to do that. You may even discover a "new gene".

This is only true if the assay does not lose strandedness. Let us say your assay involves making double-stranded cDNA, as e.g. some high-throughput sequencing does. In that case you have no way of telling what strand your original material came from.

Kasper

> If 10% of genes may be affected, that means 10% of the genes in your dataset. Usually we're not talking about thousands, so it's fairly easy to check, e.g. by looking for "encoded by" in the annotation etc.

written 16.1 years ago by Kasper Daniel Hansen ★ 6.5k
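To make the difference between Nick's and Kasper's positions concrete, here is a toy model in base R that reuses Nick's example sequence. It assumes perfect-match hybridisation only (a labelled molecule binds a probe only if it is exactly the probe's reverse complement), which ignores partial matches and real cross-hybridisation; `revcomp()` and `binds()` are hypothetical helpers written for this sketch.

```r
# Reverse complement of a DNA string (handles upper and lower case).
revcomp <- function(s) {
  comp <- chartr("acgtACGT", "tgcaTGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Toy perfect-match model: the probe captures a labelled molecule only if
# that molecule is exactly the reverse complement of the probe.
binds <- function(probe, targets) any(targets == revcomp(probe))

transcript <- "actgctgacag"         # sense strand, Nick's example sequence
probe      <- revcomp(transcript)   # probe designed against this transcript
other      <- transcript            # probe for the same region, opposite strand

# Strand-specific labelling: only the sense strand reaches the array.
ss_target <- transcript
# Double-stranded labelling: both strands reach the array.
ds_target <- c(transcript, revcomp(transcript))

revcomp(transcript)        # "ctgtcagcagt"
binds(probe, ss_target)    # TRUE
binds(other, ss_target)    # FALSE: Nick's case, strand is preserved
binds(other, ds_target)    # TRUE: Kasper's case, strand is lost
```

With a double-stranded product both probes report signal, so the probe sequence alone no longer tells you which strand was transcribed.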

MARIA STALTERI ▴ 160

@maria-stalteri-873

Hi Cei,

I think you may still be a little confused about what the gene expression arrays are designed to measure.

The probes on the Affymetrix 3' expression arrays are single-stranded oligos with the sense sequence, i.e. the same sequence as the mRNA they are designed to detect, and the IVT assay used with these arrays produces single-stranded cRNAs with the antisense sequence, i.e. the reverse complement of the initial mRNA sample.

The probes on the Affymetrix Exon ST and Gene ST arrays have the antisense sequence, i.e. the reverse complement of the sequence they are designed to detect, and the WT assay used with these arrays produces single-stranded cDNA with the sense sequence, i.e. the same sequence as the initial mRNA sample.

The probesets on these arrays are only designed to measure expression from one strand. They will measure expression from both strands only where Affymetrix has tiled probesets on both strands in the same region of the genome. This is the case for some probesets that were designed based on ESTs, where it wasn't clear at the time of array design which strand the gene was on, so probesets were tiled on both strands in the region the EST mapped to.

As for problems with probeset annotations, or discrepancies between one annotation source and another, we have also found that the number of annotation errors is probably somewhere close to 10%, and that genes that were close together or had overlapping ends tended to cause problems for the annotations.

Affymetrix grades the reliability of the annotations for the 3' expression arrays as A, B, C or E for each probeset, with A being the most reliable and E being annotations based on EST clusters and generally the least reliable. We have found that although their A and B grade annotations are not always correct either, they are indeed more likely to be correct than the annotations they label as E.
For the exon arrays, Affymetrix labels its probesets as unique, similar, or mixed depending on whether, or to what extent, the probes cross-hybridise, so that one can choose to use only those probesets labelled as unique if one wants to avoid cross-hybridising probes. (I haven't done any mappings of the probes on the exon arrays yet, so I don't know how true this is.)

Best wishes,
Maria

written 16.1 years ago by MARIA STALTERI ▴ 160
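The orientation rules Maria describes, and the related need for strand-aware probe mapping, can be checked with a small base-R sketch. It assumes the simplified two-platform table below (probe and target orientation relative to the mRNA, as stated in Maria's post), a made-up mRNA fragment, and exact substring matching in place of a real aligner; `orient()` and `probe_strand()` are hypothetical helpers, not functions from any Bioconductor package.

```r
revcomp <- function(s) {
  comp <- chartr("acgtACGT", "tgcaTGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Probe and labelled-target orientation relative to the mRNA, as summarised
# in Maria's post (simplified to two Affymetrix platform families).
platforms <- data.frame(
  platform = c("3' IVT", "Exon/Gene ST"),
  probe    = c("sense", "antisense"),
  target   = c("antisense", "sense"),
  stringsAsFactors = FALSE
)

# Convert between sense orientation and the requested orientation.
# Reverse complement is its own inverse, so this works in both directions.
orient <- function(seq, orientation) {
  if (orientation == "sense") seq else revcomp(seq)
}

mrna <- "atggctgacagcttag"   # made-up mRNA fragment

# On both platforms the probe should be the reverse complement of the target.
pairs_ok <- mapply(function(p, t) orient(mrna, p) == revcomp(orient(mrna, t)),
                   platforms$probe, platforms$target)

# Strand-aware lookup: turn the probe into the sense sequence it detects,
# then require that sequence itself (not its reverse complement) to occur
# in the sense-strand transcript.
probe_strand <- function(probe_seq, probe_orientation, transcript) {
  detected <- orient(probe_seq, probe_orientation)
  if (grepl(detected, transcript, fixed = TRUE)) return("same")
  if (grepl(revcomp(detected), transcript, fixed = TRUE)) return("opposite")
  NA_character_
}

ivt_probe <- substr(mrna, 3, 12)            # sense probe, 3' IVT style
st_probe  <- revcomp(ivt_probe)             # antisense probe, ST style

all(pairs_ok)                               # TRUE
probe_strand(ivt_probe, "sense", mrna)      # "same"
probe_strand(st_probe, "antisense", mrna)   # "same"
probe_strand(st_probe, "sense", mrna)       # "opposite": orientation misread
```

The last call shows why it matters whether a vendor reports the probe sequence or the target sequence: reading an antisense probe as if it were sense maps it to the wrong strand.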

Hi Cei,

this paper contains a discussion of this topic: "Antisense artifacts in transcriptome microarray experiments are resolved by actinomycin D", by Perocchi et al.: http://nar.oxfordjournals.org/cgi/content/full/35/19/e128

Figs. 1 and 2 show that you can get strand-specific measurements, but that spurious second-strand synthesis during the reverse transcription step needs to be considered or avoided.

Best wishes
Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber

written 16.1 years ago by Wolfgang Huber ★ 13k

Dear list,

Thank you all for your answers. From these and some offline conversations I had with people from the microarray facility, I can see that current microarray protocols attempt to produce strand-specific samples before hybridizing (but see the reference Wolfgang sent). In this case, whenever doing probe mapping we have to be careful to select only those probes whose sequence matches on the appropriate strand (and this will depend on the platform, since some manufacturers report the probe sequence and some the "target" sequence). As I mentioned before, this has historically not always been the case.

One last point, regarding one of Sean's answers:

"There is no attempt to map probes in bioconductor annotation packages (at least those maintained by the core). The annotation from which the annotation packages are derived comes directly from the manufacturers, generally."

Even if no re-mapping is being done (there are many BioC packages not maintained by the core which do involve re-mapping), my main point was that Bioconductor annotation structures don't allow more than one "gene" to be annotated for any particular probe. Do correct me if I'm wrong, but at least when using AnnotationDbi I found no way of having more than one gene (EntrezID) per probe.

Another example: Affymetrix does annotate more than one gene (EntrezID) for their probes (~5% of probes in mouse430_2 with an EntrezID have more than one). So, I guess if the Bioconductor core team is using the manufacturer's annotation, then they are (in some way) removing this information?

    # bit of R code showing this:
    library(mouse4302.db)
    xx <- as.list(mouse4302ENTREZID)
    any(sapply(xx, length) > 1)
    #[1] FALSE

And no, I'm not saying that different EntrezIDs are always unrelated genes, or that multiple probes mapping to multiple genes are always due to strand problems.

Thanks again,
Cei

written 16.1 years ago by Cei Abreu-Goodger ▴ 830
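Cei's question is whether a probe can carry more than one Entrez Gene ID. Independently of how the annotation packages store it, a one-to-many mapping is easy to keep in a long-format table with one row per probe-gene pair, and to collapse into roughly the same named-list shape that `as.list(mouse4302ENTREZID)` returns. The sketch below uses base R only; all probe and gene IDs are made up for illustration.

```r
# Long-format probe-to-gene table: one row per (probe, gene) pair, so a
# probe can be linked to any number of genes. All IDs are hypothetical.
map <- data.frame(
  probe_id  = c("probe1", "probe2", "probe2", "probe3"),
  entrez_id = c("11", "12", "13", "14"),
  stringsAsFactors = FALSE
)

# Collapse to a named list: one character vector of gene IDs per probe.
by_probe <- split(map$entrez_id, map$probe_id)

# Number of genes per probe, and the probes linked to more than one gene.
n_genes <- lengths(by_probe)
multi   <- names(n_genes)[n_genes > 1]

by_probe$probe2   # "12" "13"
multi             # "probe2"

# The same check Cei ran on the package data, applied to this list.
any(n_genes > 1)  # TRUE
```

Keeping the long table as the primary record means nothing is lost when a probe maps to several genes; the list view is derived from it only when a per-probe summary is needed.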

On Fri, Nov 28, 2008 at 4:44 AM, Cei Abreu-Goodger <cei@ebi.ac.uk> wrote:

> Another example: Affymetrix does annotate more than one gene (EntrezID) for their probes (~5% of probes in mouse430_2 with EntrezID have more than one). So, I guess if the bioconductor core team is using the manufacturer's annotation, then they are (in some way) removing this information?

I think the annotation is done by mapping the GenBank or RefSeq IDs to Entrez IDs. I do not think the Entrez IDs from the manufacturer are used directly. However, Marc Carlson is the best person to comment on the details.

Sean

written 16.1 years ago by Sean Davis 21k
