Hello all,
Related issues have arisen before, where the probe of a particular array
platform was annotated to a gene on the opposite strand. But I was just
asked whether this even matters, or whether it should simply be
considered a case of bad probe design.
Does the protocol for different manufacturers' arrays always produce
amplified product of both strands for the transcript to be measured? I
could imagine that protocols that amplify based on poly-A tails would
tend to produce an antisense-biased amplification product (older Affy
arrays?), whereas those based on random priming could produce products
of both strands (so that the actual strand that is on the array becomes
meaningless).
Does anyone know what the case is for Illumina BeadArrays in
particular?
Many thanks,
Cei
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Hi Cei,
On Nov 19, 2008, at 3:51 AM, Cei Abreu-Goodger wrote:
> Hello all,
>
> Related issues have arisen before, where the probe of a particular
> array platform was annotated to a gene on the opposite strand. But I
> was just asked if this even matters, or should it simply be
> considered a case of bad probe design.
>
> Does the protocol for different manufacturer's arrays always produce
> amplified product of both strands for the transcript to be measured?
> I could imagine that protocols that amplify based on poly-A tails
> would tend to produce an anti-sense biased amplification product
> (older Affy arrays?), whereas those based on random priming could
> produce products of both strands (and so the actual strand that is
> on the array becomes meaningless).
>
> Does someone know what is the case in particular for Illumina
> Beadarrays?
I've never worked on the bench-side of a microarray experiment, but
for gene expression arrays I was under the impression that most
protocols:
(i) extract the RNA from the cell lysate, using the poly-A tails as
targets
(ii) reverse transcribe to cDNA and amplify the cDNA with random primers
(iii) hybridize the amplified cDNA to the array
If that's the case, I don't think that the strand of the probe should
be an issue.
I'd be interested, of course, to hear other people's thoughts on this,
too (while this info should be easily available from the
manufacturer's site, or the Methods section of many papers, let's see
if the lazy-web can help :-).
-steve
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos
Hi Cei,
The Illumina BeadArrays use a variety of assays.
The whole-genome gene expression arrays, such as the HumanWG-6 and
HumanRef-8, use a 3'-based assay (the IVT, or in vitro transcription,
assay) very similar to the one used with the older 3' Affymetrix arrays.
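A minimal sketch of why the orientation of the labelled target is fixed in a 3' IVT-type protocol. It assumes, as a simplification, that the protocol consists of three copying steps (first-strand cDNA, second-strand cDNA, labelled cRNA) and that each step produces the complement of its template; the step list, the `Orientation` enum and the function names are illustrative, not a vendor specification. Orientations are expressed relative to an annotated gene X, so a transcript from the opposite strand counts as antisense to X.

```python
from enum import Enum


class Orientation(Enum):
    """Orientation of a molecule relative to an annotated gene X."""
    SENSE = "sense"
    ANTISENSE = "antisense"


def complementary_copy(template: Orientation) -> Orientation:
    """A strand synthesised on a template is its complement, so it has the
    opposite orientation."""
    if template is Orientation.SENSE:
        return Orientation.ANTISENSE
    return Orientation.SENSE


# Assumed, simplified copying steps of an oligo(dT)/T7-primed 3' IVT protocol.
IVT_STEPS = ("first-strand cDNA", "second-strand cDNA", "labelled cRNA")


def labelled_target(input_rna: Orientation) -> Orientation:
    """Orientation of the labelled product derived from one input RNA."""
    product = input_rna
    for _step in IVT_STEPS:
        product = complementary_copy(product)
    return product


def probe_detects(probe: Orientation, input_rna: Orientation) -> bool:
    """A probe hybridises only to a target of the opposite orientation."""
    return complementary_copy(labelled_target(input_rna)) is probe


# Gene X's own mRNA is SENSE; an overlapping transcript from the other
# strand is ANTISENSE relative to gene X.
for probe in Orientation:
    for rna in Orientation:
        print(f"probe {probe.value:<9} input RNA {rna.value:<9} "
              f"detected: {probe_detects(probe, rna)}")
```

Under these assumptions, a probe with the same orientation as gene X's mRNA detects only X, while a probe "on the wrong strand" detects only an overlapping antisense transcript, and only if one is expressed.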
Some of the older or custom gene expression arrays that use Illumina's
universal BeadChips use the DASL assay, which combines poly(T) priming
and random priming.
However, if we are talking about gene expression arrays, both types of
protocol start with an RNA sample, so they will only amplify transcripts
that are expressed, whichever strand they are transcribed from.
The problem with a probe on the wrong strand is that it cannot detect
expression of the gene in question. If such a probe nevertheless gives a
signal that indicates expression, then what is really being expressed is
possibly an antisense transcript, which may have some regulatory
function.
Best wishes,
Maria
Hi,
>The problem with probes on the wrong strand is that you will not be able
>to detect expression from the gene in question if the probe is on the
>opposite strand. If a probe that is on the wrong strand gives a signal
>that indicates expression, then what is really being expressed is possibly
>an antisense transcript which may have some regulatory function.
At least for the Affymetrix arrays, RT-PCR confirms that many of the
probes antisense to the gene of interest target antisense RNA
transcripts (see this article:
http://www.biomedcentral.com/1471-2164/8/200). Even in the (slightly)
newer human U133A/2.0 arrays, many probes are targeted to the wrong
strand. Given the wide use of these Affymetrix arrays and the wide
availability of data in GEO and other repositories, this is potentially
a HUGE resource waiting for someone with a little bit of extra time.
Dear Cei, Steve,
There are two versions of the correct answer, depending on whether we
are talking about an expression or a CGH/SNP-type array:
If we are using an EXPRESSION array
1) It does not matter on which strand the gene resides.
2) It is not a matter of bad probe design. It is either a negative
control or a misnomer derived from genome annotation.
For ANY probe to hybridise, it has to be the reverse complement (RC) of
the cDNA, and therefore the DNA homologue of the original RNA sequence.
(I'll let you work that one out for yourself.)
If the probe WERE encoded on "the opposite strand", your labelled target
would not hybridise, as the probe would then be the reverse complement
of the actual sequence.
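To make the reverse-complement argument concrete, here is a small perfect-match model. It assumes exact Watson-Crick pairing with no mismatches and writes every sequence 5'->3'; the toy sequence and the function names are illustrative only.

```python
# Watson-Crick complements for DNA; RNA input is converted (U -> T) first.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case a sequence and write uracil as thymine."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA (or RNA) sequence written 5'->3'."""
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def hybridises(probe: str, target: str) -> bool:
    """Perfect-match model: two 5'->3' strands pair only if one is the
    reverse complement of the other."""
    return to_dna(probe) == reverse_complement(target)


mrna = "AUGGCCUUAG"                # toy transcript
cdna = reverse_complement(mrna)    # first-strand cDNA
probe = reverse_complement(cdna)   # RC of the cDNA = DNA copy of the mRNA

print(cdna, probe, probe == to_dna(mrna))           # CTAAGGCCAT ATGGCCTTAG True
print(hybridises(probe, cdna))                      # True
print(hybridises(reverse_complement(probe), cdna))  # False: "opposite strand" probe
```

In this model the working probe is simply the DNA version of the mRNA, and its reverse complement does not pair with the same cDNA.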
The annotation "opposite strand" stems from the convention that we call
one strand the "coding strand" and the other the non-coding or
"opposite" strand. By definition, then, a gene cannot be encoded by the
"opposite" strand.
However, what often happens when sequencing genomes is that we find
several genes encoded on one strand (which we will then call the coding
strand) and then, somewhat later, one or more genes on the "opposite"
strand as well. This annotation is (wrongly, in my opinion) retained
when genomes are assembled, and thus becomes part of the annotation of
the probes.
So an opposite-strand probe is at best a kind of negative control, and
at worst a misnomer retained from when the genome was assembled. Mostly
we now try to use the terms + and -, but even these have the drawback
that we generally associate + with coding and - with noncoding. As we
all know, BOTH strands encode functional RNAs of various kinds,
including those coding for proteins.
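One way to see that + and - are only coordinate conventions: a feature's 5'->3' sequence is the reference slice itself for a + strand feature and its reverse complement for a - strand feature. This sketch assumes 1-based, inclusive coordinates on a reference written as its + strand (the GFF convention); the reference and the interval are made up.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def feature_sequence(reference: str, start: int, end: int, strand: str) -> str:
    """5'->3' sequence of a feature at 1-based, inclusive [start, end] on a
    reference written as its + strand.

    '+' and '-' only say which strand of the reference the feature is read
    from; neither implies coding or non-coding."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    segment = reference[start - 1:end].upper()
    return segment if strand == "+" else reverse_complement(segment)


reference = "CCATGGCCTTAGCC"  # toy + strand reference
# Two hypothetical overlapping genes, one on each strand of the same interval:
print(feature_sequence(reference, 3, 12, "+"))  # ATGGCCTTAG
print(feature_sequence(reference, 3, 12, "-"))  # CTAAGGCCAT
```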
If we are talking about DNA targets, e.g. a SNP array
1) It does not matter on which strand a gene resides; any overlap is a
matter of coincidence, since "genes" are rare events on the genome.
2) It is not a matter of bad probe design. Usually it simply does not
matter, and this is a sequence that was used historically without
knowledge of the gene (often discovered later). Sometimes the sequence
on the coding strand may have a problem with background or sequence
similarity. To get around this, one can try to use the RC (i.e. the
"opposite strand" sequence), which is often different enough. Of course,
if more than two similar sequences exist, the problem remains, as we can
use this trick only once.
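A rough sketch of the "use the RC instead" trick, under loudly stated assumptions: similarity is measured as the fraction of identical positions in ungapped, same-length windows (a crude stand-in for a real alignment or hybridisation model), off-target sequences are given in the orientation the probe must not resemble, and the 0.8 cutoff is arbitrary. All names and sequences are hypothetical.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def max_identity(candidate: str, other: str) -> float:
    """Highest fraction of identical positions between `candidate` and any
    ungapped window of `other` of the same length (0.0 if `other` is shorter)."""
    n = len(candidate)
    if n == 0:
        raise ValueError("candidate must not be empty")
    best = 0.0
    for i in range(len(other) - n + 1):
        window = other[i:i + n]
        matches = sum(a == b for a, b in zip(candidate, window))
        best = max(best, matches / n)
    return best


def choose_orientation(probe: str, off_targets: list[str],
                       max_allowed: float = 0.8) -> tuple[str, str] | None:
    """Try the probe as designed, then its reverse complement; return the first
    (label, sequence) that stays below `max_allowed` identity to every
    off-target, or None if neither orientation does."""
    candidates = (("forward", probe.upper()),
                  ("reverse-complement", reverse_complement(probe)))
    for label, seq in candidates:
        if all(max_identity(seq, s.upper()) < max_allowed for s in off_targets):
            return label, seq
    return None


designed = "ATGGCCTTAG"
similar = "GGATGGCCTTAGTT"  # hypothetical sequence containing the probe
print(choose_orientation(designed, [similar]))
# ('reverse-complement', 'CTAAGGCCAT')
print(choose_orientation(designed, [similar, reverse_complement(similar)]))
# None: both orientations are now used up
```

There are only two orientations to try, which is the sense in which the trick can be used only once.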
Hope this helps,
Nick
N.V. Henriquez, Senior Research Associate
Dept. Of Neurodegenerative Diseases
Institute of Neurology, UCL,
Queen Square House rm 124
Queen Square
London WC1N 3BG
Message: 8
Date: Wed, 19 Nov 2008 10:45:52 -0500
From: Steve Lianoglou <mailinglist.honeypot@gmail.com>
Subject: Re: [BioC] Does the strand of a microarray probe matter?
To: Cei Abreu-Goodger <cei at ebi.ac.uk>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID: <7710F044-03D5-4572-8EE4-2DB96F4C348C at gmail.com>
Hi Nick, and others,
Apologies for not making my question clearer, but I guess there have
been some interesting answers anyway. I was in fact thinking of
expression arrays, and my main interest was from the standpoint of probe
annotation.
It now seems pretty clear that there are many regions in the genome
that encode transcripts on both strands. If a probe is designed to such
a region, the expression microarray will be measuring both transcripts,
and you will essentially have a "perfectly" cross-hybridizing probe.
Now, annotation-wise, what should we do? Ignore such probes? At least
flag them up? The problem is that many Bioconductor annotation packages
only allow a single gene to be assigned to each probe. So, in many cases
you may be led to believe that your experiment has measured differential
expression for a particular gene (with its set of GO terms, KEGG
pathways, etc.) when in fact the changing gene was the one on the other
strand.
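If a mapping source can return several genes per probe, one way to keep that information, rather than forcing a single gene, is to collapse (probe, gene) rows into a list per probe plus a flag. A minimal sketch; the probe and gene identifiers are made up and this is not how any particular annotation package stores its maps.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def summarise(pairs: Iterable[tuple[str, str]]) -> dict[str, tuple[list[str], bool]]:
    """Collapse (probe_id, gene_id) rows into {probe_id: (genes, flagged)},
    where `flagged` is True when a probe maps to more than one gene."""
    genes: dict[str, set[str]] = defaultdict(set)
    for probe_id, gene_id in pairs:
        genes[probe_id].add(gene_id)
    return {p: (sorted(g), len(g) > 1) for p, g in genes.items()}


rows = [
    ("probe_1", "GENE_A"),
    ("probe_2", "GENE_B"),
    ("probe_2", "GENE_C"),   # e.g. an overlapping gene on the other strand
    ("probe_2", "GENE_B"),   # duplicate rows are harmless
]
for probe_id, (gene_ids, flagged) in summarise(rows).items():
    print(probe_id, gene_ids, "FLAG: multiple genes" if flagged else "")
```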
These "problems" tend to show up on the list occasionally, for example
when people find out that different databases (Ensembl/Biomart, NCBI,
the manufacturer or a BioC annotation package) list different genes for
the same probe. Obviously not all, but many of these differences have
been due to overlapping transcripts. In fact, Ensembl recently patched
their probe mapping pipeline to be "strand-aware". If you think that
this would affect a tiny proportion of probes, think again: around 10%
of the Affymetrix probes on the human and mouse genomes were affected:
http://osdir.com/ml/science.biology.ensembl.devel/2008-06/msg00052.html
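A toy comparison of strand-unaware and strand-aware assignment of probes to overlapping genes. Assumptions: each probe has already been aligned to the genome (its `strand` is the strand whose sequence matches the probe), coordinates are 1-based and inclusive, and the hypothetical `probe_matches_sense` flag encodes whether, for the protocol in use, a probe must have the same sequence as the mRNA it measures. That last point depends on the labelling protocol. The genes and coordinates are invented, so the resulting fraction says nothing about real arrays.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval; 1-based, inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def opposite(strand: str) -> str:
    return "-" if strand == "+" else "+"


def assign(probe: Feature, genes: list[Feature], strand_aware: bool,
           probe_matches_sense: bool = True) -> list[str]:
    """Names of the genes a probe is assigned to."""
    hits = [g for g in genes if overlaps(probe, g)]
    if strand_aware:
        # Assumption: which gene strand a probe measures depends on the protocol.
        wanted = probe.strand if probe_matches_sense else opposite(probe.strand)
        hits = [g for g in hits if g.strand == wanted]
    return sorted(g.name for g in hits)


def fraction_changed(probes: list[Feature], genes: list[Feature],
                     probe_matches_sense: bool = True) -> float:
    """Fraction of probes whose gene assignment changes once strand is used."""
    if not probes:
        return 0.0
    changed = sum(
        assign(p, genes, False, probe_matches_sense)
        != assign(p, genes, True, probe_matches_sense)
        for p in probes
    )
    return changed / len(probes)


genes = [Feature("GENE_A", "chr1", 100, 500, "+"),
         Feature("GENE_B", "chr1", 400, 900, "-")]
probes = [Feature("p1", "chr1", 150, 174, "+"),
          Feature("p2", "chr1", 420, 444, "+"),   # inside the A/B overlap
          Feature("p3", "chr1", 600, 624, "-"),
          Feature("p4", "chr1", 700, 724, "+")]   # antisense to GENE_B only
print(assign(probes[1], genes, strand_aware=False))  # ['GENE_A', 'GENE_B']
print(assign(probes[1], genes, strand_aware=True))   # ['GENE_A']
print(fraction_changed(probes, genes))               # 0.5
```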
Also, from talking to some of the NuID/Illumina mapping people, it seems
that they simply don't consider the strand of the probe. But they do
calculate a "uniqueness" score to avoid probes that map to multiple
genes.
In the end, I would ideally prefer "cross-hybridizing" probes (of
whatever sort) to be annotated in a way that allows them to be
identified. But I have no idea how much of a nightmare that would be for
the developers of the current annotation packages...
Many thanks,
Cei
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
On Thu, Nov 20, 2008 at 3:48 PM, Cei Abreu-Goodger <cei@ebi.ac.uk> wrote:
> Hi Nick, and others,
>
> Apologies for not making my question more clear, but I guess there have
> been some interesting answers anyway. I was in fact thinking of
> expression arrays. And my main interest was from the standpoint of probe
> annotation.
>
> It now does seem pretty clear that there are many regions in the genome
> that encode transcripts on both strands. If a probe is designed to such
> a region, the expression microarrays will be measuring both transcripts,
> and you will essentially have a "perfectly" cross-hybridizing probe.
>
Not really. It depends on the protocol being used. For Illumina, the
product that goes on the array is strand-specific. That is not true of
all array platforms.
>
> Now, annotation-wise, what should we do? Ignore such probes? At least
> flag them up? The problem is, many bioconductor annotation packages
> only allow a single gene to be assigned to each probe. So, in many
> cases you may be led to believe that your experiment has measured
> differential expression for a particular gene (with its set of GO
> terms, KEGG pathways, etc) when in fact the changing gene was the one
> on the other strand.
I don't think this comes up very often, but it is always possible that,
for any given gene, there is another explanation for the observed
differential expression. That is why, for a given gene, it is important
to validate using a different technology. Globally (as in sets of
genes), it hopefully won't be too much of a factor.
>
>
> These "problems" tend to show up on the list occasionally, for example
> when people find out that different databases (Ensembl/Biomart, NCBI,
> the manufacturer or a bioC annotation package) lists different genes
> for the same probe. Obviously not all, but many of these differences
> have been due to overlapping transcripts. In fact, Ensembl recently
> patched their probe mapping pipeline to be "strand-aware". If you
> think that this would affect a tiny portion of probes, think again:
> the Affymetrix probes affected on the human and mouse genomes was
> around 10%:
>
> http://osdir.com/ml/science.biology.ensembl.devel/2008-06/msg00052.html
>
> Also, from talking to some of the NuID/Illumina mapping people it seems
> that they simply don't consider the strand of the probe. But they do
> calculate a "uniqueness" score to avoid probes that map to multiple
> genes.
>
> In the end, I would ideally prefer "cross-hybridizing" probes (of
> whatever sort) to be annotated in a way that they could be identified.
> But I have no idea of how much a nightmare that would be for the
> developers of the current annotation packages...
>
There is no attempt to map probes in Bioconductor annotation packages
(at least those maintained by the core). The annotation from which the
annotation packages are derived generally comes directly from the
manufacturers. Herve Pages just posted some code to the list that will
allow you to align your own probes to the genome or, more probably, to a
transcript database of your choice. Then you can use your own
definitions for probes. I used to do this on a large scale for all the
arrays that we used, but I have backed away because the answers one
gets are very similar for the vast majority of probes.
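Herve's code is not reproduced here. As a much simpler stand-in, this Python sketch does exact string matching of each probe, and of its reverse complement, against a small transcript set, reporting which orientation matched ("same" means the probe sequence occurs verbatim in the transcript). It is deliberately naive (no mismatches, no index, linear scanning), and the identifiers are hypothetical; real work would use a proper aligner or, within Bioconductor, the sequence-matching tools in the Biostrings package.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """All 0-based start positions of `pattern` in `text`, overlaps included."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def map_probes(probes: dict[str, str],
               transcripts: dict[str, str]) -> dict[str, list[tuple[str, int, str]]]:
    """Exact matches of each probe, in both orientations, against each
    transcript: {probe_id: [(transcript_id, offset, orientation), ...]}."""
    result: dict[str, list[tuple[str, int, str]]] = {}
    for probe_id, probe_seq in probes.items():
        probe_seq = probe_seq.upper()
        hits: list[tuple[str, int, str]] = []
        for tx_id, tx_seq in transcripts.items():
            tx_seq = tx_seq.upper()
            for orientation, query in (("same", probe_seq),
                                       ("reverse-complement", reverse_complement(probe_seq))):
                hits.extend((tx_id, pos, orientation) for pos in find_all(tx_seq, query))
        result[probe_id] = hits
    return result


transcripts = {"tx_plus": "CCATGGCCTTAGCC",    # hypothetical transcript
               "tx_minus": "GGCTAAGGCCATGG"}   # its antisense partner
print(map_probes({"probe_1": "ATGGCCTTAG"}, transcripts))
# {'probe_1': [('tx_plus', 2, 'same'), ('tx_minus', 2, 'reverse-complement')]}
```

Keeping the orientation label in the output is what lets a downstream step decide which of two overlapping transcripts a probe can actually measure.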
Sean
And sorry for perhaps not making this absolutely clear, so, to be
completely certain there is no misunderstanding:
Regardless of annotation, even if a piece of DNA encodes a gene on both
strands, only ONE of these will hybridise to your probe. The reverse
complement is NOT a perfect match, except in vanishingly rare cases,
e.g. the palindromic recognition sequences of restriction enzymes. These
are usually excluded from probe sets because of their ambiguity and
cross-hybridising potential. RC sequences are completely different and
do not cross-hybridise with cDNA. Take any sequence (actgctgacag becomes
ctgtcagcagt) and you will see that this is the case, and why.
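Nick's example can be checked mechanically, together with the exception he mentions: a sequence equal to its own reverse complement, as for the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites. A minimal sketch; the helper names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_rc_palindrome(seq: str) -> bool:
    """True if a sequence reads the same as its own reverse complement.
    Only even-length sequences can qualify: the middle base of an odd-length
    sequence would have to be its own complement."""
    return seq.upper() == reverse_complement(seq)


for seq in ("actgctgacag", "GAATTC", "GGATCC"):
    print(seq, reverse_complement(seq), is_rc_palindrome(seq))
# actgctgacag CTGTCAGCAGT False
# GAATTC GAATTC True
# GGATCC GGATCC True
```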
Given that we know the sequence of the probe, we can always tell from
which strand the hybridising cDNA is derived. So there is no doubt
whatsoever which gene was altered in expression. If geneX is on the
"opposite strand", geneX was NOT the gene whose expression was altered;
geneX is not detected by the probe in question. This annotation
incontrovertibly proves that geneX is not measured by this probe.
Therefore it was geneY, encoded by the relevant strand of DNA. Depending
on the quality of the annotation, you may have to figure out what geneY
is, but there are sufficient secondary databases to do that. You may
even discover a "new gene".
If 10% of genes may be affected, that means 10% of the genes in your
dataset. Usually we're not talking about thousands, so it's fairly easy
to check, e.g. by looking for "encoded by" in the annotation. If you use
Affy chips, their Expression Console provides Excel/OpenOffice-compatible
output that will allow this, even if some of the annotation information
might be lost within R/BioC. As long as the "strand identity" annotation
is retained, you will always see from the BioC output whether geneX was
in fact measured or not. Perhaps code can be adjusted to ignore "other
strand" annotations altogether. I don't write code, but it seems a
relatively easy command to me, whatever the correct syntax: "probes with
"other strand" in the description=FALSE".
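Nick's suggested filter, sketched in Python. The tab-separated layout, the column names (`probe_id`, `gene`, `description`) and the exact phrases searched for are hypothetical; a real annotation export would need its own column names and wording. Rather than discarding flagged probes outright, the sketch sets them aside for review. (In R, the analogous test would be something like `!grepl("other strand", annot$description)`, with `annot` a hypothetical data frame.)

```python
from __future__ import annotations

import csv
import io

# Hypothetical export: column names and wording are assumptions.
EXAMPLE_EXPORT = (
    "probe_id\tgene\tdescription\n"
    "p1\tGENE_A\tencoded by GENE_A\n"
    "p2\tGENE_B\tGENE_B, opposite strand\n"
    "p3\tGENE_C\tGENE_C (Other Strand)\n"
)

# Phrases treated as "this gene is not what the probe measures" (assumed).
STRAND_PHRASES = ("other strand", "opposite strand")


def is_other_strand(description: str) -> bool:
    """Case-insensitive check for any of STRAND_PHRASES."""
    text = description.lower()
    return any(phrase in text for phrase in STRAND_PHRASES)


def split_annotation(handle) -> tuple[list[dict], list[dict]]:
    """Split rows of a tab-separated annotation into (kept, set_aside)."""
    kept, set_aside = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        (set_aside if is_other_strand(row["description"]) else kept).append(row)
    return kept, set_aside


kept, set_aside = split_annotation(io.StringIO(EXAMPLE_EXPORT))
print([r["probe_id"] for r in kept])       # ['p1']
print([r["probe_id"] for r in set_aside])  # ['p2', 'p3']
```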
Best, Nick
From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean Davis
Sent: 20 November 2008 22:51
To: Cei Abreu-Goodger
Cc: n.henriquez@ion.ucl.ac.uk; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] Does the strand of a microarray probe matter?
Nick Henriquez wrote:
Dear Cei, Steve,
There are two versions of the correct answer, depending on whether we
are talking about an expression array or a CGH/SNP-type array:
If we are using an EXPRESSION array:
1) It does not matter on which strand the gene resides.
2) It is not a matter of bad probe design. It is either a negative
control or a misnomer derived from genome annotation.
For ANY probe to hybridise it has to be the RC (reverse complement) of
the cDNA and therefore the DNA homologue of the original RNA sequence.
(I'll let you work that one out for yourself.)
If the probe WAS encoded on "the opposite strand", your labelled
target would not hybridise, as it would be the reverse complement of
the actual sequence.
The annotation "opposite strand" stems from the convention that we
call one strand the "coding strand" and the other strand the
non-coding or "opposite" strand. By definition, then, a gene cannot be
encoded by the "opposite" strand.
However, what often happens when sequencing genomes is that we find
several genes encoded on one strand (which we will then call the
coding strand) and then, somewhat later, also one or more genes on the
"opposite" strand. This annotation is (wrongly, in my opinion)
retained when genomes are assembled and thus becomes part of the
annotation of the probes.
So an opposite-strand probe is at best a kind of negative control, and
at worst a misnomer annotation retained when the genome was assembled.
Mostly we now try to use terms like + and -, but even that has the
drawback that we generally associate + with coding and - with
non-coding. As we all know, BOTH strands encode functional RNAs of
various kinds, including those coding for proteins.
If we are talking about DNA targets, e.g. a SNP array:
1) It does not matter on which strand a gene resides; any overlap is a
matter of coincidence, since "genes" are rare events on the genome.
2) It is not a matter of bad probe design. Usually it simply does not
matter, and this is a sequence that was used historically without
knowledge of the gene (often discovered later). Sometimes the sequence
on the coding strand may have a problem with background or sequence
similarity. To get around this, one can try to use the RC (i.e. the
"opposite strand" sequence), which is often different enough. Of
course, if more than two similar sequences exist, the problem remains,
as we can use this trick only once.
Hope this helps,
Nick
N.V. Henriquez, Senior Research Associate
Dept. Of Neurodegenerative Diseases
Institute of Neurology, UCL, Queen Square House rm 124
Queen Square
London WC1N 3BG
Message: 8
Date: Wed, 19 Nov 2008 10:45:52 -0500
From: Steve Lianoglou <mailinglist.honeypot@gmail.com>
Subject: Re: [BioC] Does the strand of a microarray probe matter?
To: Cei Abreu-Goodger <cei@ebi.ac.uk>
Cc: Bioconductor Newsgroup <bioconductor@stat.math.ethz.ch>
Message-ID: <7710F044-03D5-4572-8EE4-2DB96F4C348C@gmail.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Hi Cei,
On Nov 19, 2008, at 3:51 AM, Cei Abreu-Goodger wrote:
Hello all,
Related issues have arisen before, where the probe of a particular
array platform was annotated to a gene on the opposite strand. But I
was just asked if this even matters, or should it simply be
considered a case of bad probe design.
Does the protocol for different manufacturers' arrays always produce
amplified product of both strands for the transcript to be measured?
I could imagine that protocols that amplify based on poly-A tails
would tend to produce an anti-sense biased amplification product
(older Affy arrays?), whereas those based on random priming could
produce products of both strands (and so the actual strand that is on
the array becomes meaningless).
Does someone know what is the case in particular for Illumina
Beadarrays?
I've never worked on the bench-side of a microarray experiment, but
for gene expression arrays I was under the impression that most
protocols:
(i) extract the RNA from cell lysate using its poly-A tails as targets
(ii) reverse transcribe it to cDNA and amplify the cDNA with random
primers
(iii) hybridize the amplified cDNA to the array
If that's the case, I don't think that the strand of the probe should
be an issue.
I'd be interested, of course, to hear other people's thoughts on this,
too (while this info should be easily available from the
manufacturer's site, or the Methods section of many papers, let's see
if the lazy-web can help :-).
-steve
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos
_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
On Nov 21, 2008, at 19:30, Nick Henriquez wrote:
> And sorry for perhaps not making this absolutely clear. To be
> completely certain there is no misunderstanding about this:
>
> Regardless of annotation, even if a piece of DNA encodes a gene on
> both strands, only ONE of these will hybridise to your probe. The
> reverse complement is NOT a perfect match, except in vanishingly
> rare cases, e.g. palindromic sequences such as restriction enzyme
> sites. These are usually excluded from probe sets due to their
> ambiguity/cross-hybridising potential. RC sequences are completely
> different and do not cross-hybridise with cDNA. Take any sequence
> (actgctgacag becomes ctgtcagcagt) and you will see that and why this
> is the case.
>
> Given that we know the sequence of the probe, we can always tell
> from which strand the hybridising cDNA is derived. So there is no
> doubt whatsoever which gene was involved/altered in expression. If
> geneX is on the "opposite strand", geneX was NOT the gene whose
> expression was altered; geneX is not detected by the probe in
> question. This annotation incontrovertibly proves that geneX is not
> measured by this probe. Therefore it was geneY, encoded by the
> relevant strand of DNA. You may have to figure out what geneY is,
> depending on the quality of the annotation, but there are sufficient
> secondary databases to do that. You may even discover a "new gene".
>
This is only true if the assay does not lose strandedness. Let us say
your assay involves making double-stranded cDNA, as e.g. some
high-throughput sequencing does. In that case you have no way of
telling what strand your original material came from.
Kasper
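Nick's reverse-complement example and Kasper's caveat can both be checked with a minimal base R sketch. It assumes perfect-match hybridisation only (no mismatches, no cross-hybridisation), a plain A/C/G/T alphabet and a made-up transcript fragment; `revcomp()` and `captures()` are hypothetical helpers written for illustration, not functions from any package.

```r
# Reverse complement of a DNA string (A/C/G/T only; IUPAC codes not handled).
revcomp <- function(x) {
  comp <- chartr("ACGTacgt", "TGCAtgca", x)            # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")   # then reverse the order
}

revcomp("actgctgacag")
## [1] "ctgtcagcagt"

# Hypothetical helper: a probe captures a target molecule if the target
# contains the probe's reverse complement (perfect matches only).
captures <- function(probe, target) {
  grepl(tolower(revcomp(probe)), tolower(target), fixed = TRUE)
}

sense <- "atggcgactgctgacagttacc"   # made-up transcript fragment (sense strand)
probe <- revcomp("actgctgacag")     # probe complementary to that transcript

captures(probe, sense)              # TRUE: matching probe, strand-specific target
captures(revcomp(probe), sense)     # FALSE: "opposite strand" probe, no match

# Kasper's case: double-stranded cDNA carries both strands, so the
# opposite-strand probe now also finds a perfect match.
ds_target <- c(sense, revcomp(sense))
any(sapply(ds_target, captures, probe = revcomp(probe)))   # TRUE
```

Whether an "opposite strand" probe is informative therefore depends on the assay keeping strand information, not on the probe sequence alone.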
>
>
> If 10% of genes may be affected, that means 10% of the genes in your
> dataset. Usually we're not talking about thousands, so it's fairly
> easy to check, e.g. by looking for "encoded by" in the annotation.
> If you use Affy chips, their Expression Console provides an
> Excel/OpenOffice-compatible output which will allow this, even if
> some of the annotated information might be lost within R/BioC. As
> long as the "strand identity" annotation is retained, you will always
> see from BioC output whether geneX was in fact measured or not.
> Perhaps code can be adjusted to ignore "other strand" annotations
> altogether; I don't write code, but it seems a relatively easy
> command to me, whatever the correct syntax: "probes with "other
> strand" in the description = FALSE".
>
>
>
> Best, Nick
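Nick's pseudocode ("probes with "other strand" in the description = FALSE") could be written in base R roughly as follows. This is a sketch only: the table, its column names (`probe_id`, `description`) and the phrases "other strand"/"opposite strand" are hypothetical, and real annotation files would need to be checked for the wording they actually use.

```r
# Hypothetical annotation table; column names and wording are illustrative only.
annot <- data.frame(
  probe_id    = c("p1", "p2", "p3", "p4"),
  description = c("geneA", "geneB, other strand", "geneC",
                  "geneD, Opposite Strand"),
  stringsAsFactors = FALSE
)

# TRUE where the description mentions the other/opposite strand (any case).
opposite <- grepl("(other|opposite) strand", annot$description,
                  ignore.case = TRUE)

annot$opposite_strand <- opposite   # keep a flag instead of silently dropping rows
kept <- annot[!opposite, ]          # or drop the flagged probes altogether
kept$probe_id
## [1] "p1" "p3"
```

Keeping a flag column rather than deleting rows leaves the decision to the analyst, which also fits the wish expressed earlier in the thread for cross-hybridizing probes to be identifiable.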
>
>
>
> From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of
> Sean Davis
> Sent: 20 November 2008 22:51
> To: Cei Abreu-Goodger
> Cc: n.henriquez at ion.ucl.ac.uk; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Does the strand of a microarray probe matter?
>
> On Thu, Nov 20, 2008 at 3:48 PM, Cei Abreu-Goodger <cei at ebi.ac.uk>
> wrote:
>
> Hi Nick, and others,
>
> Apologies for not making my question more clear, but I guess there
> have been some interesting answers anyway. I was in fact thinking of
> expression arrays. And my main interest was from the standpoint of
> probe annotation.
>
> It now does seem pretty clear that there are many regions in the
> genome that encode transcripts on both strands. If a probe is
> designed to such a region, the expression microarrays will be
> measuring both transcripts, and you will essentially have a
> "perfectly" cross-hybridizing probe.
>
>
> Not really. It depends on the protocol being used. For Illumina,
> the product that goes on the array is strand-specific. That is not
> true of all array platforms.
>
>
>
> Now, annotation-wise, what should we do? Ignore such probes? At
> least flag them up? The problem is, many bioconductor annotation
> packages only allow a single gene to be assigned to each probe. So,
> in many cases you may be led to believe that your experiment has
> measured differential expression for a particular gene (with its set
> of GO terms, KEGG pathways, etc.) when in fact the changing gene was
> the one on the other strand.
Hi Cei,
I think you may still be a little confused about what the gene
expression arrays are designed to measure.
The probes on the Affymetrix 3' expression arrays are single-stranded
oligos having the sense sequence, i.e. the same sequence as the mRNA
they are designed to detect, and the IVT assay used with these arrays
produces single-stranded cRNAs with the antisense sequence, i.e. the
reverse complement of the initial mRNA sample.
The probes on the Affymetrix Exon ST and Gene ST arrays have the
antisense sequence, i.e. the reverse complement of the sequence they
are designed to detect, and the WT assay used with these arrays
produces a single-stranded cDNA with the sense sequence, i.e. the same
sequence as the initial mRNA sample.
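The strand logic of these two designs can be illustrated with a minimal base R sketch. It models only perfect-match hybridisation, writes every sequence as DNA (T rather than U) and uses a made-up mRNA fragment; `make_target()` and `hybridises()` are hypothetical helpers, and the labels "IVT" and "WT" are used only as shorthand for the two assays described above.

```r
# Reverse complement of an A/C/G/T string.
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGTacgt", "TGCAtgca", x), "")[[1]]), collapse = "")
}

# Hypothetical helper: labelled target made from an mRNA.
#   "IVT": antisense cRNA, i.e. the reverse complement of the mRNA
#   "WT":  sense single-stranded cDNA, i.e. the same sequence as the mRNA
make_target <- function(mrna, assay = c("IVT", "WT")) {
  assay <- match.arg(assay)
  if (assay == "IVT") revcomp(mrna) else mrna
}

# Perfect-match hybridisation: the target must contain the probe's reverse complement.
hybridises <- function(probe, target) grepl(revcomp(probe), target, fixed = TRUE)

mrna            <- "GATTACAGGCCTTAGC"    # made-up mRNA fragment
sense_probe     <- "ACAGGCCTTA"          # same sequence as the mRNA (3' array style)
antisense_probe <- revcomp(sense_probe)  # reverse complement (ST array style)

hybridises(sense_probe,     make_target(mrna, "IVT"))  # TRUE
hybridises(antisense_probe, make_target(mrna, "WT"))   # TRUE
hybridises(antisense_probe, make_target(mrna, "IVT"))  # FALSE: mismatched design
```

In both designs the probe and the labelled target are complementary, which is why a probe only reports on the transcript from one strand.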
The probesets on these arrays are only designed to measure expression
from one strand.
They will only measure expression from both strands in the cases where
Affymetrix have tiled probesets on both strands in the same region of
the genome. This is the case for some probesets that were designed
based on ESTs, where it wasn't clear which strand the gene was on at
the time of array design, so probesets were tiled on both strands in
the region the EST mapped to.
As for problems with probeset annotations or discrepancies between one
annotation source and another, we have also found that the number of
annotation errors is probably somewhere close to 10%, and that genes
that were close together or had overlapping ends tended to cause
problems for the annotations.
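Overlapping gene ends are exactly where a strand-unaware lookup picks up the wrong gene. A minimal base R sketch, assuming made-up gene coordinates on a single chromosome (1-based, inclusive) and a probe location that is already known: `overlapping_genes()` is a hypothetical helper that splits the overlapping genes by whether they lie on the same strand as the transcript the probe is meant to detect.

```r
# Made-up gene models on one chromosome (1-based, inclusive coordinates).
genes <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  start  = c(1000, 4800, 9000),
  end    = c(5000, 7000, 9500),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: genes overlapping [start, end], split by whether they are
# on the same strand as the transcript the probe is designed to detect.
overlapping_genes <- function(start, end, strand, genes) {
  hit <- genes$start <= end & genes$end >= start
  list(same     = genes$gene[hit & genes$strand == strand],
       opposite = genes$gene[hit & genes$strand != strand])
}

# A probe in the region where the ends of geneA (+) and geneB (-) overlap.
res <- overlapping_genes(4850, 4874, "+", genes)
res$same      # "geneA": the candidate the probe can measure
res$opposite  # "geneB": only a strand-unaware lookup would report this gene
```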
Affymetrix grades the reliability of the annotations for the 3'
expression arrays as A, B, C or E for each probeset, with A being the
most reliable and E being annotations based on EST clusters and
generally the least reliable. We have found that although their A and
B grade annotations are not always correct either, they are indeed
more likely to be correct than the annotations they label as E.
For the exon arrays, Affymetrix labels its probesets as unique,
similar, or mixed depending on whether, or to what extent, the probes
cross-hybridise, so that one can choose to use only those probesets
labelled as unique if one wants to avoid cross-hybridising probes.
(I haven't done any mappings of the probes on the exon arrays yet, so
I don't know how true this is.)
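Both kinds of reliability labels lend themselves to simple filtering. A sketch in base R, assuming the labels have already been read into data frames; the tables, probeset IDs and column names (`grade`, `crosshyb`) are hypothetical and not the exact headers used in Affymetrix annotation files.

```r
# Hypothetical 3' array annotation with reliability grades.
three_prime <- data.frame(
  probeset = c("ps1", "ps2", "ps3"),
  grade    = c("A", "E", "B"),
  stringsAsFactors = FALSE
)
# Hypothetical exon array annotation with cross-hybridisation labels.
exon <- data.frame(
  probeset = c("ex1", "ex2", "ex3"),
  crosshyb = c("unique", "similar", "mixed"),
  stringsAsFactors = FALSE
)

# Keep only the better-graded 3' probesets ...
good_3p <- three_prime$probeset[three_prime$grade %in% c("A", "B")]
good_3p
## [1] "ps1" "ps3"

# ... and only the exon probesets labelled as unique.
good_exon <- exon$probeset[exon$crosshyb == "unique"]
good_exon
## [1] "ex1"
```

Given the observation above that A and B grades are not always correct either, such a filter reduces annotation errors rather than eliminating them.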
Best wishes,
Maria
Hi Cei,
This paper contains a discussion of this topic:
Perocchi et al., "Antisense artifacts in transcriptome microarray
experiments are resolved by actinomycin D":
http://nar.oxfordjournals.org/cgi/content/full/35/19/e128
Figs. 1 and 2 show that you can get strand-specific measurements, but
that spurious second-strand synthesis by the reverse transcription
step needs to be considered and avoided.
Best wishes
Wolfgang
----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
Maria Stalteri wrote:
> Hi Cei,
>
> I think you may still be a little confused about what the gene expression
> arrays are designed to measure.
> [...]
Dear list,
Thank you all for your answers. From these and some offline
conversations I had with people from the microarray facility, I can
see that current microarray protocols attempt to produce
strand-specific samples before hybridizing (but see the reference
Wolfgang sent).
In this case, whenever we do probe mapping, we have to be careful to
select only those probes whose sequence matches on the appropriate
strand (and this will depend on the platform, since some manufacturers
report the probe sequence and some the "target" sequence). As I
mentioned before, this has historically not always been the case.
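A minimal base R sketch of strand-aware probe mapping, under stated assumptions: exact matching only, transcripts written 5' to 3' in their own sense orientation, and made-up sequences in which `tx_minus` is a hypothetical antisense transcript overlapping `tx_plus`. The `orientation` argument of the hypothetical helper `map_probe()` stands for the platform-dependent question raised above (is the reported sequence the same sense as the transcripts it detects, or their reverse complement?), which has to be checked for each manufacturer's files.

```r
# Reverse complement of an A/C/G/T string.
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGTacgt", "TGCAtgca", x), "")[[1]]), collapse = "")
}

# Made-up transcripts, each written 5'->3' in its own sense orientation;
# tx_minus is a hypothetical antisense transcript overlapping tx_plus.
transcripts <- c(
  tx_plus  = "GGATCCATGAAACCCGGGTTTAGCT",
  tx_minus = "CTAAACCCGGGTTTCATCCA"
)

# Hypothetical helper: names of the transcripts a probe would detect.
# orientation: is the reported sequence the same sense as the transcripts it
#   detects ("sense") or their reverse complement ("antisense")?
# strand_aware = FALSE mimics a mapping that ignores strand altogether.
map_probe <- function(reported, transcripts,
                      orientation = c("sense", "antisense"),
                      strand_aware = TRUE) {
  orientation <- match.arg(orientation)
  query <- if (orientation == "sense") reported else revcomp(reported)
  hits <- grepl(query, transcripts, fixed = TRUE)
  if (!strand_aware) {
    hits <- hits | grepl(revcomp(query), transcripts, fixed = TRUE)
  }
  names(transcripts)[hits]
}

seq_reported <- "ATGAAACCCG"

map_probe(seq_reported, transcripts, "sense")                        # "tx_plus"
map_probe(seq_reported, transcripts, "sense", strand_aware = FALSE)  # "tx_plus" "tx_minus"
map_probe(seq_reported, transcripts, "antisense")                    # "tx_minus"
```

The strand-unaware call reports both transcripts, as in the overlapping-transcript discrepancies discussed earlier; the last call shows that misjudging the orientation assigns the probe to the antisense partner instead.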
One last point, regarding one of Sean's answers:
"There is no attempt to map probes in bioconductor annotation packages
(at least those maintained by the core). The annotation from which the
annotation packages are derived come directly from the manufacturers,
generally."
Even if no re-mapping is being done (there are many bioC packages not
maintained by the core which do involve re-mapping), my main point was
that bioconductor annotation structures don't allow more than one
"gene" to be annotated for any particular probe. Do correct me if I'm
wrong, but at least when using AnnotationDbi I found no way of having
more than one gene (EntrezID) per probe.
Another example: Affymetrix does annotate more than one gene (EntrezID)
for their probes (~5% of probes in mouse430_2 with an EntrezID have
more than one). So, I guess if the bioconductor core team is using the
manufacturer's annotation, then they are (in some way) removing this
information?
# A bit of R code showing this:
library(mouse4302.db)
# list of Entrez Gene IDs for each probe ID
xx <- as.list(mouse4302ENTREZID)
# does any probe have more than one Entrez Gene ID?
any(sapply(xx, length) > 1)
#[1] FALSE
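The check above relies on the `mouse4302.db` package. The sketch below shows, on a hypothetical in-memory mapping (probe and gene names are placeholders, not real identifiers), how a one-to-many probe-to-gene table can be represented and how multi-gene probes can be flagged, using only base R.

```r
# Hypothetical probe -> gene mapping stored as a list, so one probe can
# carry several genes (names are placeholders, not real identifiers).
probe2gene <- list(
  probe_1 = "geneA",
  probe_2 = c("geneB", "geneC"),
  probe_3 = character(0)          # probe with no gene assigned
)

n_genes <- vapply(probe2gene, length, integer(1))   # genes per probe

multi <- names(n_genes)[n_genes > 1]   # probes annotated to more than one gene
multi
## [1] "probe_2"

frac_multi <- mean(n_genes[n_genes > 0] > 1)   # share among annotated probes
frac_multi
## [1] 0.5

# "Long" table with one row per probe/gene pair: nothing is collapsed, so
# probes with several genes stay identifiable.
long <- data.frame(
  probe = rep(names(probe2gene), times = n_genes),
  gene  = unlist(probe2gene, use.names = FALSE),
  stringsAsFactors = FALSE
)
```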
And no, I'm not saying that different EntrezIDs are always unrelated
genes, or that multiple probes mapping to multiple genes are always
due to strand problems.
Thanks again,
Cei
On Fri, Nov 28, 2008 at 4:44 AM, Cei Abreu-Goodger <cei@ebi.ac.uk> wrote:
> Dear list,
> [...]
> Another example: Affymetrix does annotate more than one gene (EntrezID) for
> their probes (~5% of probes in mouse430_2 with EntrezID have more than one).
> So, I guess if the bioconductor core team is using the manufacturer's
> annotation, then they are (in some way) removing this information?
>
I think the annotation is done by mapping the GenBank or RefSeq IDs to
Entrez IDs. I do not think the Entrez IDs from the manufacturer are
used directly. However, Marc Carlson is best placed to comment on the
details.
Sean