To follow up slightly
On Tue, Nov 18, 2008 at 9:57 AM, Marc Carlson <mcarlson@fhcrc.org> wrote:
> Hi Peter,
>
> I think that your confusion is coming from the fact that these are the
> chromosome start locations for the genes and not the probes. According
> to Affy, that probe is supposed to be measuring that gene and we took
> their word for that. We then gave you the start positions for
> transcripts of that gene according to UCSC. We don't currently provide
> the data for where the probe aligns to the genome or to which
> transcripts in the genome the probe might stick to.
You can easily find all genomic regions that a probe matches using
Biostrings, and this is one of the examples in the vignette, I believe.
Finding all transcripts is harder (at least in the sense that we have not
yet developed a pipeline for it). You would need to download all the
transcript sequences from somewhere (RefSeq?), and then basically modify
the example in the Biostrings vignette to do the matching.
These are not particularly large or hard problems, so a few hours would
deal with the first, maybe a day or two for the second.
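
To make the matching idea concrete, here is a minimal sketch in plain Python (not Biostrings, and not the vignette's code) of what "find every place a probe matches" amounts to: an exact search on the forward strand plus a search for the reverse complement. The function names and the toy sequences are made up for illustration; real probes are 25-mers and the subject would be a whole chromosome or a set of transcript sequences.

```python
# Illustrative only: toy exact matcher, standing in for a real
# sequence-matching library. All names and sequences are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_all(pattern, subject):
    """Return 0-based starts of every exact match, overlapping ones included."""
    hits = []
    i = subject.find(pattern)
    while i != -1:
        hits.append(i)
        i = subject.find(pattern, i + 1)
    return hits


def match_both_strands(probe, subject):
    """Search for the probe itself ('+') and its reverse complement ('-')."""
    probe, subject = probe.upper(), subject.upper()
    return {
        "+": find_all(probe, subject),
        "-": find_all(reverse_complement(probe), subject),
    }


# Toy data (made up): a short "chromosome" and a 6-base "probe".
chrom = "TTGACCGTAAGGCTTGACCGTTTACGGTCAA"
probe = "GACCGT"
hits = match_both_strands(probe, chrom)
print(hits)
```

For the transcript case the same loop would simply run over each downloaded transcript sequence in turn and record which ones produce a hit.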
best wishes
Robert
>
>
>
> Marc
>
>
>
>
> Bazeley, Peter wrote:
> > Hello,
> >
> > R version: 2.8.0
> >
> > I just installed the hgu133plus2.db package, and am looking at the
> > hgu133plus2CHRLOC environment. I've noticed that some of the probeset
> > entries (e.g. "201268_at") have multiple locations compared to Affy's
> > annotation file. I'd like to figure out if these multiple locations are
> > current, in which case it is some sort of overlapping/repeating duplication.
> > For example:
> >
> >
> > > as.list(hgu133plus2CHRLOC)$'201268_at'
> >       17       17       17       17
> > 46598879 46597889 46598637 46599081
> >
> > indicates that the probeset maps to 4 locations. Compare this to the
> > alignment info in Affy's annotation file (from 7/8/08,
> > http://www.affymetrix.com/Auth/analysis/downloads/na26/ivt/HG-U133_Plus_2.na26.annot.csv.zip
> > ):
> >
> > chr12:119204403-119205041 (+) // 91.49 // q24.31 ///
> > chr17:46598810-46604103 (+) // 96.87 // q21.33
> >
> > which suggests one location on chromosome 17 (I'm ignoring chromosome 12
> > for now). This is a "_at" probeset, so it should map uniquely to a sequence,
> > according to Affy's "Data Analysis Fundamentals" document (and speaking to a
> > rep).
> >
> > From the information provided by "?hgu133plus2CHRLOC", I downloaded
> >
> > ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/database/affyU133Plus2.txt.gz
> > from UCSC to see how this occurred, but it is not clear. Actually, the
> > file:
> >
> > http://www.affymetrix.com/Auth/analysis/downloads/psl/HG-U133_Plus_2.link.psl.zip
> > from Affy's support page has the same alignment info. Here's the relevant
> > PSL info:
> >
> > Target sequence name: chr17
> > Alignment start position in target: 46598810
> > Alignment end position in target: 46604103
> > Number of blocks in the alignment (a block contains no gaps): 5
> > Comma-separated list of sizes of each block: 47,130,102,113,257,
> > Comma-separated list of starting positions of each block in target:
> > 46598810,46599186,46600601,46602296,46603846,
> >
> >
> > The second location provided by CHRLOC (46597889) occurs before the start
> > of the alignment in the PSL info, so perhaps this one CHRLOC location
> > corresponds to the PSL alignment? The mappings were obtained from UCSC on
> > 2006-Apr14, so perhaps additional alignments existed at the time, which have
> > since been removed.
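
As a rough aid for this kind of comparison, the sketch below (plain Python, illustrative only) rebuilds the alignment blocks from the PSL fields quoted above and reports where each CHRLOC value falls relative to them. It assumes the CHRLOC values and the PSL coordinates refer to the same genome build and can be compared directly, which the thread does not confirm; as noted in the reply, CHRLOC holds transcript start positions rather than probe alignments, so this only shows relative position. PSL starts are 0-based and block ends are exclusive. The helper names are made up.

```python
# Illustrative only: values copied from the message above; helper names
# are hypothetical. Assumes both coordinate sets share one genome build.


def parse_psl_list(field):
    """Parse a PSL comma-separated list (with trailing comma) into ints."""
    return [int(item) for item in field.split(",") if item]


block_sizes = parse_psl_list("47,130,102,113,257,")
block_starts = parse_psl_list("46598810,46599186,46600601,46602296,46603846,")

# Each block as a half-open interval [start, end) on chr17.
blocks = [(start, start + size) for start, size in zip(block_starts, block_sizes)]
align_start, align_end = blocks[0][0], blocks[-1][1]

chrloc = [46598879, 46597889, 46598637, 46599081]


def classify(pos):
    """Say whether pos lies outside the alignment, in a block, or in a gap."""
    if not (align_start <= pos < align_end):
        return "outside alignment"
    if any(start <= pos < end for start, end in blocks):
        return "inside a block"
    return "in a gap between blocks"


for pos in chrloc:
    print(pos, classify(pos))
```

The reconstructed end of the last block equals the reported alignment end (46604103), which is a quick sanity check that the block lists were read correctly.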
> >
> >
> > Thank you for any help. Hopefully I'm just missing something obvious
> > (well, non-obvious for me).
> >
> > Peter Bazeley
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem@fhcrc.org