Probeset/Transcript cluster definitions for HTA2.0 using pdInfoBuilder
@guilherme-rocha-6354
Last seen 7.7 years ago
Hi all,

I have constructed a package information file for Affy's HTA 2.0 chip using pdInfoBuilder, as shown below. It appears that the annotation files have since been upgraded to na34 (from the na33 versions I used for probeFile and transFile).

Specific question: do the annotation files affect which probes are included in each probeset/transcript cluster?

Broader question: what information from the annotation files is actually used by pdInfoBuilder?

Any help appreciated.

Thanks,
Guilherme Rocha

Construction of the package:

    library(pdInfoBuilder)

    setwd("/my_bioc_packages/")

    seed <- new("AffyHTAPDInfoPkgSeed",
                version = "3.8.0",
                license = "Artistic-2.0",
                pgfFile = ".../HTA-2_0.r1.pgf",
                clfFile = ".../HTA-2_0.r1.clf",
                probeFile = ".../HTA-2_0.na33.hg19.probeset.csv",
                transFile = ".../HTA-2_0.na33.1.hg19.transcript.csv",
                coreMps = ".../HTA-2_0.r1.Psrs.mps",
                geneArray = TRUE,
                author = "gvrocha",
                email = "gvrocha at gmail.com",
                biocViews = "AnnotationData",
                genomebuild = "hg19",
                organism = "Homo sapiens",
                species = "Homo sapien",
                url = "http://about.me/gvrocha")

    makePdInfoPackage(seed, destDir=".")

--
Guilherme V. Rocha
gvrocha at gmail.com

If you are thinking of using the na34 version of the Affy probeset annotation file (".../HTA-2_0.na34.hg19.probeset.csv"), note that in that file 2995 probesets are identified by a purely numerical ID, whereas the remaining probesets are identified by alphanumerical IDs.

GVR
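A quick way to check this kind of ID mix in an annotation file is sketched below. It is only an illustration: `count_id_styles` is a hypothetical helper (not part of pdInfoBuilder), the toy file stands in for the real probeset CSV, and the sketch assumes the file has a `probeset_id` column and that any header comment lines start with `#`.

```r
# Hypothetical helper (not part of pdInfoBuilder): count how many probeset
# IDs in a probeset annotation CSV are purely numeric vs. alphanumeric.
count_id_styles <- function(csv_path) {
  # Assumption: header comment lines start with '#', so they are skipped.
  # Everything is read as character so numeric-looking IDs are not coerced.
  annot <- read.csv(csv_path, comment.char = "#",
                    stringsAsFactors = FALSE, colClasses = "character")
  is_numeric_id <- grepl("^[0-9]+$", annot$probeset_id)
  c(numeric = sum(is_numeric_id), alphanumeric = sum(!is_numeric_id))
}

# Toy file with made-up rows, standing in for the real na34 probeset CSV.
toy <- tempfile(fileext = ".csv")
writeLines(c(
  "# toy header comment",
  '"probeset_id","transcript_cluster_id"',
  '"2824546","TC01000001.hg.1"',
  '"PSR01000002.hg.1","TC01000001.hg.1"',
  '"JUC01000003.hg.1","TC01000002.hg.1"'
), toy)

id_counts <- count_id_styles(toy)
print(id_counts)
```

To run it on a real file, pass the path of the probeset CSV instead of `toy`.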

 

@james-w-macdonald-5106
Last seen 11 hours ago
United States
Hi Guilherme,

On Tue, Aug 26, 2014 at 10:00 AM, Guilherme Rocha <gvrocha at gmail.com> wrote:

> Specific question: do the annotation files affect which probes are
> included in each probeset/transcript cluster?

They can. It depends on changes between the current genome build and the one on which the original probeset/transcript clusters were based. Given the maturity of the human genome, I wouldn't expect massive changes.

> Broader question: what information from the annotation files is actually
> used by pdInfoBuilder?

This is something you could explore for yourself. If you go to the svn (https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks), using readonly for both the user name and the password, and look at the source for pdBuilderV2HTA2.R, you can see this near the top, in the function parseHtaProbesetCSV():

    cols <- c("probeset_id", "seqname", "strand", "start", "stop",
              "transcript_cluster_id", "exon_id",
              "crosshyb_type", "level", "probeset_type",
              "junction_start_edge", "junction_stop_edge",
              "junction_sequence", "has_cds")

So all of this information is parsed out of the probeset CSV file. If there are changes to the current human genome that would imply that a particular probe or probeset no longer measures what Affy originally intended (or if the strand, start, or stop position changes), then the changes would be reflected here, and would then be passed to the pd.hta.2.0 package that you built.

The transcript CSV file is used for much less. AFAIK, that file is just parsed and put into the extdata directory of the package:

    #######################################################################
    ## Part vi) Save NetAffx Annotation to extdata
    #######################################################################
    if (!quiet) message("Saving NetAffx Annotation... ", appendLF=FALSE)
    netaffxProbeset <- annot2fdata(object@probeFile)
    save(netaffxProbeset, file=file.path(extdataDir,
                          'netaffxProbeset.rda'), compress='xz')
    netaffxTranscript <- annot2fdata(object@transFile)
    save(netaffxTranscript, file=file.path(extdataDir,
                            'netaffxTranscript.rda'), compress='xz')

And you can see what that looks like by doing:

    load(paste0(path.package("pd.hta.2.0"), "/extdata/netaffxTranscript.rda"))

and then

    head(pData(netaffxTranscript))

but I don't think these data are currently used for anything.

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
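One way to see whether moving from na33 to na34 would change anything that matters is to compare the two probeset CSVs on the columns quoted in the answer above. The following is a minimal sketch, not pdInfoBuilder code: `diff_annotations` is a hypothetical helper, and the two data frames are made-up toy data standing in for the na33 and na34 files read with `read.csv`.

```r
# Columns read from the probeset CSV, as quoted from parseHtaProbesetCSV().
cols <- c("probeset_id", "seqname", "strand", "start", "stop",
          "transcript_cluster_id", "exon_id",
          "crosshyb_type", "level", "probeset_type",
          "junction_start_edge", "junction_stop_edge",
          "junction_sequence", "has_cds")

# Hypothetical helper: for probesets present in both releases, report
# which probeset IDs differ in each column of interest.
diff_annotations <- function(old, new, cols, key = "probeset_id") {
  # Only compare columns that exist in both tables.
  shared <- intersect(names(old), names(new))
  cols <- setdiff(intersect(cols, shared), key)
  both <- merge(old[c(key, cols)], new[c(key, cols)],
                by = key, suffixes = c(".old", ".new"))
  changed <- lapply(cols, function(col) {
    # which() drops comparisons involving NA, so NA entries are ignored.
    idx <- which(both[[paste0(col, ".old")]] != both[[paste0(col, ".new")]])
    both[[key]][idx]
  })
  names(changed) <- cols
  Filter(length, changed)  # keep only columns with at least one change
}

# Toy stand-ins for the two annotation releases (values are invented).
old <- data.frame(
  probeset_id = c("PSR01000001.hg.1", "PSR01000002.hg.1", "JUC01000003.hg.1"),
  strand = c("+", "+", "-"),
  start = c("100", "200", "300"),
  transcript_cluster_id = c("TC01000001.hg.1", "TC01000001.hg.1",
                            "TC01000002.hg.1"),
  stringsAsFactors = FALSE
)
new <- old
new$transcript_cluster_id[2] <- "TC01000002.hg.1"  # probeset reassigned
new$start[3] <- "305"                              # coordinate shifted

changes <- diff_annotations(old, new, cols)
print(changes)
```

An empty result would suggest that, for the columns compared, rebuilding the package with the newer files changes nothing. Probesets present in only one release are not reported by this sketch and would need a separate `setdiff` on the IDs.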
Thank you. Your reply helps a lot in letting me know where to look for things. :)

Best,
G