Probeset/Transcript cluster definitions for HTA2.0 using pdInfoBuilder


Guilherme Rocha ▴ 40

@guilherme-rocha-6354

Last seen 8.0 years ago

Hi all,

I have constructed a package information file for Affy's HTA 2.0 chip using pdInfoBuilder, as shown below. It appears that the annotation files have since been upgraded to na34 (from the na33 versions used for probeFile and transFile).

Specific question: do the annotation files affect which probes are included in each probeset/transcript cluster?

Broader question: what information from the annotation files is actually used by pdInfoBuilder?

Any help appreciated.

Thanks,
Guilherme Rocha

Construction of the package:

    library(pdInfoBuilder)

    setwd("/my_bioc_packages/")

    seed <- new("AffyHTAPDInfoPkgSeed",
                version = "3.8.0",
                license = "Artistic-2.0",
                pgfFile = ".../HTA-2_0.r1.pgf",
                clfFile = ".../HTA-2_0.r1.clf",
                probeFile = ".../HTA-2_0.na33.hg19.probeset.csv",
                transFile = ".../HTA-2_0.na33.1.hg19.transcript.csv",
                coreMps = ".../HTA-2_0.r1.Psrs.mps",
                geneArray = TRUE,
                author = "gvrocha",
                email = "gvrocha at gmail.com",
                biocViews = "AnnotationData",
                genomebuild = "hg19",
                organism = "Homo sapiens",
                species = "Homo sapiens",
                url = "http://about.me/gvrocha")

    makePdInfoPackage(seed, destDir = ".")

--
Guilherme V. Rocha
gvrocha at gmail.com

BiocViews Annotation Organism pdInfoBuilder • 2.8k views

updated 10.7 years ago by James W. MacDonald • written 10.7 years ago by Guilherme Rocha


If you are thinking of using the na34 version of the Affy probeset annotation files (".../HTA-2_0.na34.hg19.probeset.csv"), notice that in that file, 2995 probesets are identified by their NUMERICAL id whereas the remaining probesets are identified by their ALPHANUMERICAL ids.

GVR

written 10.2 years ago by Guilherme Rocha
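One way to check the mix of identifier types described above is to test each probeset ID against a digits-only pattern. The following is a minimal sketch, not part of pdInfoBuilder; the IDs in `probeset_ids` are made up for illustration and are not taken from the na34 file.

```r
# Toy stand-in for the probeset_id column of the probeset CSV.
# These IDs are invented for illustration only.
probeset_ids <- c("PSR01000001.hg.1", "JUC01000002.hg.1", "2824546",
                  "PSR02000003.hg.1", "2824547")

# TRUE where an ID consists only of digits.
is_numeric_id <- grepl("^[0-9]+$", probeset_ids)

# Tabulate the two kinds of identifiers.
id_counts <- table(ifelse(is_numeric_id, "numeric", "alphanumeric"))
print(id_counts)

# The numeric IDs themselves, for closer inspection.
numeric_ids <- probeset_ids[is_numeric_id]
```

With the real file, `probeset_ids` would be the `probeset_id` column after reading the CSV, for example with `read.csv(path, comment.char = "#", stringsAsFactors = FALSE)`, assuming the header comment lines in the file begin with `#`.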


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Guilherme,

On Tue, Aug 26, 2014 at 10:00 AM, Guilherme Rocha <gvrocha at gmail.com> wrote:

> Specific question: do the annotation files affect which probes are
> included in each probeset/transcript cluster?

They can. It depends on changes between the current genome build and the one on which the original probeset/transcript clusters were based. Given the maturity of the human genome, I wouldn't expect massive changes.

> Broader question: what information from the annotation files is actually
> used by pdInfoBuilder?

This is something you could explore for yourself. If you go to the svn (https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks), using readonly for both the password and user name, and look at the source for pdBuilderV2HTA2.R, you can see this near the top, in the function parseHtaProbesetCSV():

    cols <- c("probeset_id", "seqname", "strand", "start", "stop",
              "transcript_cluster_id", "exon_id",
              "crosshyb_type", "level", "probeset_type",
              "junction_start_edge", "junction_stop_edge",
              "junction_sequence", "has_cds")

So all of this information is parsed out of the probeset CSV file. If there are changes to the current human genome that would imply that a particular probe or probeset no longer measures what Affy originally intended (or if the strand, start, or stop position changes), then the changes would be reflected here, and would then be passed to the pd.hta.2.0 package that you built.

The transcript CSV file is used for much less. AFAIK, that file is just parsed and put into the extdata directory of the package:

    #######################################################################
    ## Part vi) Save NetAffx Annotation to extdata
    #######################################################################
    if (!quiet) message("Saving NetAffx Annotation... ", appendLF=FALSE)
    netaffxProbeset <- annot2fdata(object@probeFile)
    save(netaffxProbeset, file=file.path(extdataDir, 'netaffxProbeset.rda'),
         compress='xz')
    netaffxTranscript <- annot2fdata(object@transFile)
    save(netaffxTranscript, file=file.path(extdataDir, 'netaffxTranscript.rda'),
         compress='xz')

And you can see what that looks like by doing:

    load(paste0(path.package("pd.hta.2.0"), "/extdata/netaffxTranscript.rda"))

and then

    head(pData(netaffxTranscript))

but I don't think these data are currently used for anything.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
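To see whether a newer annotation release actually changes the probeset to transcript cluster assignments, the `probeset_id` and `transcript_cluster_id` columns named in the reply can be compared across two versions of the probeset CSV. The sketch below is only an illustration under stated assumptions: `compare_cluster_mapping` is a hypothetical helper (not a pdInfoBuilder function), and the two small data frames stand in for the na33 and na34 files with invented IDs.

```r
# Hypothetical helper: report probesets whose transcript cluster differs
# between two annotation tables. Both inputs need the columns
# probeset_id and transcript_cluster_id.
compare_cluster_mapping <- function(old_annot, new_annot) {
  keep <- c("probeset_id", "transcript_cluster_id")
  merged <- merge(old_annot[, keep], new_annot[, keep],
                  by = "probeset_id", all = TRUE,
                  suffixes = c("_old", "_new"))
  old_tc <- merged$transcript_cluster_id_old
  new_tc <- merged$transcript_cluster_id_new
  # A mapping counts as changed if the probeset is missing from one
  # version or if the transcript cluster IDs differ.
  changed <- is.na(old_tc) | is.na(new_tc) | old_tc != new_tc
  merged[changed, , drop = FALSE]
}

# Toy tables with invented IDs, standing in for the two CSV versions.
na33 <- data.frame(probeset_id = c("PSR1", "PSR2", "PSR3"),
                   transcript_cluster_id = c("TC1", "TC1", "TC2"),
                   stringsAsFactors = FALSE)
na34 <- data.frame(probeset_id = c("PSR1", "PSR2", "PSR4"),
                   transcript_cluster_id = c("TC1", "TC3", "TC2"),
                   stringsAsFactors = FALSE)

changes <- compare_cluster_mapping(na33, na34)
print(changes)
```

An empty result would suggest the two releases assign probesets to transcript clusters identically; any rows returned are the probesets worth inspecting before rebuilding the package against the newer files.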

written 10.7 years ago by James W. MacDonald


Thank you. Your reply helps a lot in letting me know where to look for things. :)

Best,
G

written 10.7 years ago by Guilherme Rocha