how to extract probes for a probeset from PdInfo database?


Guido Hooiveld ★ 4.1k

@guido-hooiveld-2020


Wageningen University, Wageningen, the …

Hello,

I would like to extract the probes that belong to a set of probesets from a PdInfo database, but despite searching the archives I got stuck. I would appreciate some hints.

To be specific: I am working with an Affymetrix miRNA 3.1 dataset. I would like to extract all probes that belong to a set of Affymetrix control probesets, e.g. AFFX-BkGr17-GC10_st and AFFX-BkGr17-GC11_st.

This is my approach:

> library(pd.mirna.3.1)
> con <- db(pd.mirna.3.1)
> affy.probesets <- c("AFFX-BkGr17-GC10_st","AFFX-BkGr17-GC11_st")
> affy.probesets
[1] "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"
>
> #check available tables/information
> dbGetQuery(con, "select name, sql from sqlite_master where type='table'")
        name sql
1  type_dict CREATE TABLE type_dict (type INTEGER PRIMARY KEY, type_id TEXT)
2 featureSet CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER REFERENCES type_dict(type))
3  pmfeature CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
4  mmfeature CREATE TABLE mmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
5 table_info CREATE TABLE table_info \n( tbl TEXT,\n\trow_count INTEGER \n)
>

So far so good. However, how do I continue from here?

For arrays for which a CDF is available, e.g. the miRNA 1.0 array, I would do something like this (although only the probes for the first probeset in affy.probesets would be extracted, but that is not the main question):

> get(affy.probesets, mirna10cdf)
        pm mm
[1,] 34705 NA
[2,] 46085 NA
[3,] 20445 NA
[4,] 26368 NA
<<snip>>

Main question: how could I achieve this when using a PdInfo object?

Related to this, how can I get more information on what the various keys represent? E.g. what does 'man_fsetid' represent? [From the mailing list I have since learned that 'man_fsetid' represents the Affymetrix "probeset_name", and 'fsetid' the Affymetrix "probeset_id".]

-->> The reason I am asking all this is that I would like to analyze (normalize) my miRNA 3.1 dataset using the normexp-by-control background correction (nec function in limma), essentially as described in: http://www.pubmed.org/23709276.

Thanks,
Guido

---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld@wur.nl
internet: http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010

miRNA cdf affy • 2.3k views

ADD COMMENT • link updated 11.3 years ago by Benilton Carvalho ★ 4.3k • written 11.3 years ago by Guido Hooiveld ★ 4.1k


Benilton Carvalho ★ 4.3k

@benilton-carvalho-1375


Brazil/Campinas/UNICAMP

Hi Guido,

You may benefit from avoiding the SQL interface. How about:

    library(oligo)
    # read all CEL files in the working directory
    rawData = read.celfiles(list.celfiles())
    affy.probesets = c("AFFX-BkGr17-GC10_st","AFFX-BkGr17-GC11_st")
    # columns to return for each probe
    fields = c('man_fsetid', 'fid', 'x', 'y')
    probeInfo = getProbeInfo(rawData, fields, subset=man_fsetid %in% affy.probesets)
    head(probeInfo)

The fields are:

'man_fsetid' is the manufacturer featureset id (i.e., the probeset name)
'fid' is the feature id (which coincides with the row number in the *FeatureSet object for that given probe)
'x' and 'y' are the physical x/y coordinates

Note that the 'subset=man_fsetid %in% affy.probesets' expression above corresponds to the subset= argument of the subset() command. Through the '...' of getProbeInfo() you can pass any argument accepted by subset().

If you prefer to use the SQL interface, the featureSet table contains the *probeset* information and the pmfeature table contains the *probe* information. In your case, you want to INNER JOIN the featureSet and pmfeature tables and subset using man_fsetid.

HTH,
b

2014/1/21 Hooiveld, Guido <guido.hooiveld@wur.nl>
> <<snip>>
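The SQL route mentioned in the answer above (an INNER JOIN of featureSet and pmfeature, filtered on man_fsetid) might look roughly like the sketch below. The table and column names come from the schema printed in the question. The helper name probes_for_probesets is hypothetical and not part of oligo or the pd.* packages, and the in-memory database holds invented toy rows so that the query can be run without the annotation package installed.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper (an assumption, not an oligo/pdInfo function):
# returns one row per PM probe for the requested probeset names.
probes_for_probesets <- function(con, probesets) {
  # one "?" placeholder per probeset name, bound through params below
  placeholders <- paste(rep("?", length(probesets)), collapse = ", ")
  sql <- paste0(
    "SELECT fs.man_fsetid, pm.fid, pm.x, pm.y ",
    "FROM featureSet AS fs ",
    "INNER JOIN pmfeature AS pm ON pm.fsetid = fs.fsetid ",
    "WHERE fs.man_fsetid IN (", placeholders, ") ",
    "ORDER BY fs.man_fsetid, pm.fid"
  )
  dbGetQuery(con, sql, params = unname(as.list(probesets)))
}

# Toy in-memory database with the same two tables; all rows are invented.
con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER)")
dbExecute(con, "CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)")
dbExecute(con, "INSERT INTO featureSet VALUES (1, 'AFFX-BkGr17-GC10_st', 1), (2, 'AFFX-BkGr17-GC11_st', 1), (3, 'other_st', 1)")
dbExecute(con, "INSERT INTO pmfeature VALUES (10, 1, 1, 0, 0), (11, 1, 2, 1, 0), (20, 2, 1, 2, 0), (30, 3, 1, 3, 0)")

toy <- probes_for_probesets(con, c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st"))
print(toy)
dbDisconnect(con)
```

With the real annotation package, the same helper would presumably be called as probes_for_probesets(db(pd.mirna.3.1), affy.probesets), using the connection from the question. That connection belongs to the package, so it is probably best left open rather than disconnected.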

ADD COMMENT • link 11.3 years ago Benilton Carvalho ★ 4.3k


Wu, Di ▴ 120

@wu-di-4945


United States

Hi Guido,

See whether the following annotation file helps:

http://www.affymetrix.com/support/technical/byproduct.affx?product=mirna_array_strip

Under "Additional Support", look for: miRNA 3.1 Annotations, Unsupported, CSV format.

Di

----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

________________________________________
From: bioconductor-bounces@r-project.org [bioconductor-bounces@r-project.org] on behalf of Hooiveld, Guido [guido.hooiveld@wur.nl]
Sent: Tuesday, January 21, 2014 11:50 AM
To: bioconductor at r-project.org
Subject: [BioC] how to extract probes for a probeset from PdInfo database?

<<snip>>

ADD COMMENT • link 11.3 years ago Wu, Di ▴ 120


Hi Di,

Thanks a lot for your feedback regarding this query. I am indeed aware of the annotation file you mentioned, from which I had already extracted the IDs of the relevant *probe sets*. The problem I am (was?) facing now is how to get a list of the corresponding *probes* that comprise those sets. A CDF is unfortunately not provided for this array, so I have to get used to working with the PdInfo packages.

Regards,
Guido

-----Original Message-----
From: Wu, Di [mailto:dwu@fas.harvard.edu]
Sent: Tuesday, January 21, 2014 18:49
To: Hooiveld, Guido; bioconductor at r-project.org
Subject: RE: how to extract probes for a probeset from PdInfo database?

<<snip>>
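To go from a flat probe table to a CDF-like list with one vector of probe ids per probeset (roughly the shape that get(..., mirna10cdf) returned in the question), one option is base R's split(). The sketch below assumes the probe table is a data frame with columns man_fsetid and fid, as the field list passed to getProbeInfo() in this thread suggests. The data frame here is an invented stand-in, not real miRNA 3.1 content.

```r
# Toy stand-in for the data frame returned by getProbeInfo(); values are invented.
probeInfo <- data.frame(
  man_fsetid = c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st"),
  fid = c(10L, 11L, 20L),
  stringsAsFactors = FALSE
)

# One element per probeset name, each holding that probeset's feature ids.
probes_by_set <- split(probeInfo$fid, probeInfo$man_fsetid)

# Probe ids for a single probeset
probes_by_set[["AFFX-BkGr17-GC10_st"]]
```

Because 'fid' is described in the thread as coinciding with the row number in the *FeatureSet object, these ids could then presumably be used to index rows of the probe-level intensity matrix, though that step is not shown here.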

ADD REPLY • link 11.3 years ago Guido Hooiveld ★ 4.1k
