genefilter findLargest and Illumina ids
@lavinia-gordon-2959
Dear All,

I am trying to replicate the example from the Category vignette, but with Illumina data (e.g. from http://bioconductor.org/packages/2.5/bioc/vignettes/Category/inst/doc/Category.R).

My data is MouseWG-6_V1 BeadStudio output, read in as a LumiBatch/ExpressionSet object, where featureNames(mydata) are Array_Address_Ids.

Using the annotation package illuminaMousev1BeadID.db gives:

    > fL = findLargest(featureNames(mydata), abs(ttests$statistic), "illuminaMousev1BeadID")
    Loading required package: org.Mm.eg.db
    Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
      value for "580022" not found

How can I confirm which annotation package to use, given that I have BeadStudio data but with Array_Address_Ids?

I assume it would be possible to substitute featureNames for featureData, where

    > head(featureData(mydata)[[2]])
    [1] "0610005I04"    "0610006I08RIK" "0610007C21RIK" "0610007C21RIK"
    [5] "0610007J10RIK" "0610007L01RIK"

but I haven't had any success in matching these to available keys.

Any advice greatly appreciated.

with regards

Lavinia Gordon

--
Senior Bioinformatics Officer
Murdoch Childrens Research Institute
Royal Children's Hospital
Flemington Road
Parkville
Victoria 3052
Australia
www.mcri.edu.au
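One way to approach the "which annotation package" question is to measure what fraction of the feature names appears among the keys of each candidate package. The sketch below uses base R only. The key vectors are made-up stand-ins; with a real annotation package the keys might come from something like Lkeys() on one of its maps, which is an assumption to check against the package documentation. key_overlap is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: fraction of feature names found in a key set.
key_overlap <- function(ids, keys) mean(ids %in% keys)

# Feature names as in the post (Array_Address_Ids, stored as character).
feature_ids <- c("580022", "2940601", "102260551")

# Made-up key sets standing in for the keys of two candidate packages.
candidate_keys <- list(
  probe_id_keys = c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777"),
  address_keys  = c("580022", "2940601", "102260551", "102370333")
)

# One overlap fraction per candidate; a value near 1 suggests a match.
overlap <- sapply(candidate_keys, key_overlap, ids = feature_ids)
print(overlap)
```

A candidate whose overlap is close to zero is keyed on a different identifier type, which is the situation the ".checkKeys" error above points to.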
Annotation Category • 1.6k views
Pan Du ★ 1.2k
@pan-du-2010
Hi Lavinia,

Illumina Array_Address_Ids are not the IDs generally used in public resources. You need to first convert the Array_Address_Ids to Illumina IDs, or convert them directly to Entrez Gene IDs, and then do the functional analysis.

You can use lumiMouseIDMapping for the ID mapping. Here is some code:

    # convert Array_Address_Ids to nuID first
    nuIDs = IlluminaID2nuID(addressIDs, lib='lumiMouseIDMapping')
    # then convert back to regular IlluminaID
    IlluminaIDs = nuID2IlluminaID(nuIDs, lib='lumiMouseIDMapping')
    # or map to Entrez ID based on Illumina manifest file (this might be old)
    entrezIDs = nuID2EntrezID(nuIDs, lib='lumiMouseIDMapping')

Pan
Hi Pan,

Thank you for your suggestion. This is helpful as it provides me with several alternative ids:

    > head(IlluminaIDs)
              Search_Key                              ILMN_Gene       Accession
    580022    "scl29691.6.1_260"                      "0610005I04"    "NM_177579.2"
    2940601   "NM_025791.1"                           "0610006I08RIK" "NM_025791.1"
    102260551 "scl00381629.1_255"                     "0610007C21RIK" "NM_212470.2"
    102370333 "XM_355589.1"                           "0610007C21RIK" "NM_212470.2"
    105670398 "ri|0610007J10|R000001F05|AK018717|633" "0610007J10RIK" "AK018717"
    102030278 "scl27163.9.1_177"                      "0610007L01RIK" "XM_355643"
              Symbol          Probe_Id       Array_Address_Id nuID
    580022    "0610005I04"    "ILMN_1238136" "580022"         "6n3oyiKoz7Pj0VTfu0"
    2940601   "0610006I08Rik" "ILMN_2721178" "2940601"        "ZhdXp75JftSF3iWLF4"
    102260551 "0610007C21Rik" "ILMN_1230777" "102260551"      "BRRdqrfuhH69KLodsc"
    102370333 "0610007C21Rik" "ILMN_2537239" "102370333"      "omRXTc0LVpTtklCCPw"
    105670398 "0610007J10Rik" "ILMN_1246069" "105670398"      "upNer6bJTpt27XeZe4"
    102030278 "0610007L01Rik" "ILMN_2524361" "102030278"      "ooevXuTfR0trfSs0RE"

However, I cannot use 'Search_Key' as it is non-unique:

    > featureNames(x.snorm.fa) <- IlluminaIDs[,1]
    Error in `row.names<-.data.frame`(`*tmp*`, value = c("scl29691.6.1_260", :
      duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names':
    'gi_7305154_ref_NM_013556.1__205_a_7_0',
    'IGHD_V00786_Ig_heavy_constant_delta_68',
    'IGHM_V00818_Ig_heavy_constant_mu_941',
    'IGHV14S1_X03571$M12813_Ig_heavy_variable_14S1_9',
    'IGKV12-44_AJ235955_Ig_kappa_variable_12-44_18',
    'NM_001009981.1', 'NM_001024851.3', 'NM_001033378.1', 'NM_001039154.1',
    'NM_007381.2', 'NM_007393.1', 'NM_007398.2', 'NM_007415.2', 'NM_007421.1',
    'NM_007453.2', 'NM_007455.1', 'NM_007469.2', 'NM_007472.1', 'NM_007475.2',
    'NM_007494.2', 'NM_007529.1', 'NM_007537.1', 'NM_007543.2', 'NM_007552.3',
    'NM_007569.1', 'NM_007609.1', 'NM_007661.2', 'NM_007669.2', 'NM_007671.2',
    'NM_007696.2', 'NM_007712.1', 'NM_007714.2', 'NM_007745.2', 'NM_007749.1',
    'NM_007753.1', 'NM_007754.1', 'NM_007782.1', 'NM_007789.2', 'NM_007790.2',
    'NM_007811.1', 'NM_007812.1', 'NM_007850.1', 'NM_007861.2', 'NM_007862.2',
    'NM_007879.1', 'NM_007895.2', 'NM_007899.1', 'NM_007907.1', 'NM_007923.1',
    'NM_007941.1', 'NM_007944.1', 'NM_007948.1', 'NM_007949.2', ' [... truncated]

This means that I can only use 'Probe_Id', which has not been successful, as no Illumina annotation package uses Probe_Id as the key.

Any help appreciated.

with regards

Lavinia Gordon
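Before assigning a column as featureNames, it can help to check which identifier columns are actually unique. The sketch below rebuilds four columns of the six rows shown above as a character matrix and tests each column with base R's anyDuplicated(). Note that within these six rows Search_Key happens to be unique; the duplicates reported in the warning occur elsewhere in the full table, so a check like this should be run on the complete mapping rather than on head().

```r
# Four columns of the six rows shown in the post, as a character matrix.
ids <- cbind(
  Search_Key = c("scl29691.6.1_260", "NM_025791.1", "scl00381629.1_255",
                 "XM_355589.1", "ri|0610007J10|R000001F05|AK018717|633",
                 "scl27163.9.1_177"),
  ILMN_Gene  = c("0610005I04", "0610006I08RIK", "0610007C21RIK",
                 "0610007C21RIK", "0610007J10RIK", "0610007L01RIK"),
  Probe_Id   = c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777",
                 "ILMN_2537239", "ILMN_1246069", "ILMN_2524361"),
  nuID       = c("6n3oyiKoz7Pj0VTfu0", "ZhdXp75JftSF3iWLF4",
                 "BRRdqrfuhH69KLodsc", "omRXTc0LVpTtklCCPw",
                 "upNer6bJTpt27XeZe4", "ooevXuTfR0trfSs0RE")
)
rownames(ids) <- c("580022", "2940601", "102260551",
                   "102370333", "105670398", "102030278")

# anyDuplicated() returns 0 when a vector has no repeated values.
is_unique <- apply(ids, 2, function(col) anyDuplicated(col) == 0)
print(is_unique)

# Only columns without repeats are candidates for feature names.
usable <- names(is_unique)[is_unique]
print(usable)
```

Columns that pass the check on the full table (Probe_Id and nuID are the likely candidates here) can then be considered as replacement feature names.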
If you use "Probe_Id", then you can use the package illuminaMousev1.db to convert "Probe_Id" to Entrez_Gene_ID. If you use "nuID", then you can use the package lumiMouseAll.db to convert "nuID" to Entrez_Gene_ID. Then do the functional analysis based on Entrez_Gene_ID.

Pan

--
Pan Du, PhD
Research Assistant Professor
Northwestern University Biomedical Informatics Center
750 N. Lake Shore Drive, 11-176
Chicago, IL 60611
Office (312) 503-2360; Fax: (312) 503-5388
dupan (at) northwestern.edu
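Once every probe has an Entrez Gene ID, the step the original findLargest call was aiming at (keeping one probe per gene, the one with the largest test statistic) can also be sketched directly in base R. This is an illustration of the idea, not genefilter's implementation; the Entrez IDs and statistics below are made up, and pick_largest is a hypothetical helper.

```r
# Made-up example: four probes, the last two mapping to the same gene
# (as ILMN_1230777 and ILMN_2537239 share a symbol in the table above).
probe_ids <- c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777", "ILMN_2537239")
entrez    <- c("1001", "1002", "1003", "1003")  # hypothetical Entrez Gene IDs
stat      <- c(2.1, 0.4, 1.3, 3.7)              # hypothetical |t| statistics

# Probes without a gene assignment cannot be used in the gene-level analysis.
keep <- !is.na(entrez)

# For each gene, return the probe whose statistic is largest.
pick_largest <- function(probe_ids, entrez, stat) {
  idx_by_gene <- split(seq_along(stat), entrez)
  vapply(idx_by_gene,
         function(i) probe_ids[i[which.max(stat[i])]],
         character(1))
}

selected <- pick_largest(probe_ids[keep], entrez[keep], stat[keep])
print(selected)
```

The result is a character vector of probe IDs named by gene ID, which can be used to subset the expression data to one row per gene before the functional analysis.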