
genefilter findLargest and Illumina ids

Lavinia Gordon

Dear All,

I am trying to replicate the example from the Category vignette, but with Illumina data (e.g. from http://bioconductor.org/packages/2.5/bioc/vignettes/Category/inst/doc/Category.R).

My data is MouseWG-6_V1 BeadStudio output, read in as a LumiBatch/ExpressionSet object, and its featureNames(mydata) are Array_Address_Ids.

Using the annotation package illuminaMousev1BeadID.db gives:

    > fL = findLargest(featureNames(mydata), abs(ttests$statistic), "illuminaMousev1BeadID")
    Loading required package: org.Mm.eg.db
    Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
      value for "580022" not found

How can I confirm which annotation package to use, given that I have BeadStudio data but with Array_Address_Ids?

I assume it would be possible to use featureData in place of featureNames, where

    > head(featureData(mydata)[[2]])
    [1] "0610005I04" "0610006I08RIK" "0610007C21RIK" "0610007C21RIK" "0610007J10RIK" "0610007L01RIK"

but I haven't had any success in matching these to available keys.

Any advice greatly appreciated.

With regards,
Lavinia Gordon

--
Senior Bioinformatics Officer
Murdoch Childrens Research Institute
Royal Children's Hospital
Flemington Road
Parkville
Victoria 3052
Australia
www.mcri.edu.au
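The error says that 580022, an Array_Address_Id, is not a key of the annotation map that findLargest() consulted. Before settling on a package, a quick check of how many of your IDs are keys of a candidate map can confirm whether the ID types match. The base-R sketch below assumes you already have the candidate map's keys as a character vector (with current AnnotationDbi, calling keys() on a ChipDb object is one assumed way to get them). `check_key_overlap` is a hypothetical helper, and both vectors are toy stand-ins: the addresses come from this thread, and the ILMN_ values are Probe_Ids that appear later in it.

```r
# Hypothetical helper: how many feature IDs are keys of a candidate annotation map?
check_key_overlap <- function(ids, keys) {
  found <- ids %in% keys
  list(n_ids    = length(ids),
       n_found  = sum(found),
       fraction = mean(found),
       missing  = head(ids[!found]))   # first few IDs with no match
}

# Toy stand-ins, not real package contents:
address_ids <- c("580022", "2940601", "102260551")                # Array_Address_Ids (featureNames)
probe_keys  <- c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777")  # keys of a Probe_Id-keyed map

res <- check_key_overlap(address_ids, probe_keys)
res$fraction  # 0: no overlap, so these IDs cannot be looked up in that map
res$missing   # "580022" "2940601" "102260551"

# A made-up non-key ("ILMN_9999999") next to a real key gives partial overlap
check_key_overlap(c("ILMN_1238136", "ILMN_9999999"), probe_keys)$fraction  # 0.5
```

A fraction near zero means the IDs and the package keys are of different types, which is the situation behind the `.checkKeys()` error above.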

Tags: Annotation, Category

Written 15.0 years ago by Lavinia Gordon • last updated 15.0 years ago by Pan Du

Answer 1 · 2010-04-14

Pan Du

Hi Lavinia,

Illumina Array_Address_Ids are not the IDs commonly used in public annotation. You need to first convert the Array_Address_Ids to Illumina IDs, or convert them directly to Entrez Gene IDs, and then do the functional analysis.

You can use lumiMouseIDMapping for the ID mapping. Here is some code:

    library(lumi)                        # provides IlluminaID2nuID(), nuID2IlluminaID(), nuID2EntrezID()
    addressIDs <- featureNames(mydata)   # the Array_Address_Ids from the LumiBatch object
    # convert Array_Address_Ids to nuID first
    nuIDs = IlluminaID2nuID(addressIDs, lib='lumiMouseIDMapping')
    # then convert back to regular IlluminaID
    IlluminaIDs = nuID2IlluminaID(nuIDs, lib='lumiMouseIDMapping')
    # or map to Entrez ID based on the Illumina manifest file (this mapping might be out of date)
    entrezIDs = nuID2EntrezID(nuIDs, lib='lumiMouseIDMapping')

Pan
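Conceptually, each of these conversions is a lookup keyed on the Array_Address_Id. The base-R sketch below illustrates only that idea, not how lumi implements IlluminaID2nuID() or nuID2IlluminaID(): it builds a small table by hand from the six head(IlluminaIDs) rows Lavinia posts later in this thread and looks IDs up with match(). `id_table` and `map_address_ids` are hypothetical names; in practice the full table would come from the lumi calls above.

```r
# Hand-copied rows from the head(IlluminaIDs) output posted in this thread
id_table <- data.frame(
  Array_Address_Id = c("580022", "2940601", "102260551",
                       "102370333", "105670398", "102030278"),
  Probe_Id = c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777",
               "ILMN_2537239", "ILMN_1246069", "ILMN_2524361"),
  nuID = c("6n3oyiKoz7Pj0VTfu0", "ZhdXp75JftSF3iWLF4", "BRRdqrfuhH69KLodsc",
           "omRXTc0LVpTtklCCPw", "upNer6bJTpt27XeZe4", "ooevXuTfR0trfSs0RE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map Array_Address_Ids (e.g. featureNames(mydata)) to another
# ID column of the table; IDs absent from the table come back as NA, not an error.
map_address_ids <- function(address_ids, table, to = "Probe_Id") {
  table[[to]][match(address_ids, table$Array_Address_Id)]
}

map_address_ids(c("580022", "102030278", "999"), id_table)
# returns "ILMN_1238136" "ILMN_2524361" NA
map_address_ids("2940601", id_table, to = "nuID")
# returns "ZhdXp75JftSF3iWLF4"
```

Returning NA for unmatched IDs, rather than stopping as `.checkKeys()` did in the question, makes it easy to count unmapped features with `sum(is.na(...))`.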

Answer posted 15.0 years ago by Pan Du

Hi Pan,

Thank you for your suggestion. This is helpful, as it provides me with several alternative IDs:

    > head(IlluminaIDs)
              Search_Key                              ILMN_Gene       Accession     Symbol          Probe_Id
    580022    "scl29691.6.1_260"                      "0610005I04"    "NM_177579.2" "0610005I04"    "ILMN_1238136"
    2940601   "NM_025791.1"                           "0610006I08RIK" "NM_025791.1" "0610006I08Rik" "ILMN_2721178"
    102260551 "scl00381629.1_255"                     "0610007C21RIK" "NM_212470.2" "0610007C21Rik" "ILMN_1230777"
    102370333 "XM_355589.1"                           "0610007C21RIK" "NM_212470.2" "0610007C21Rik" "ILMN_2537239"
    105670398 "ri|0610007J10|R000001F05|AK018717|633" "0610007J10RIK" "AK018717"    "0610007J10Rik" "ILMN_1246069"
    102030278 "scl27163.9.1_177"                      "0610007L01RIK" "XM_355643"   "0610007L01Rik" "ILMN_2524361"
              Array_Address_Id nuID
    580022    "580022"         "6n3oyiKoz7Pj0VTfu0"
    2940601   "2940601"        "ZhdXp75JftSF3iWLF4"
    102260551 "102260551"      "BRRdqrfuhH69KLodsc"
    102370333 "102370333"      "omRXTc0LVpTtklCCPw"
    105670398 "105670398"      "upNer6bJTpt27XeZe4"
    102030278 "102030278"      "ooevXuTfR0trfSs0RE"

However, I cannot use 'Search_Key' because it is not unique:

    > featureNames(x.snorm.fa) <- IlluminaIDs[,1]
    Error in `row.names<-.data.frame`(`*tmp*`, value = c("scl29691.6.1_260",  :
      duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': 'gi_7305154_ref_NM_013556.1__205_a_7_0',
    'IGHD_V00786_Ig_heavy_constant_delta_68', 'IGHM_V00818_Ig_heavy_constant_mu_941',
    'IGHV14S1_X03571$M12813_Ig_heavy_variable_14S1_9', 'IGKV12-44_AJ235955_Ig_kappa_variable_12-44_18',
    'NM_001009981.1', 'NM_001024851.3', 'NM_001033378.1', 'NM_001039154.1', 'NM_007381.2',
    'NM_007393.1', 'NM_007398.2', 'NM_007415.2', 'NM_007421.1', 'NM_007453.2', 'NM_007455.1',
    'NM_007469.2', 'NM_007472.1', 'NM_007475.2', 'NM_007494.2', 'NM_007529.1', 'NM_007537.1',
    'NM_007543.2', 'NM_007552.3', 'NM_007569.1', 'NM_007609.1', 'NM_007661.2', 'NM_007669.2',
    'NM_007671.2', 'NM_007696.2', 'NM_007712.1', 'NM_007714.2', 'NM_007745.2', 'NM_007749.1',
    'NM_007753.1', 'NM_007754.1', 'NM_007782.1', 'NM_007789.2', 'NM_007790.2', 'NM_007811.1',
    'NM_007812.1', 'NM_007850.1', 'NM_007861.2', 'NM_007862.2', 'NM_007879.1', 'NM_007895.2',
    'NM_007899.1', 'NM_007907.1', 'NM_007923.1', 'NM_007941.1', 'NM_007944.1', 'NM_007948.1',
    'NM_007949.2', [... truncated]

This means that I can only use 'Probe_Id', which isn't successful either, as no Illumina annotation packages use Probe_Id as the key.

Any help appreciated.

With regards,
Lavinia Gordon
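featureNames() must be unique because they become row names, which is why the Search_Key assignment failed. A base-R check like the sketch below, run on the full ID table before assigning featureNames, shows which columns qualify. Here it runs only on the six rows posted above; within that subset Search_Key happens to be unique, whereas the full table is not, as the warning shows. `ids`, `usable_as_featureNames` and `dup_genes` are hypothetical names.

```r
# The six rows posted above (ID columns only)
ids <- data.frame(
  Search_Key = c("scl29691.6.1_260", "NM_025791.1", "scl00381629.1_255",
                 "XM_355589.1", "ri|0610007J10|R000001F05|AK018717|633",
                 "scl27163.9.1_177"),
  ILMN_Gene  = c("0610005I04", "0610006I08RIK", "0610007C21RIK",
                 "0610007C21RIK", "0610007J10RIK", "0610007L01RIK"),
  Accession  = c("NM_177579.2", "NM_025791.1", "NM_212470.2",
                 "NM_212470.2", "AK018717", "XM_355643"),
  Probe_Id   = c("ILMN_1238136", "ILMN_2721178", "ILMN_1230777",
                 "ILMN_2537239", "ILMN_1246069", "ILMN_2524361"),
  stringsAsFactors = FALSE
)

# TRUE for columns with no duplicated values, i.e. usable as featureNames
usable_as_featureNames <- vapply(ids, function(col) anyDuplicated(col) == 0L, logical(1))
usable_as_featureNames
# Search_Key TRUE, ILMN_Gene FALSE, Accession FALSE, Probe_Id TRUE

# Which values repeat in a non-unique column
dup_genes <- unique(ids$ILMN_Gene[duplicated(ids$ILMN_Gene)])
dup_genes  # "0610007C21RIK"
```

The repeated ILMN_Gene value reflects two probes (ILMN_1230777 and ILMN_2537239) on the same gene; that one-gene-many-probes situation is what findLargest() is meant to resolve once probes are mapped to Entrez Gene IDs.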

Reply posted 15.0 years ago by Lavinia Gordon

If you use "Probe_Id", you can use the package illuminaMousev1.db to convert "Probe_Id" to Entrez_Gene_ID. If you use "nuID", you can use the package lumiMouseAll.db to convert "nuID" to Entrez_Gene_ID. Then do the functional analysis based on Entrez_Gene_ID.

Pan

--
Pan Du, PhD
Research Assistant Professor
Northwestern University Biomedical Informatics Center
750 N. Lake Shore Drive, 11-176
Chicago, IL 60611
Office (312) 503-2360; Fax: (312) 503-5388
dupan (at) northwestern.edu
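Once each probe (Probe_Id or nuID) has an Entrez Gene ID from illuminaMousev1.db or lumiMouseAll.db, the remaining step of the vignette's findLargest() call is to keep, for each Entrez Gene ID, the probe with the largest statistic; as we understand genefilter's documentation, findLargest() returns those probe IDs named by Entrez Gene ID. The base-R sketch below reproduces only that selection step, assuming the per-probe Entrez IDs are already in hand. `pick_largest_per_gene` is a hypothetical helper, and the gene IDs (EG_A, EG_B) and t statistics are made-up placeholders attached to Probe_Ids from this thread.

```r
# Hypothetical helper: for each Entrez Gene ID keep the probe with the largest statistic.
#   probe_ids: probe identifiers (e.g. Probe_Id or nuID)
#   stat:      one numeric statistic per probe, e.g. abs(t)
#   entrez:    one Entrez Gene ID per probe, NA where the probe is unmapped
pick_largest_per_gene <- function(probe_ids, stat, entrez) {
  keep <- !is.na(entrez)                 # unmapped probes cannot be used
  probe_ids <- probe_ids[keep]
  stat <- stat[keep]
  entrez <- entrez[keep]
  # position of the best-scoring probe within each gene group
  best <- tapply(seq_along(stat), entrez, function(i) i[which.max(stat[i])])
  out <- probe_ids[as.integer(best)]
  names(out) <- names(best)              # name each chosen probe by its gene
  out
}

# Placeholder data: Probe_Ids from the thread, made-up genes and t statistics
probes <- c("ILMN_1238136", "ILMN_1230777", "ILMN_2537239", "ILMN_1246069")
tstat  <- c(2.1, -4.0, 1.5, 0.3)
entrez <- c("EG_A", "EG_B", "EG_B", NA)

pick_largest_per_gene(probes, abs(tstat), entrez)
# returns c(EG_A = "ILMN_1238136", EG_B = "ILMN_1230777")
```

ILMN_1230777 and ILMN_2537239 are placed on the same placeholder gene because they share the symbol 0610007C21Rik in the posted table; the probe with the larger |t| wins. Unmapped probes are dropped first, since they cannot be used in a gene-level functional analysis anyway.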

Reply posted 15.0 years ago by Pan Du