Illumina annotation packages discrepancy
Pan Du ★ 1.2k
Hi Renaud,

The discrepancy comes from the different mapping criteria. Both "lumiHumanAll.db" and "illuminaHumanv2.db" are based on BLAST results against the RefSeq database. The "lumiHumanAll.db" library is indexed by nuID and includes the probes of all chip versions. For the mapping from probe to RefSeq, it defines criteria for both sensitivity and specificity (see the vignette "IlluminaAnnotation.Rnw" in the lumi package). As a result, it may contain fewer mappings than "illuminaHumanv2.db", because "lumiHumanAll.db" filters out dubious mappings (e.g., a probe that has multiple perfect mappings).

The "lumiHumanV2" library was built from the original annotation provided by Illumina. As a result, it has many more probe mappings. However, many of them may be outdated because the genome annotation has since been updated.

I hope this clears up the confusion.

Pan

On 11/28/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch" wrote:

> Date: Thu, 27 Nov 2008 16:03:36 +0200
> From: Renaud Gaujoux <renaud at mancala.cbio.uct.ac.za>
> Subject: [BioC] Illumina annotation packages discrepancy
> To: bioconductor at stat.math.ethz.ch
>
> Hi list,
>
> I've got BeadSummary data from Illumina (Array content:
> HUMANREF-8_V2_11223162_B.XML.xml).
> I imported it in R using the function lumi.batch.
> This automatically computed the nuID for each probe and set the
> annotation package to lumiHumanAll.db.
> This is all good.
>
> BUT, when I do
>
>     lookUp(nuIDs, 'lumiHumanAll.db', 'GENENAME')
>
> I get 2921 out of 20589 probes with NA.
>
> If I do the same using the old annotation package lumiHumanV2:
>
>     lookUp(nuIDs, 'lumiHumanV2', 'GENENAME')
>
> I get 454 out of 20589 probes with NA.
>
> Finally, if I do the same using the annotation package
> illuminaHumanv2.db (but based on the corresponding TargetIDs):
>
>     lookUp(targetIDs, 'illuminaHumanv2.db', 'GENENAME')
>
> I get 2041 out of 20589 probes with NA.
>
> Can anybody give me an explanation for that discrepancy? And which
> annotation package should I use, as it looks like some interesting probes
> (for my experiment) don't have annotation in the new version?
>
> Also, I could not find any reference to that HUMANREF-8_V2_11223162_B
> annotation (neither on the Illumina website nor in Bioconductor packages).
> I only found information about HUMANREF-8_V2_11223162_A. Is the letter
> suffix (A or B) really important?
>
> Thanks

------------------------------------------------------
Pan Du, PhD
Research Assistant Professor
Northwestern University Biomedical Informatics Center
750 N. Lake Shore Drive, 11-176
Chicago, IL 60611
Office (312) 503-2360; Fax: (312) 503-5388
dupan (at) northwestern.edu
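The NA counts quoted in the question can be tallied with a small helper. The following is only a minimal sketch in base R: it assumes the lookup returns a named list with NA for probes that have no annotation, and the probe names and gene names in `lookup_result` are made up for illustration rather than taken from any of the packages discussed.

```r
# Toy stand-in for the named list returned by a lookup; the probe IDs and
# gene names are hypothetical and only illustrate the counting step.
lookup_result <- list(
  probe_a = "example gene one",
  probe_b = NA,
  probe_c = "example gene two",
  probe_d = NA
)

# Count probes whose lookup result consists only of NA values.
count_unannotated <- function(res) {
  sum(vapply(res, function(x) all(is.na(x)), logical(1)))
}

n_missing <- count_unannotated(lookup_result)
cat(n_missing, "out of", length(lookup_result), "probes with NA\n")
```

Running the same helper on the result of each package's lookup gives numbers that can be compared directly, provided each lookup is keyed by the identifier type that package expects.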
@renaud-gaujoux-3125
Hi Pan,

thanks for your answer. I've been (and still am) struggling a bit to get consistent and up-to-date annotation for my data.

So, I guess it is more reliable to use the lumiHumanAll.db package?

However, what about the probes that are not annotated in lumiHumanAll but look interesting for my study (i.e., they appear in my top lists for differential expression or classification power)? I have such probes that are annotated in neither lumiHumanAll.db nor lumiHumanV2, but are annotated in illuminaHumanv2.

Hence no package gives me consistent annotation for my top genes. However, I have an annotation file (which came with the array data, I guess output by BeadStudio) that gives me annotations for all of my probes. But as you mentioned, these might be outdated, which actually bothers me. Any suggestions about that?

By the way, how come even the Illumina "proprietary" packages (illuminaHumanv2.db) don't correctly annotate their own probes? :(

Thanks again for your help and clarification, and for the lumi package.

Renaud
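One way to see which top probes are covered by which source is to merge the lookups and record where each annotation came from. This is a hedged sketch in base R, not a recommendation from the thread: the priority order is an assumption, and the probe IDs and gene names are hypothetical.

```r
# Hypothetical gene-name lookups from three sources, stored as named
# character vectors keyed by the same probe identifiers (NA = no annotation).
probes <- c("p1", "p2", "p3", "p4")
sources <- list(
  lumiHumanAll.db    = c(p1 = "GENE1", p2 = NA,      p3 = NA,      p4 = NA),
  lumiHumanV2        = c(p1 = "GENE1", p2 = "GENE2", p3 = NA,      p4 = NA),
  illuminaHumanv2.db = c(p1 = "GENE1", p2 = "GENE2", p3 = "GENE3", p4 = NA)
)

merged <- data.frame(
  probe = probes,
  gene = NA_character_,
  source = NA_character_,
  stringsAsFactors = FALSE
)

# Walk the sources in (assumed) priority order and fill only the gaps,
# remembering which source supplied each annotation.
for (src in names(sources)) {
  ann <- sources[[src]][probes]
  fill <- is.na(merged$gene) & !is.na(ann)
  merged$gene[fill] <- ann[fill]
  merged$source[fill] <- src
}

print(merged)
```

Keeping the `source` column makes it easy to single out probes that are annotated only by a less trusted source, which are the ones worth checking by hand.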
On Mon, Dec 1, 2008 at 4:39 AM, Renaud Gaujoux <renaud@mancala.cbio.uct.ac.za> wrote:

> Hence no package gives me consistent annotation for my top genes. However,
> I have an annotation file (which came with the array data, I guess output
> by BeadStudio) that gives me annotations for all of my probes. But as you
> mentioned, these might be outdated, which actually bothers me. Any
> suggestions about that?

Hi, Renaud.

When in doubt, it is a good idea to BLAST the probe sequences yourself for the probes that look interesting. Assuming you are going to do that for only a few probes, it is pretty straightforward to do on the NCBI website. My guess is that you might see a confusing picture for probes whose annotations have changed over time, so it will be up to you to decide what to do with such probes. Sometimes a second assay, such as PCR, will be necessary to help flesh out what a probe is actually measuring.

Sean
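For a handful of probes, the sequences can be written to a FASTA file and pasted or uploaded into the NCBI BLAST web form. A minimal sketch in base R follows; the probe names and sequences are made up and shortened, and how the real sequences are obtained (for example from the manufacturer's annotation file) is left open.

```r
# Hypothetical probe sequences, shortened for illustration.
probe_seqs <- c(
  probe_a = "ACGTACGTACGTACGTACGT",
  probe_b = "TTGACCGATAGGCTAACGTA"
)

# Build FASTA records: a ">" header line followed by the sequence line.
# rbind() interleaves headers and sequences column by column.
fasta_lines <- as.vector(rbind(paste0(">", names(probe_seqs)), unname(probe_seqs)))

# Write to a temporary file and echo the content.
fasta_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, fasta_file)
cat(readLines(fasta_file), sep = "\n")
```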
The illuminaHumanv2.db package is not a "proprietary" package. It is currently maintained by Mark Dunning (Mark.Dunning at cancer.org.uk). It is based on BLASTed sequences, but there was a problem in creating the package: when more than one accession was assigned to a probe, the annotation program skipped that probe entirely, which is why you are finding so many probes without annotation. You should contact Mark to find out whether that problem has been corrected and a new version released. You could also try the 2.2 release, which I created and which has annotation for all those probes.

Lynn
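The failure mode described here, probes being dropped when they match more than one accession, can be illustrated on a toy table. This sketch is not the code used to build any of the packages; it only shows, in base R and with made-up accessions, how multi-accession probes can be kept and flagged instead of skipped.

```r
# Hypothetical probe-to-accession hits; probe p2 matches two accessions.
hits <- data.frame(
  probe = c("p1", "p2", "p2", "p3"),
  accession = c("NM_000001", "NM_000002", "NM_000003", "NM_000004"),
  stringsAsFactors = FALSE
)

# Keep every probe by collapsing its accessions into one string.
per_probe <- tapply(hits$accession, hits$probe, paste, collapse = ";")

# Flag the probes that carry more than one accession.
counts <- table(hits$probe)
multi <- names(counts)[counts > 1]

print(per_probe)
print(multi)
```

Flagged probes can then be reviewed individually rather than silently losing their annotation.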
Oops... I'm really sorry, Mark, for the confusion. I think I misread the vignette.

I BLASTed some of the missing probes: some of them gave quite convincing results (100% identity, but with different variants), while others didn't return any sequence. So I'll try the package from 2.2.

Thanks again,
Renaud
You'll want to use the illuminaHumanv2ProbeID.db package.

Lynn
I had a quick try but got only NAs. Should the code below work with this package?

    entrez <- getEG(probeids, 'illuminaHumanv2ProbeID.db')

which wraps:

    unlist(lookUp(probeids, 'illuminaHumanv2ProbeID.db', "ENTREZID"))

I tried with probeids being full Illumina IDs, trimmed Illumina IDs (without the ILMN_ prefix), and nuIDs.

Thanks,
Renaud

Lynn Amon wrote:
> You'll want to use the illuminaHumanv2ProbeID.db package.
> Lynn
>
> Renaud Gaujoux wrote:
>> Oops... I'm really sorry, Mark, for the confusion. I think I misread the
>> vignette.
>>
>> I BLASTed some of the missing probes. Some of them gave quite
>> convincing results (100% identity but with different variants);
>> others didn't return any sequence. So I'll try with the package from
>> 2.2.
>>
>> Thanks again,
>> Renaud
>>
>> Lynn Amon wrote:
>>> The illuminaHumanv2.db package is not a "proprietary" package. It
>>> is currently maintained by Mark Dunning
>>> (Mark.Dunning at cancer.org.uk). It is based on BLASTed sequences, but
>>> there was a problem in creating the package: when more than one
>>> accession was assigned to a probe, the annotation program skipped
>>> all those probes, which is why you are finding so many without
>>> annotation. You should contact Mark to find out whether that problem
>>> was corrected and a new version released. You could also try the
>>> 2.2 release, which I created and which has annotation for all those
>>> probes.
>>> Lynn
>>>
>>> Renaud Gaujoux wrote:
>>>> Hi Pan,
>>>>
>>>> Thanks for your answer. I've been (and still am) struggling a bit
>>>> to get consistent and up-to-date annotation for my data.
>>>>
>>>> So, I guess it is more reliable to use the lumiHumanAll.db package?
>>>>
>>>> However, what about the probes that are not annotated in
>>>> lumiHumanAll but look interesting for my study (i.e. they appear
>>>> in my top lists for differential expression or classification power)?
>>>> I've got such probes that are annotated in neither lumiHumanAll.db
>>>> nor lumiHumanV2 but are in illuminaHumanv2.
>>>>
>>>> Hence no package gives me consistent annotation for my top genes.
>>>> However, I've got an annotation file (that came with the array data,
>>>> I guess output by BeadStudio) that gives me annotations for all of
>>>> my probes. But as you mentioned, these might be outdated, which
>>>> actually bothers me. Any suggestion about that?
>>>>
>>>> By the way, how come even Illumina "proprietary" packages
>>>> (illuminaHumanv2.db) don't correctly annotate their own probes? :(
>>>>
>>>> Thanks again for your help and clarification, and for the lumi package.
>>>>
>>>> Renaud
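When every lookup comes back NA, a common cause is that the identifiers being passed are not of the type the package is keyed on. The following is a minimal sketch in base R of how one might compare several candidate spellings of the same probes against a key set. Everything in it is hypothetical: `package_keys` is an invented stand-in for the keys of an annotation package (with a real package the key set would have to be obtained from the package itself), and all identifier values are made up for illustration.

```r
# Hypothetical stand-in for the key set of an annotation package.
# All identifiers below are invented for illustration only.
package_keys <- c("0001010", "0002020", "0003030")

# Candidate representations of the same two probes (also invented).
full_ids <- c("ILMN_0000001", "ILMN_0000002")
candidates <- list(
  full_ids    = full_ids,                    # IDs with the ILMN_ prefix
  trimmed_ids = sub("^ILMN_", "", full_ids), # same IDs without the prefix
  address_ids = c("0001010", "0002020")      # scanner-style address IDs
)

# Fraction of each candidate set that is found among the package keys.
match_rate <- vapply(
  candidates,
  function(ids) mean(ids %in% package_keys),
  numeric(1)
)

print(match_rate)
```

A candidate set with a match rate near zero is probably the wrong identifier type for that package, whereas a rate near one suggests the keys line up and any remaining NAs reflect missing annotation rather than a key mismatch.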
Lynn Amon ▴ 280
@lynn-amon-2429
Last seen 10.2 years ago
You can match the Array_Address_Id in the bead manifest file to the probe ID in the annotation package.

Lynn

Renaud Gaujoux wrote:
> Ok.
> Since I only have the summarized data, I guess I cannot get back the
> IDs from the scanner (?)
> For the moment I combine the annotation from the new lumiHumanAll with
> the old lumiHumanV2 wherever the new package gives no annotation.
>
> Thanks,
> Renaud.
>
> Lynn Amon wrote:
>> The probe IDs are the identifiers in the .txt or .csv files written by
>> the scanner, not those output by BeadStudio.
>> Lynn
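The two ideas in this exchange, translating probe identifiers through the manifest and falling back on a second annotation source where the first gives nothing, can be sketched in base R as follows. This is only an illustration under stated assumptions: the manifest column names `Probe_Id` and `Array_Address_Id`, every identifier, and every gene name are invented, and `primary` and `fallback` are stand-ins for the results of two annotation lookups rather than real package output.

```r
# Toy manifest. Column names and all values are assumptions for illustration.
manifest <- data.frame(
  Probe_Id         = c("ILMN_0000001", "ILMN_0000002", "ILMN_0000003"),
  Array_Address_Id = c("0001010", "0002020", "0003030"),
  stringsAsFactors = FALSE
)

# Probe identifiers as they might appear in summarized data (invented).
summary_ids <- c("ILMN_0000003", "ILMN_0000001", "ILMN_0000009")

# Translate each summarized ID to its address ID via the manifest.
# Probes absent from the manifest come back as NA.
address_ids <- manifest$Array_Address_Id[match(summary_ids, manifest$Probe_Id)]

# Stand-ins for gene names from two annotation sources, one entry per probe,
# with NA wherever a source has no mapping (values invented).
primary  <- c(NA, "GENE_A", NA)
fallback <- c("GENE_C", "GENE_A_OLD", NA)

# Prefer the primary source and use the fallback only where it is NA.
combined <- ifelse(is.na(primary), fallback, primary)

# Record where each annotation came from, so that potentially outdated
# fallback entries can be reviewed separately.
source_used <- ifelse(!is.na(primary), "primary",
                      ifelse(!is.na(fallback), "fallback", NA_character_))

print(data.frame(summary_ids, address_ids, combined, source_used))
```

Keeping the `source_used` column alongside the merged annotation is a design choice worth considering, since it makes it easy to see which top probes rely on the older mapping.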