error in hugene10sttranscriptcluster
@javier-perez-florido-3121
Dear list,

I'm trying to get the ENTREZIDs of some Affy_IDs of GeneChip Human Gene ST 1.0 arrays through the hugene10sttranscriptcluster package. Depending on the R version, the results are different.

For example, in R 2.12.2:

> mget("8104901",hugene10sttranscriptclusterENTREZID)
$`8104901`
[1] "3575"

But

> mget("8019631",hugene10sttranscriptclusterENTREZID)
Error en .checkKeys(value, Lkeys(x), x@ifnotfound) :
  value for "8019631" not found

The sessionInfo is:

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] grid      tools     tcltk     stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
 [1] annotate_1.28.1                      oneChannelGUI_1.16.5
 [3] girafe_1.2.0                         genomeIntervals_1.6.0
 [5] intervals_0.13.3                     ShortRead_1.8.2
 [7] lattice_0.19-17                      Rsamtools_1.2.3
 [9] Biostrings_2.18.4                    GenomicRanges_1.2.3
[11] baySeq_1.4.0                         edgeR_2.0.5
[13] IRanges_1.8.9                        preprocessCore_1.12.0
[15] GOstats_2.16.0                       graph_1.28.0
[17] Category_2.16.1                      tkWidgets_1.28.0
[19] DynDoc_1.28.0                        widgetTools_1.28.0
[21] affylmGUI_1.24.0                     affyio_1.18.0
[23] affy_1.28.0                          limma_3.6.9
[25] hugene10sttranscriptcluster.db_6.0.1 org.Hs.eg.db_2.4.6
[27] RSQLite_0.9-4                        DBI_0.2-5
[29] AnnotationDbi_1.12.0                 Biobase_2.10.0

loaded via a namespace (and not attached):
[1] BSgenome_1.18.3   genefilter_1.32.0 GO.db_2.4.5       GSEABase_1.12.2
[5] hwriter_1.3       RBGL_1.26.0       splines_2.12.2    survival_2.36-5
[9] XML_3.2-0.2       xtable_1.5-6

However, in R 2.10.0:

> mget("8104901",hugene10sttranscriptclusterENTREZID)
$`8104901`
[1] "3575"

(the same as before in R 2.12.2)

> mget("8019631",hugene10sttranscriptclusterENTREZID)
$`8019631`
[1] "6066"

(there is no error like in R 2.12.2)

The sessionInfo is:

R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_3.2.3                          hugene10sttranscriptcluster.db_4.0.1
[3] org.Hs.eg.db_2.3.6                   RSQLite_0.9-2
[5] DBI_0.2-5                            AnnotationDbi_1.8.2
[7] Biobase_2.6.1

loaded via a namespace (and not attached):
[1] tools_2.10.0

Why does this error occur for Affy_ID 8019631 when R 2.12.2 is used?

Thanks,
Javier
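The failing lookup in the question can be made non-fatal. The following is only a minimal sketch in base R: `entrez_map` is a plain environment standing in for `hugene10sttranscriptclusterENTREZID` (an assumption made so the example runs without Bioconductor), filled with the single mapping reported for version 6.0.1 of the annotation package. It shows that `mget()` stops on a missing key unless `ifnotfound = NA` is supplied.

```r
# Stand-in for the annotation map (assumption: a plain environment, not a Bimap).
# Only 8104901 -> "3575" is present, as reported for the 6.0.1 package.
entrez_map <- new.env()
assign("8104901", "3575", envir = entrez_map)

ids <- c("8104901", "8019631")

# Strict lookup: a missing key raises an error, which is captured here as text.
strict <- tryCatch(
  mget(ids, envir = entrez_map),
  error = function(e) conditionMessage(e)
)

# Lenient lookup: missing keys come back as NA instead of stopping the call.
lenient <- mget(ids, envir = entrez_map, ifnotfound = NA)

# Collect the IDs that have no mapping, for later inspection.
found <- unlist(lenient)
missing_ids <- names(found)[is.na(found)]
print(missing_ids)
```

With the real annotation package, the analogous call would presumably be `mget(ids, hugene10sttranscriptclusterENTREZID, ifnotfound = NA)`; whether a given ID then returns NA depends on the installed annotation version.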
@james-w-macdonald-5106
Hi Javier,

The annotation of Affy chips tends to change over time, and this might be an instance of that. If you check netaffx for this probeset, the transcript it measures is described as 'multiple', and if you blat the sequence they built the probeset against, it matches all over the place. So it may be that in the past they claimed a direct match and now they don't.

You could investigate this further by looking at older versions of the annotation files if you care to know more.

Best,

Jim

James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
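One way to follow up on the suggestion to compare annotation versions is to diff two snapshots of the mapping. The sketch below is illustrative only: `old_map` and `new_map` are hand-built named vectors holding just the two results reported in the question (annotation package 4.0.1 versus 6.0.1), not data extracted from the packages themselves.

```r
# Snapshots built by hand from the results quoted in the question (assumption:
# these two IDs are the only ones compared; real maps hold thousands of entries).
old_map <- c("8104901" = "3575", "8019631" = "6066")  # reported with 4.0.1
new_map <- c("8104901" = "3575")                      # 8019631 not found with 6.0.1

all_ids <- union(names(old_map), names(new_map))

comparison <- data.frame(
  id  = all_ids,
  old = unname(old_map[all_ids]),  # NA where the ID is absent from old_map
  new = unname(new_map[all_ids]),  # NA where the ID is absent from new_map
  stringsAsFactors = FALSE
)

# An ID counts as changed when its old and new values are not identical,
# which includes the case where one side is NA.
comparison$changed <- unname(!mapply(identical, comparison$old, comparison$new))

print(comparison[comparison$changed, ])
```

The same comparison also makes clear that the difference tracks the annotation package version (4.0.1 versus 6.0.1 in the two sessionInfo listings) rather than the R version as such.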
Dear Jim,

Thanks for your quick reply. I'm not sure I understood your explanation. I did some research on the netaffx annotation as you suggested.

I've summarized with the oligo package at the "core" level, i.e. the transcript level. I've checked through the NetAffx annotation files all the transcript clusters related to the NR_002716 gene. These are: 7948894, 8019631, 8019633, 8019635, 8019637, 8019639, 8019641, 8019703, 8019705, 8019707, 8019709 and 8019802. What does this mean? Is a gene made up of several clusters? Is the gene repeated across these clusters? Once I understand this issue, I will understand how limma works on these arrays, since I don't know whether cluster = gene or clusterS = gene.

I observed some differences between the annotation files HuGene-1_0-st_v1.na29.hg18.transcript (the one I used before) and HuGene-1_0-st_v1.na31.hg19.transcript (the latest one). The main differences are in the "start" and "stop" fields of these files for each of the transcript clusters listed above. In the first version (na29.hg18) these fields contain non-zero numbers, whereas in the latest version (na31.hg19) the "start" and "stop" values are zero. However, in both files, the "gene assignment" field is NR_002716. So I don't understand why mget("8019631",hugene10sttranscriptclusterACCNUM) gives an error when this accession number exists in the NetAffx annotation file.

Moreover, when using the annotation from oligo (which retrieves the NetAffx biological annotation):

pData(featureData(OligoEset))["8019631","geneassignment"]

returns NR_002716.

I'm a little bit confused about this.

Thanks again,
Javier
Hi Javier,

On 4/18/2011 7:02 AM, Javier Pérez Florido wrote:
> I've summarized using oligo package at the "core" level, i.e. transcript
> level. I've checked through NetAffx annotation files all the transcript
> clusters related to the NR_002716 gene.
> These are: 7948894, 8019631,8019633,8019635,8019637,8019639, 8019641,
> 8019703,8019705,8019707,8019709 and 8019802. What does this mean? A gene
> is made up of several clusters? Is the gene repeated through these
> clusters? Once I understand this issue, I will understand how limma
> works on these arrays, since I don't know whether cluster=gene or
> clusterS = gene.

This is one of the problems inherent to the design of arrays that use really short probes (and often very few probes per probeset). Since the probes are so short, they sometimes will bind to many different transcripts, so it is hard to say which transcript is responsible for the signal for a given probeset.

If you put 8019631 in netaffx, you will see that the hybridization target is 'mixed', meaning it binds multiple transcripts. The only probeset that comes up if you query netaffx for NR_002716 is 7948894, so I assume all the others you note are mixed as well.

> For the first version (na29.hg18), there are numbers different from zero
> on these fields, whereas in the latest version (na31.hg19), the "start"
> and "stop" values are zero. However, in both files, the "gene
> assignment" field is NR_002716. So, I don't understand why when I use
> mget("8019631",hugene10sttranscriptclusterACCNUM) and error is found
> whereas in the NetAffx annotation file this accession number exists.

For the annotation packages we supply (we meaning the Biocore Data Team), any probesets that interrogate multiple transcripts are set to return NA by default. You can change this behavior using the toggleProbes() function. I did a quick test of that the other day, and it doesn't appear that was done with this package. But you might try yourself.

You might also contact the maintainer directly (we don't make these packages), to see if he can give more complete answers.

Best,

Jim
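The default "return NA for multi-mapping probesets" behaviour described above can be modelled in a few lines of base R. This is a toy sketch, not the AnnotationDbi implementation: the `links` table, the probe IDs `P1` to `P3`, the gene labels `G1` to `G3`, and the helper `lookup()` are all hypothetical names invented for illustration.

```r
# Toy probe-to-gene table (all IDs hypothetical).
# P2 maps to two genes; P1 and P3 both map to the same single gene.
links <- data.frame(
  probe = c("P1", "P2", "P2", "P3"),
  gene  = c("G1", "G2", "G3", "G1"),
  stringsAsFactors = FALSE
)

# mode = "single": probes that map to more than one gene return NA.
# mode = "all":    every mapped gene is returned.
lookup <- function(probe_ids, links, mode = c("single", "all")) {
  mode <- match.arg(mode)
  lapply(setNames(probe_ids, probe_ids), function(p) {
    genes <- links$gene[links$probe == p]
    if (length(genes) == 0) return(NA_character_)                    # unknown probe
    if (mode == "single" && length(genes) > 1) return(NA_character_) # ambiguous probe
    genes
  })
}

print(lookup(c("P1", "P2", "P3"), links))          # default: ambiguous P2 is hidden
print(lookup(c("P1", "P2", "P3"), links, "all"))   # P2 now shows both genes
```

Note that in this model several probes pointing at one gene (`P1` and `P3`) are not filtered; only a probe pointing at several genes is. With the real packages, the analogous switch would presumably be `toggleProbes(hugene10sttranscriptclusterENTREZID, "all")` from AnnotationDbi, although, as noted in the reply, it is not clear that this particular package was built with that filter.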
Hi Jim,

See below.

On 18/04/2011 16:00, James W. MacDonald wrote:
> If you put 8019631 in netaffx, you will see that the hybridization
> target is 'mixed', meaning it binds multiple transcripts. The only
> probeset that comes up if you query netaffx for NR_002716 is 7948894,
> so I assume all the others you note are mixed as well.

So, if I understood correctly, many transcripts may hybridize to the same probeset. When the hybridization is unique, the annotation package shows the gene/transcript the probeset was intended to detect; when many transcripts may hybridize to the probeset, NA is returned because we don't really know which gene is hybridizing to it. Is that correct? At least, this is the case in the annotation packages developed by the Biocore data team (this does not happen in hugene10sttranscriptcluster.db or in the NetAffx biological annotation given by oligo).

My only concern is related to the 7948894 probeset. When it is entered in netaffx, it says that this probeset is mixed (the transcripts detected by this probeset are NR_002716, NR_002761, AK292330 and X59360). I'm not a biologist, so, at first, I would say that if this probeset is "detected" by limma as being differentially expressed, we cannot tell which gene or transcript is really differentially expressed, since there are four candidates (unless they are related in some way).

It's true that when NR_002716 is entered in netaffx, it says that the related probeset is 7948894, but it seems that this probeset can also detect NR_002761 and others.

Am I right?

Thanks again, this discussion is very useful for me,
Javier
Hi Javier, On 4/18/2011 1:27 PM, Javier P?rez Florido wrote: > Hi Jim, > See below > > On 18/04/2011 16:00, James W. MacDonald wrote: >> Hi Javier, >> >> On 4/18/2011 7:02 AM, Javier P?rez Florido wrote: >>> Dear Jim, >>> Thanks for your quick reply. >>> I'm not sure if I understood your explanation. I did some research at >>> the netaffx annotation as you suggested. >>> >>> I've summarized using oligo package at the "core" level, i.e. transcript >>> level. I've checked through NetAffx annotation files all the transcript >>> clusters related to the NR_002716 gene. >>> These are: 7948894, 8019631,8019633,8019635,8019637,8019639, 8019641, >>> 8019703,8019705,8019707,8019709 and 8019802. What does this mean? A gene >>> is made up of several clusters? Is the gene repeated through these >>> clusters? Once I understand this issue, I will understand how limma >>> works on these arrays, since I don't know whether cluster=gene or >>> clusterS = gene. >> >> This is one of the problems inherent to the design of arrays that use >> really short probes (and often very few probes per probeset). Since >> the probes are so short, they sometimes will bind to many different >> transcripts, so it is hard to say which transcript is responsible for >> the signal for a given probeset. >> >> If you put 8019631 in netaffx, you will see that the hybridization >> target is 'mixed', meaning it binds multiple transcripts. The only >> probeset that comes up if you query netaffx for NR_002716 is 7948894, >> so I assume all the others you note are mixed as well. > So, if I understood well, many transcripts may hybridize to the same > probeset so that, when the hybridization is unique, the gene/transcript > intended to hybridize is shown by the annotation package and when many > transcripts may hybridize to the probeset, a NA is given because we > don't really know which gene is hybridized to the probeset, is that > correct? 
At least, in the annotation packages developed by the Biocore > data team (this does not happen in hugene10sttranscriptcluster.db or in > NetAffx biological annotation given by Oligo) > > My only concern is related to 7948894 probeset. When it is introduced in > netaffx, it says that this probeset is mixed (the transcripts detected > by this probeset are NR_002716, NR_002761, AK292330 and X59360). I'm not > a biologist, so, at first, I would say that, if this probeset is > "detected" by limma as being differentially expressed, we can not > distinguish which gene or transcript is really differentially expressed > since there are four possible (unless they are related in some way). Exactly. There is no way to tell how much of the signal is due to a particular transcript. But since you aren't a biologist, note that each gene can produce many different transcripts due to splice variants. And what we are talking about here are transcripts, not genes. Also note that accession numbers point to transcripts, not genes. And the accession numbers are assigned when researchers define what they think are new transcripts (but may not actually be new). So there can be multiple accession numbers per gene that may or may not be different transcripts of that gene. For instance, both NR_002716 and X59360 point to the same gene, and NR_002761 points to a closely related gene. I don't know offhand if NR_002716 and X59360 are different transcripts (e.g., splice variants) or not. But you may need to. Best, Jim > > It's true that when NR_002716 is introduced in netaffx, it says that the > probeset related is 7948894, but it seems that this probeset can detect > also NR_002761 and others. > > Am I right? > Thanks again, this discussion is being very useful for me, > Javier > >> >>> >>> I observed some differences between the annotated files >>> HuGene-1_0-st_v1.na29.hg18.transcript (the one I used before) and >>> HuGene-1_0-st_v1.na31.hg19.transcript (the latest one). 
The main >>> differences are related to the "start" and "stop" fields on these files >>> for each of the transcript clusters described above. >>> For the first version (na29.hg18), there are numbers different from zero >>> on these fields, whereas in the latest version (na31.hg19), the "start" >>> and "stop" values are zero. However, in both files, the "gene >>> assignment" field is NR_002716. So, I don't understand why when I use >>> mget("8019631",hugene10sttranscriptclusterACCNUM) and error is found >>> whereas in the NetAffx annotation file this accession number exists. >> >> For the annotation packages we supply (we meaning the Biocore Data >> Team), any probesets that interrogate multiple transcripts are set to >> return NA by default. You can change this behavior using the >> toggleProbes() function. I did a quick test of that the other day, and >> it doesn't appear that was done with this package. But you might try >> yourself. >> >> You might also contact the maintainer directly (we don't make these >> packages), to see if he can give more complete answers. >> >> Best, >> >> Jim >> >> >>> >>> Moreover, when using the annotation from oligo (which retrieves NetAffx >>> Biological Annotation): >>> pData(featureData(OligoEset))["8019631","geneassignment"] >>> returns NR_002716 >>> >>> I'm a little bit confused about this. >>> Thanks again, >>> Javier >>> >>> >>> On 16/04/2011 20:40, James MacDonald wrote: >>>> Hi Javier, >>>> >>>> The annotation of Affy chips tends to change over time, and this might >>>> be an instance of that. If you check netaffx for this probeset, the >>>> transcript it measures is described as 'multiple', and if you blat the >>>> sequence they built the probeset against, it matches all over the >>>> place. >>>> So it may be that in the past they claimed a direct match and now they >>>> don't. >>>> >>>> You could investigate this further by looking at older versions of the >>>> annotation files if you care to know more. 
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> Douglas Lab
>>>> 5912 Buhl
>>>> 1241 E. Catherine St.
>>>> Ann Arbor MI 48109-5618
>>>> 734-615-7826
>>>>>>> Javier Pérez Florido 04/16/11 8:16 AM>>>
>>>> Dear list,
>>>> I'm trying to get the ENTREZIDs of some Affy_IDs of GeneChip Human Gene
>>>> ST 1.0 Arrays through hugene10sttranscriptcluster package. Depending on
>>>> the R version, the results are different.
>>>> For example, in R 2.12.2:
>>>> > mget("8104901",hugene10sttranscriptclusterENTREZID)
>>>> $`8104901`
>>>> [1] "3575"
>>>>
>>>> But
>>>> >mget("8019631",hugene10sttranscriptclusterENTREZID)
>>>> Error en .checkKeys(value, Lkeys(x), x@ifnotfound) :
>>>> value for "8019631" not found
>>>>
>>>> The sessionInfo is:
>>>>
>>>> sessionInfo()
>>>> R version 2.12.2 (2011-02-25)
>>>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
>>>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=Spanish_Spain.1252
>>>>
>>>> attached base packages:
>>>> [1] grid tools tcltk stats graphics grDevices utils
>>>> [8] datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] annotate_1.28.1  oneChannelGUI_1.16.5
>>>> [3] girafe_1.2.0  genomeIntervals_1.6.0
>>>> [5] intervals_0.13.3  ShortRead_1.8.2
>>>> [7] lattice_0.19-17  Rsamtools_1.2.3
>>>> [9] Biostrings_2.18.4  GenomicRanges_1.2.3
>>>> [11] baySeq_1.4.0  edgeR_2.0.5
>>>> [13] IRanges_1.8.9  preprocessCore_1.12.0
>>>> [15] GOstats_2.16.0  graph_1.28.0
>>>> [17] Category_2.16.1  tkWidgets_1.28.0
>>>> [19] DynDoc_1.28.0  widgetTools_1.28.0
>>>> [21] affylmGUI_1.24.0  affyio_1.18.0
>>>> [23] affy_1.28.0  limma_3.6.9
>>>> [25] hugene10sttranscriptcluster.db_6.0.1  org.Hs.eg.db_2.4.6
>>>> [27] RSQLite_0.9-4  DBI_0.2-5
>>>> [29] AnnotationDbi_1.12.0  Biobase_2.10.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] BSgenome_1.18.3  genefilter_1.32.0  GO.db_2.4.5  GSEABase_1.12.2
>>>> [5] hwriter_1.3  RBGL_1.26.0  splines_2.12.2  survival_2.36-5
>>>> [9] XML_3.2-0.2  xtable_1.5-6
>>>>
>>>> However, in R 2.10.0
>>>> mget("8104901",hugene10sttranscriptclusterENTREZID)
>>>> $`8104901`
>>>> [1] "3575" (the same as before in R 2.12.2)
>>>>
>>>> > mget("8019631",hugene10sttranscriptclusterENTREZID)
>>>> $`8019631`
>>>> [1] "6066" (there is no error like in R 2.12.2)
>>>>
>>>> The sessionInfo is:
>>>> R version 2.10.0 (2009-10-26)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
>>>> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=Spanish_Spain.1252
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] limma_3.2.3  hugene10sttranscriptcluster.db_4.0.1
>>>> [3] org.Hs.eg.db_2.3.6  RSQLite_0.9-2
>>>> [5] DBI_0.2-5  AnnotationDbi_1.8.2
>>>> [7] Biobase_2.6.1
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.10.0
>>>>
>>>> Why this error for Affy_ID 8019631 when R2.12.2 is used?
>>>> Thanks,
>>>> Javier
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and should
>>>> not be used for urgent or sensitive issues
>>>>
>>>>
>>>
>>
>

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues
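
For anyone who wants to try the two suggestions from this thread, here is a minimal sketch. It is not code from either poster. It assumes that the mget() method from AnnotationDbi accepts ifnotfound = NA, so that IDs without a mapping come back as NA instead of triggering the .checkKeys error, and that toggleProbes() can be applied to the ENTREZID map. As noted in the reply quoted above, toggleProbes() may have no visible effect on this particular package, so the call is wrapped in tryCatch(). The helper name lookup_ids is hypothetical.

```r
# Hypothetical helper: look up IDs in an annotation map (or a plain
# environment), returning NA for IDs with no mapping instead of stopping.
lookup_ids <- function(ids, map) {
  mget(ids, map, ifnotfound = NA)
}

ids <- c("8104901", "8019631")

# Run the package-specific part only if the annotation package is installed.
if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)

  # Default view of the map: IDs without a mapping come back as NA.
  print(lookup_ids(ids, hugene10sttranscriptclusterENTREZID))

  # Ask for probesets that map to more than one gene as well. This assumes
  # the map supports toggleProbes(); if it does not, fall back to the
  # unmodified map.
  all_map <- tryCatch(
    toggleProbes(hugene10sttranscriptclusterENTREZID, "all"),
    error = function(e) hugene10sttranscriptclusterENTREZID
  )
  print(lookup_ids(ids, all_map))
}
```

If both calls print NA for 8019631, the ID is simply absent from that version of the package, and the NetAffx annotation files are the place to look.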