package pair "hugene10stv1cdf"/"hugene10stprobeset.db"
Laurent Gautier ★ 2.3k
@laurent-gautier-29
Last seen 10.3 years ago
Dear List,

I am noting potential issues in the package pair "hugene10stv1cdf"/"hugene10stprobeset.db", as the respective sets of probe set IDs are largely non-overlapping:

> library(hugene10stv1cdf)
> library(hugene10stprobeset.db)
> summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10stprobesetSYMBOL))
   Mode   FALSE    TRUE    NA's
logical   28026    4295       0
> summary(Lkeys(hugene10stprobesetSYMBOL) %in% ls(hugene10stv1cdf))
   Mode   FALSE    TRUE    NA's
logical  252727    4295       0

Reading closely, one can observe that "hugene10stprobeset.db" refers to a "revision 5" while the "v1" in "hugene10stv1cdf" suggests a revision 1. It is unclear to me whether this is linked to the problem, but if so, there is neither a hugene10stv5cdf nor an annotation for v1.

The obligatory sessionInfo() is:

> sessionInfo()
R version 2.11.0 Patched (2010-04-24 r51813)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] oligo_1.12.0                AffyCompatible_1.8.0
 [3] RCurl_1.4-1                 bitops_1.0-4.1
 [5] XML_2.8-1                   oligoClasses_1.10.0
 [7] limma_3.4.0                 hugene10stv1cdf_2.6.0
 [9] hugene10stprobeset.db_5.0.1 org.Hs.eg.db_2.4.1
[11] RSQLite_0.8-4               DBI_0.2-5
[13] AnnotationDbi_1.10.0        affxparser_1.20.0
[15] affy_1.26.0                 Biobase_2.8.0

loaded via a namespace (and not attached):
[1] affyio_1.16.0         Biostrings_2.16.0 IRanges_1.6.0
[4] preprocessCore_1.10.0 splines_2.11.0    tcltk_2.11.0
[7] tools_2.11.0

Best,

Laurent
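The two `summary(... %in% ...)` calls above can be folded into one small helper that reports both directions of the comparison at once. This is only a minimal sketch: `id_overlap` is a hypothetical name, and the IDs below are made-up stand-ins so that the example runs without the annotation packages installed.

```r
# Hypothetical helper: count IDs unique to each set and IDs shared by both.
id_overlap <- function(a, b) {
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  c(only_a = length(setdiff(a, b)),
    shared = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Toy stand-ins for the two ID sets (invented values, not real probe set IDs).
cdf_ids   <- c("7892501", "7892502", "7892503")
annot_ids <- c("7892503", "7892504")

overlap <- id_overlap(cdf_ids, annot_ids)
print(overlap)
```

With both packages attached, the same helper could presumably be called as `id_overlap(ls(hugene10stv1cdf), Lkeys(hugene10stprobesetSYMBOL))`, using only the calls already shown in the post.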
Annotation cdf • 2.2k views
@james-w-macdonald-5106
Last seen 3 days ago
United States
Hi Laurent,

It's hard to say what the 'revision 5' refers to. There is only one HuGene chip, and it is version 1. There _have_ been nine versions of the annotation file released by Affy (Releases 22-30), so there is no telling what 'revision 5' refers to. But it certainly doesn't refer to a HuGene-1_0-st-v5 chip, as no such thing exists.

I have a personal thesis that the Exon and Gene chips contain all manner of extra sequences that Affy threw on there so they wouldn't have the same problem they had with their 3'-biased chips, namely that the chips were out of date the minute they finished the first production run because the annotations are so fluid. Now they can simply take the original 32K probesets and slice-n-dice them at will to make things that match up with the genome as we know it now.

But back to the point at hand. The problem with hugene10stv1cdf is that it is based on the _unsupported_ cdf file that Affy makes available. We make it available as well, for those who insist on using the makecdfenv/affy pipeline rather than the pdInfoBuilder/oligo pipeline, which is what one should arguably be using. Given that the data being used to create the cdf package is specifically unsupported, caveat emptor.

I note that the supported library files do contain an 'r4' in the file name, so I assume, without any backing data, that this library would actually hew more closely to the annotation data they supply.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician, Douglas Lab
University of Michigan, Department of Human Genetics
Hi James,

Thanks for the clarifications. I am happy to see that Affymetrix has picked up the concept of alternative CDF definitions and makes it easier for its users.

Regarding Bioconductor, wouldn't it make sense either to mark packages as "unsupported" or, better, to move them to a different location, making their download by the unaware less likely? In the present case, should the CDF be placed outside of the main repository?

In addition, wouldn't it make sense to coordinate the release of probe/probeset mapping structures and annotation files (I am reading in your reply that there is annotation for revision 5 while the mapping is for revision 4)?

What about making the revision number a documented _non-exported_ vector in the packages? This way one could do, for example:

> hugene10stprobeset:::revision
[1] "r5"

(Keeping the vector non-exported circumvents the issue of scope pollution whenever different packages with a variable "revision" are in the search path.)

Best,

Laurent
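The scope-pollution concern behind the non-exported `revision` proposal can be illustrated without any real package. The sketch below attaches two toy lists to the search path as stand-ins for two packages that both export a variable called `revision`; the names `toy:mapping` and `toy:annotation` and the values are assumptions made purely for illustration, and no released package is known to provide such a vector.

```r
# Two toy "packages", each exposing a variable called `revision`
# (hypothetical names and values, for illustration only).
attach(list(revision = "r4"), name = "toy:mapping", warn.conflicts = FALSE)
attach(list(revision = "r5"), name = "toy:annotation", warn.conflicts = FALSE)

# A bare lookup finds whichever entry was attached last: one value masks the other.
masked <- get("revision")

# Qualifying the lookup by its source is unambiguous.
from_mapping    <- get("revision", pos = "toy:mapping")
from_annotation <- get("revision", pos = "toy:annotation")

# Clean up the search path again.
detach("toy:annotation")
detach("toy:mapping")

print(c(masked = masked, mapping = from_mapping, annotation = from_annotation))
```

For a real package, the `pkg:::name` syntax plays the role of the qualified lookup, which is why a non-exported vector would sidestep the masking shown here.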
Hi Laurent,

Further complicating things, the hugene10stprobeset.db package was a contributed package. From the DESCRIPTION file you can see that it was contributed by Arthur Li. You might want to ask him for more details about this package and also about the hugene10sttranscriptcluster.db package, because for the hugene10sttranscriptcluster.db package I get the following:

> summary(Lkeys(hugene10sttranscriptclusterSYMBOL) %in% ls(hugene10stv1cdf))
   Mode   FALSE    TRUE    NA's
logical     962   32295       0
> summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10sttranscriptclusterSYMBOL))
   Mode   FALSE    TRUE    NA's
logical      26   32295       0

This looks like a closer match for what you are doing (considering that we don't have a properly supported cdf file in this case).

Hope this helps,

Marc
Hi Marc,

What I am reading translates into very little confidence in anything related to HuGene 1.0 ST in the Bioconductor "affy" pipeline, and I really think that it should be more difficult to use it without going through steps that require one to explicitly see that this is untested/not recommended/unsafe.

The CDF seems to be of uncertain quality to all, yet it is provided by Bioconductor. A warning message or a recommendation to switch to oligo when attaching the package would be helpful, I think.

Best,

Laurent
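One way such an attach-time warning could be wired up is through R's standard `.onAttach` hook together with `packageStartupMessage()`. The sketch below is hypothetical: the message wording is invented here and is not what any released package prints. The hook is called by hand so the example runs without installing anything.

```r
# Hypothetical startup hook; the message text is illustrative only and
# is not taken from any released package.
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "Note: ", pkgname, " is built from an unsupported CDF file. ",
    "Consider the pdInfoBuilder/oligo pipeline for this array."
  )
}

# R normally calls the hook during library(); call it directly to preview
# the message and capture its text.
startup_text <- tryCatch(
  .onAttach(libname = "", pkgname = "hugene10stv1cdf"),
  message = function(m) conditionMessage(m)
)
cat(startup_text)
```

Using `packageStartupMessage()` rather than `warning()` keeps the note visible by default while still letting users silence it with `suppressPackageStartupMessages()`.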
Hi Laurent,

The really confusing thing about the HuGene chip from Affymetrix is that they changed the way they were describing their features mid-stream. So now people who work with this have to be mindful of how the probes have been grouped ("probesets" or "transcript clusters"?). Arthur Li has been kind enough to furnish the project with both kinds of package as an option, which is why I noticed what I did earlier about the transcript cluster version of the package.

But the fact that Affymetrix has abandoned support for their cdf file also creates a unique problem for us. I agree with Jim that people should arguably be using oligo rather than affy for analyzing this kind of chip. But I also agree with you that a friendly warning would be a great idea for this one particular package.

Marc
Hi Marc, Affymetrix possibly changing the way features are described might not be the only source of confusion. Using "oligo" does not appear to make things must better, as the information that can be obtained after running "rma()" is: > eset at annotation [1] "pd.hugene.1.0.st.v1" Is this the "probeset" version ? Is this the "transcript cluster" version ? Obviously this is of utmost importance (as the probe-level summarization step will use one given grouping). Going for a fishing expedition seems a bit awkward: > summary(featureNames(eset) %in% Lkeys(hugene10sttranscriptclusterSYMBOL)) Mode FALSE TRUE NA's logical 40 33257 0 As well, with the annotation (finally) becoming very fluid, does shipping any probe grouping without an associated annotation make any sense ? Laurent On 04/05/10 03:04, Marc Carlson wrote: > Hi Laurent, > > The really confusing thing about the HuGene chip from Affymetrix is that > they changed the way they were describing their features mid-stream. So > now people who work with this have to be mindful of how the probes have > been grouped ("probesets" or "transcript clusters"?). Arthur Li has > been kind enough to furnish the project with both kinds of package as an > option which is why I noticed what I did earlier about the transcript > cluster version of the package. But the fact that Affymetrix have > abandoned support for their cdf file is also creates a unique problem > for us. I agree with Jim that people should arguably be using oligo > rather than affy for analyzing this kind of chip. But I also agree with > you that a friendly warning would be a great idea for this one > particular package. 
Hi Laurent,

The help file for rma() in oligo states that the default value for target is "core", which corresponds to the "transcript cluster" version.

If you call rma() using target="probeset", you'll get the probeset version.

Best,

b
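A minimal sketch of the two calls described above, assuming the oligo package and the matching platform package (pd.hugene.1.0.st.v1) are installed and that a directory of HuGene 1.0 ST CEL files is available. The wrapper name `summarize_at` and the path in the usage comment are hypothetical; only `read.celfiles()` and the `target` argument of `rma()` come from oligo.

```r
# Hypothetical wrapper: run oligo's rma() at a chosen summarization level.
# "core"     -> transcript clusters (the default)
# "probeset" -> probesets
summarize_at <- function(cel_dir, target = c("core", "probeset")) {
  target <- match.arg(target)
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("the 'oligo' package is required for this sketch")
  }
  # Collect CEL files with base R so nothing beyond oligo is assumed
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0L) {
    stop("no CEL files found in ", cel_dir)
  }
  raw <- oligo::read.celfiles(cel_files)
  oligo::rma(raw, target = target)
}

# Usage (not run here; needs real CEL files):
# eset_tc <- summarize_at("path/to/cels")                       # transcript clusters
# eset_ps <- summarize_at("path/to/cels", target = "probeset")  # probesets
```

Keeping the two results in separately named objects, as in the usage comment, is one way to avoid losing track of which grouping an expression set was summarized with.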
Hi Benilton,

Thanks for the information. I did miss the named argument "target" at the end of the signature for rma().

However, one note about the documentation: now that I know of the existence of "target", I am still unable to infer from the help file for rma() that target="core" returns "transcript cluster". Would more explicit terms be clearer?

For example, the Affymetrix documentation (http://www.affymetrix.com/estore/support/help/exon_glossary/index.affx) refers to "probe groups", in the following terms:
"""
Probe Group - A generic term for any grouping of related GeneChip® array probes from the array design. On Exon Arrays, a probe group can be a probe set, exon cluster, or transcript cluster. On Gene Arrays, the only kind of probe group is the transcript cluster. NetAffx detail pages are provided for all probe groups of each type for Exon and Gene Arrays.
"""

What about a parameter 'probe_group = c("probe set", "exon cluster", "transcript cluster")'?

Also, wouldn't propagating which "probe group" was used to the resulting expression set be helpful to end-users?

L.
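To make the `probe_group` suggestion concrete, here is a rough sketch in base R of a thin translation layer from descriptive names to the existing `target` values. Everything in it is hypothetical (the function names, the lookup vector, and the plain list standing in for an expression set). Only the two mappings confirmed in this thread are included; "exon cluster" is left out because nothing here says which `target` value, if any, it would correspond to.

```r
# Hypothetical lookup: descriptive probe-group name -> oligo-style target
probe_group_targets <- c("transcript cluster" = "core",
                         "probe set"          = "probeset")

# Translate a descriptive name into a target value, rejecting unknown names
probe_group_to_target <- function(probe_group = names(probe_group_targets)) {
  probe_group <- match.arg(probe_group, names(probe_group_targets))
  probe_group_targets[[probe_group]]
}

# Record the grouping next to the result so it is not lost downstream.
# A plain list stands in for the expression set in this sketch.
tag_result <- function(result, probe_group) {
  probe_group <- match.arg(probe_group, names(probe_group_targets))
  c(result, list(probe_group = probe_group,
                 target      = probe_group_targets[[probe_group]]))
}

res <- tag_result(list(annotation = "pd.hugene.1.0.st.v1"),
                  "transcript cluster")
str(res)
```

The point of the design is that the user-facing vocabulary matches the Affymetrix glossary while the value passed on internally stays unchanged, and that the chosen grouping travels with the result.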
Hey Laurent,

The reason I chose to use probeset, core, full and extended is that this is how Affymetrix describes its meta-probeset files.

But, of course, I'm open to suggestions to improve the package.

b
>
> Using "oligo" does not appear to make things much better, as the information
> that can be obtained after running "rma()" is:
>
> > eset@annotation
> [1] "pd.hugene.1.0.st.v1"
>
> Is this the "probeset" version? Is this the "transcript cluster" version?
> Obviously this is of utmost importance (as the probe-level summarization
> step will use one given grouping).
>
> Going on a fishing expedition seems a bit awkward:
>
> > summary(featureNames(eset) %in% Lkeys(hugene10sttranscriptclusterSYMBOL))
>    Mode   FALSE    TRUE    NA's
> logical      40   33257       0
>
> As well, with the annotation (finally) becoming very fluid, does shipping
> any probe grouping without an associated annotation make any sense?
>
> Laurent
>
> On 04/05/10 03:04, Marc Carlson wrote:
>
> Hi Laurent,
>
> The really confusing thing about the HuGene chip from Affymetrix is that
> they changed the way they were describing their features mid-stream. So
> now people who work with this have to be mindful of how the probes have
> been grouped ("probesets" or "transcript clusters"?). Arthur Li has
> been kind enough to furnish the project with both kinds of package as an
> option, which is why I noticed what I did earlier about the transcript
> cluster version of the package. But the fact that Affymetrix have
> abandoned support for their cdf file also creates a unique problem
> for us. I agree with Jim that people should arguably be using oligo
> rather than affy for analyzing this kind of chip. But I also agree with
> you that a friendly warning would be a great idea for this one
> particular package.
>
> Marc
>
> On 05/03/2010 03:00 PM, Laurent Gautier wrote:
>
> Hi Marc,
>
> What I am reading translates into very little confidence in anything
> related to hugene 1.0ST in the bioconductor "affy" pipeline, and I
> really think that it should be more difficult to use it without going
> through steps that require one to explicitly see that this is
> untested/not recommended/unsafe. The CDF seems to be of uncertain
> quality to all, yet provided by bioconductor, and a warning message /
> recommendation to switch to oligo when attaching the package would be
> helpful, I think.
>
> Best,
>
> Laurent
>
> On 5/3/10 7:07 PM, Marc Carlson wrote:
>
> Hi Laurent,
>
> Further complicating things, the hugene10stprobeset.db package was a
> contributed package. From the DESCRIPTION file you can see that it was
> contributed by Arthur Li. You might want to ask him for more details
> about this package and also about the hugene10sttranscriptcluster.db
> package. Because I note that for the hugene10sttranscriptcluster.db
> package I get the following:
>
> > summary(Lkeys(hugene10sttranscriptclusterSYMBOL) %in% ls(hugene10stv1cdf))
>    Mode   FALSE    TRUE    NA's
> logical     962   32295       0
> > summary(ls(hugene10stv1cdf) %in% Lkeys(hugene10sttranscriptclusterSYMBOL))
>    Mode   FALSE    TRUE    NA's
> logical      26   32295       0
>
> And this looks like a closer match for what you are doing (considering
> that we don't have a properly supported cdf file in this case).
>
> Hope this helps,
>
> Marc
>
> On 05/03/2010 09:28 AM, Laurent Gautier wrote:
>
> Hi James,
>
> Thanks for the clarifications. I am happy to see that Affymetrix has
> picked up the concept of alternative CDF definitions and makes it
> easier for its users.
>
> Regarding bioconductor, wouldn't it make sense to either mark packages
> as "unsupported", or better take them to a different location, making
> their download by the unaware less likely.
> In the present case should
> the CDF be placed outside of the main repository?
>
> In addition, wouldn't it make sense to coordinate the release of
> probe/probeset mapping structures and annotation files (I am reading
> that there is annotation for revision 5 while the mapping is for
> revision 4)?
> What about making the revision number a documented _non-exported_
> vector in the packages?
> This way one could do for example:
>
> > hugene10stprobeset:::revision
> [1] "r5"
>
> (keeping the vector non-exported circumvents the issue of scope
> pollution whenever different packages with a variable "revision" are
> in the search path).
>
> Best,
>
> Laurent
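The ID comparisons quoted in this thread (the `%in%` summaries that Laurent calls a "fishing expedition") can be wrapped in a small base-R helper. This is only a sketch: `id_overlap()` is a hypothetical name, not a function from oligo or AnnotationDbi, and the ID vectors are made-up stand-ins for `featureNames(eset)` and `Lkeys()` of an annotation map.

```r
# Hypothetical helper (not part of any Bioconductor package): count how many
# IDs are shared between two sets and how many are unique to each side.
id_overlap <- function(x, y) {
  x <- unique(as.character(x))
  y <- unique(as.character(y))
  c(x_in_y = sum(x %in% y),      # IDs of x that also occur in y
    x_only = sum(!(x %in% y)),   # IDs found only in x
    y_only = sum(!(y %in% x)))   # IDs found only in y
}

# Toy IDs for illustration only; in a real session these would come from
# featureNames(eset) and Lkeys(<annotation map>).
feature_ids    <- c("7892501", "7892502", "7892503", "8180418")
annotation_ids <- c("7892502", "7892503", "7896736")

id_overlap(feature_ids, annotation_ids)
```

Running the same helper against each candidate annotation package (probeset version and transcript cluster version) would show which grouping the feature names most plausibly belong to, which is the check done by hand above.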
ADD REPLY
0
Entering edit mode
Dear all,

Please allow me to add some notes regarding the Affymetrix meta-probeset files, which allow one to combine the different probeset_ids for a certain gene into one transcript_cluster_id:

As far as I understand, Affymetrix originally created the meta-probeset files (in 2006) for their initial GUI application "ExACT" to analyze exon arrays. ExACT was later replaced by the "Expression Console"; however, you still need the meta-probeset files, e.g. for the Affymetrix Power Tools (APT).

You can download the meta-probeset files for exon arrays from the Affymetrix web-site, e.g. for HuExon the file "HuEx-1_0-st-v2.r2.dt1.hg18.ps.zip", which contains the following core, extended and full files:
- HuEx-1_0-st-v2.r2.dt1.hg18.core.mps
- HuEx-1_0-st-v2.r2.dt1.hg18.extended.mps
- HuEx-1_0-st-v2.r2.dt1.hg18.full.mps

However, if you look at the creation date you will see that these files were created on "create_date=Tue Sep 19 15:18:05 PDT 2006". Meanwhile, the assignment of probesets to the transcript clusters may have changed.

For the core assignment the zip-file seems to contain a newer version:
- HuEx-1_0-st-v2.r2.dt1.hg18.comprehensive.mps
which already shows some differences to the original core.mps file. However, even this file was created on "create_date=Wed Aug 1 16:55:41 PDT 2007".

The whole genome arrays such as HuGene could initially only be summarized at the transcript level (release r3). However, with release r4 Affymetrix has converted the whole genome arrays into exon arrays, thus for r4 you can now download transcript annotation files and probeset annotation files.

Since the PGF-file for r4 now combines the probes at the probeset level, Affymetrix has created meta-probeset files for these arrays, too. However, these files are contained in the library zip-files, e.g. HuGene-1_0-st-v1.r4.analysis-lib-files.zip contains the file:
- HuGene-1_0-st-v1.r4.mps

This file contains only the information for the "core" transcripts, since in contrast to the exon arrays the whole genome arrays contain only the well annotated genes, i.e. the "core" transcripts. Looking at the creation date you will see that this file was created on "create_date=Tue Dec 9 14:44:54 PST 2008".

Thus it seems that Affymetrix does not update the meta-probeset files together with the annotation files. This raises the question of which meta-probeset files are currently used.

Interestingly, the meta-probeset files are not necessary, since all information is included in the probeset annotation files, which Affymetrix updates each quarter. This is described in e.g. "HuEx-1_0-st-v2.na30.AFFX_README.NetAffx-CSV-Files.txt", which comes with the annotation files; see the description of column "level".

Thus it is possible to obtain the newest assignment between probeset_ids and transcript_cluster_ids directly from the probeset annotation file.

As an example, package "xps", also available from BioC, uses the Affymetrix annotation files to obtain this information for exon arrays and whole genome arrays. Package "xps" even contains a function "metaProbesets()", which allows one to create meta-probeset files from the newest annotation files, which can then be used e.g. with APT.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._
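The point that the probeset-to-transcript-cluster assignment can be read directly from the probeset annotation file can be illustrated with a minimal base-R sketch. Everything here is an assumption made for illustration: the toy data frame stands in for a NetAffx probeset annotation CSV, the IDs are invented, the "level" values are assumed to be strings such as "core", "extended" and "full", and `meta_probesets()` is a hypothetical helper, not the `metaProbesets()` function of xps.

```r
# Toy stand-in for a probeset annotation table; a real NetAffx CSV has many
# more columns, and these IDs are invented for illustration.
annot <- data.frame(
  probeset_id           = c("101", "102", "103", "104", "105"),
  transcript_cluster_id = c("9001", "9001", "9002", "9002", "9003"),
  level                 = c("core", "core", "core", "extended", "full"),
  stringsAsFactors      = FALSE
)

# Hypothetical helper: group probeset IDs by transcript cluster, keeping only
# probesets whose annotation level is listed in `levels`.
meta_probesets <- function(annot, levels = "core") {
  keep <- annot[annot$level %in% levels, , drop = FALSE]
  split(keep$probeset_id, keep$transcript_cluster_id)
}

# Core-only grouping: one list element per transcript cluster.
core_map <- meta_probesets(annot, levels = "core")
core_map

# A wider grouping is obtained by listing the levels explicitly.
full_map <- meta_probesets(annot, levels = c("core", "extended", "full"))
```

Because the grouping is rebuilt from the annotation table each time, it follows whatever annotation release was loaded, which is the advantage over a meta-probeset file with a fixed creation date.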