Biocore response to Affymetrix data format changes
@vincent-j-carey-jr-4
Last seen 3 days ago
United States
D. Kulp of Affymetrix commented on the upcoming proprietary GeneChip data formats in a Bioconductor mailing list post of 25 June 2003. He notes that Windows/Java linkable libraries will be provided for reading the binary GeneChip format, and that MAGE-ML exports will be available. He proposes

1) Bioconductor can provide free compiled libraries using the API and the Affymetrix linkable libraries
2) Bioconductor applications use MAGE-ML, as data bloat is not noteworthy and the export contains 'all the CEL data you expect'.

Kulp comments that these observations show that the details of the change are "fairly simple". In fact, the change has far-reaching implications for those who work with Bioconductor software and Affymetrix data.

The Bioconductor project has adopted a policy of programming only to public and open APIs. Primary reasons:

a) R is free software under the GPL. Although we have made an effort to release the main Bioc components under LGPL, as a collaborative gesture towards commercial entities who wish to use our tools, R itself is GPL. It is not possible to legally distribute tools that combine compilations of non-free software with GPL software.

b) Beyond the restrictions of the GPL in relation to R, the Digital Millennium Copyright Act (DMCA) creates legal complications for those who create compilations of mixed free and proprietary software. We have no resources to spend on legal advice or on adapting our research to a complex legal landscape. Commitment to public and open APIs allows us to carry on research in a natural and efficient way largely independently of DMCA restriction and interpretation in the complex area of reverse engineering.

c) Commitment to public and open APIs leverages the user community's capabilities to discover problems and to fix them. While distribution of compiled libraries with open components as interfaces to proprietary formats may SEEM consistent with open source software methodology, this is an illusion. We have benefited from user-contributed bug fixes and would cease to do so under the regimen proposed by Kulp, because users would lack access to key elements of the interface.

d) Commitment to public and open APIs sharply reduces effort required to support multiple platforms. When compiled libraries are distributed one frequently encounters conflicts with resident versions of supporting libraries and one needs to introduce substantial technology for bridging distributed objects to platforms whose resources may be out of date or noncompliant with basic standards. Time spent on nonstandard portability methodology is time subtracted from research on computational biology. As researchers we cannot accept this additional cost.

e) Commitment to public and open APIs is the only approach compatible with the recognition that microarray analysis technology is immature and must be fully open to scrutiny if science is to advance in an efficient way. Comparisons of MAS4, MAS5, Li and Wong's MBEI and RMA probe-level analyses indicate that the procedures yield different results. Users have a right to expect that results from different methodologies can be fully rationalized, and this can only occur with open implementations.

These five points respond to Kulp's suggestion that we provide free binaries to the user community. The suggestion seems simple and positive but it is not feasible at all.

Kulp's second suggestion is to employ the MAGE-ML format. It does appear that this constitutes a public and open API and one that we could program to. However, it does appear that there will be significant information restrictions and performance costs if we are forced to go in this direction. We have one report of significant data bloat with the current embodiments of this technology. A 7 megabyte CEL file had a 30 MB XML representation, and a 21 MB CDF file had a 400 MB XML representation. Kulp suggests that XML bloat does not occur, and that may be due to his access to newer forms of the transformation. We believe that compliant MAGE-ML representations will be massive. Requiring Bioconductor to work from MAGE-ML will lead to additional burdens on users that will impede research progress.

In summary, Bioconductor's commitment to open and public APIs is dictated by legal and scientific considerations. Affymetrix' transition to closed file formats is difficult to understand. No one questions the technical utility of a change to a binary format. Making it secret has no utility that we can discern. Bioconductor and its users have provided R&D to Affymetrix essentially free of charge. The upcoming Affymetrix GeneChip Microarray Low-Level Workshop ( http://eci-events.com/AffyGeneChip/ ) is proof that Affymetrix appreciates and is open to these contributions. Accommodating a non-public, non-open API for Affymetrix data would constitute a precedent that might impact methods adopted by other companies in this field. We respectfully ask that Affymetrix make a rather different precedent: open the new file format to support and encourage research and development in the microarray analysis domain. An open format will clearly benefit both Affymetrix and the scientific community.

Sincerely,
The Bioconductor Core Team

* Douglas Bates, University of Wisconsin, USA.
* Vince Carey, Harvard Medical School, USA.
* Marcel Dettling, Federal Inst. Technology, Switzerland.
* Sandrine Dudoit, Division of Biostatistics, UC Berkeley, USA.
* Byron Ellis, Harvard Department of Statistics, USA.
* Laurent Gautier, Technical University of Denmark, Denmark.
* Robert Gentleman, Harvard Medical School, USA.
* Jeff Gentry, Dana-Farber Cancer Institute, USA.
* Kurt Hornik, Technische Universitat Wien, Austria.
* Torsten Hothorn, Institut fuer Medizininformatik, Biometrie und Epidemiologie, Germany.
* Wolfgang Huber, DKFZ Heidelberg, Molecular Genome Analysis, Germany.
* Stefano Iacus, University of Milan, Italy.
* Rafael Irizarry, Department of Biostatistics (JHU), USA.
* Friedrich Leisch, Technische Universitat Wien, Austria.
* Martin Maechler, Federal Inst. Technology, Switzerland.
* Gordon Smyth, Walter and Eliza Hall Institute, Australia.
* Anthony Rossini, University of Washington and the Fred Hutchinson Cancer Research Center, USA.
* Gunther Sawitzki, Institute fur Angewandte Mathematik, Germany.
* Luke Tierney, University of Iowa, USA.
* Jean Yee Hwa Yang, University of California, San Francisco, USA.
* Jianhua (John) Zhang, Dana-Farber Cancer Institute, USA.
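
For readers who want to put numbers on the data-bloat report in the letter above, the following is a minimal sketch that computes the expansion factors implied by the two reported file sizes. The sizes come from the letter itself; the names `reported_sizes_mb` and `expansion_factor` are hypothetical, chosen only for this illustration, and the calculation says nothing about what newer MAGE-ML exports might produce.

```python
# File sizes in megabytes as reported in the letter: (native file, XML export).
# These are the only two data points given; they come from a single report.
reported_sizes_mb = {
    "CEL": (7, 30),
    "CDF": (21, 400),
}


def expansion_factor(native_mb, xml_mb):
    """Return how many times larger the XML export is than the native file."""
    return xml_mb / native_mb


for name, (native_mb, xml_mb) in reported_sizes_mb.items():
    factor = expansion_factor(native_mb, xml_mb)
    print(f"{name}: {native_mb} MB -> {xml_mb} MB, about {factor:.1f}x larger")
```

On these figures the CEL export is roughly four times the size of the native file and the CDF export roughly nineteen times, which is the scale of overhead the letter is concerned about.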
Microarray GO Cancer cdf • 1.6k views
Isaac Neuhaus ▴ 360
@isaac-neuhaus-22
Last seen 9.6 years ago
United States
Many of us work in the pharmaceutical industry and have been taking advantage of your excellent tools. We are also 'monetarily speaking' important Affymetrix customers. I would like to know how WE, in the pharmaceutical industry could help and facilitate your continuing effort in developing these useful tools.

Isaac

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
@james-w-macdonald-5106
Last seen 3 days ago
United States
I think Rafael's suggestion is probably the most effective. If we as end users let Affymetrix know that we are unhappy with the coming changes in their policy, maybe we can get them to change. Unfortunately, right now Affy has a monopoly in the market and they know full well that any complaints will likely not be reflected in decreased sales. I am sure they are worried that Bioconductor will have an effect on the sales of MAS6 (which undoubtedly will generate a probe summary identical to rma), and the best way to eliminate that threat is to make Bioconductor as difficult to use as possible.

However, letting your friendly local Affy sales rep know that you are not too keen on their proposed changes may have an effect. In addition, letting them know that you are VERY interested in the Xeotron chips may have an even greater effect ;-D

Jim

James W. MacDonald
UMCCC Microarray Core Facility
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623