ALL dataset by Chiaretti et al.
@heike-pospisil-1097
Dear list members,

I would like to reconstruct the analysis presented by Chiaretti et al. in Blood, Vol. 103(7), 2004. The data set ALL within library(ALL) (found at MBI Lab4 by Sandrine Dudoit et al.) consists of 128 samples, but I read in the publication that 33 patients were evaluated by gene expression profiling. (Btw: the source http://bioconductor.org/Docs/Papers/2002/Chiaretti gives only 30 CEL files?) Which of the 128 samples in ALL are the 33 patients mentioned in the publication?

Moreover, it would be great to have the original R code behind the above-mentioned publication. Does it exist somewhere in the Bioconductor repository?

Thanks in advance,
Heike

--
Dr. Heike Pospisil
Center for Bioinformatics, University of Hamburg
Bundesstrasse 43, 20146 Hamburg, Germany
phone: +49-40-42838-7303 fax: +49-40-42838-7312
rgentleman ★ 5.5k
@rgentleman-7725
United States
Hi,

All packages have maintainers (this is true for data packages too), and if you want to know the intimate details of a package, the maintainer is the first person to contact. It is very simple to find out who that is:

> packageDescription("ALL")$Maintainer
[1] "Xiaochun Li <xiaochun@jimmy.harvard.edu>"

Note that the source you give for the package is not appropriate. It was put there for students of that course; more recent versions are available through the standard channels, and those should be preferred, i.e. http://www.bioconductor.org/data/experimental.html

Next, you need to be a bit more specific about what you want to do. If you want to reproduce the exact outputs, that is hard in general and sometimes impossible. My guess is that it is impossible for this paper, although you should be able to come close.

Why is it impossible? First, you need access to the original data (which I believe is available, although it seems that three CEL files have not been put up; I will see about tracking down the differences). Next, you need the right version of the software used for the preprocessing. In this case that means finding the version of dChip that was used, which is hard to do because dChip changed often and that was some years ago. So you might ask the first author for some of the transformed data and see how things go from there. And so on. For most of the Bioconductor/R software used you should be able to get old versions, but since then bugs have been fixed, ideas improved, and so on.

If instead you only want to come close, it is somewhat easier. The paper says that the analysis was on patients with T-cell ALL (there are two types of ALL, B-cell derived and T-cell derived). The ALL package yields:

> table(ALL$BT)
 B B1 B2 B3 B4  T T1 T2 T3 T4
 5 19 36 23 12  5  1 15 10  2

This certainly suggests a starting point, as there are precisely 33 samples with T-cell derived ALL (5 + 1 + 15 + 10 + 2).
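As a minimal sketch of that starting point, the T-cell samples could be pulled out by matching the BT codes that begin with "T". The helper name is_tcell and the object name ALL_T are hypothetical and chosen only for illustration; the sketch assumes the ALL data package is installed and that ALL is an ExpressionSet, as in current releases.

```r
# Hypothetical helper (not part of the ALL package): flag T-cell samples
# from a vector of BT codes such as "B", "B2", "T", "T3".
is_tcell <- function(bt) {
  grepl("^T", as.character(bt))
}

# Applied to the real data only if the ALL data package is installed.
if (requireNamespace("ALL", quietly = TRUE)) {
  library(ALL)   # assumption: attaches Biobase, which provides the ExpressionSet methods
  data(ALL)
  ALL_T <- ALL[, is_tcell(ALL$BT)]    # subset samples (columns)
  print(table(droplevels(ALL_T$BT)))  # per-stage counts within the T-cell subset
  print(dim(ALL_T))                   # features x samples
}
```

If the installed package matches the table above, the subset should contain the same T-cell samples counted there. Whether these are exactly the patients analysed in the paper is still something to confirm with the maintainer or the first author.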
But, as I said above, different methodologies were used for normalization, so the actual values will differ (sometimes by a lot) from those used in the original paper, and hence the answers you get will be different, though probably not by very much. If they seem to be very different, then the first author of the original paper is the person to contact.

Best wishes,
Robert

(In reply to Heike Pospisil's message of Apr 15, 2005, 9:30 AM.)

--
Robert Gentleman
Head, Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
phone: (206) 667-7700 fax: (206) 667-1319 office: M2-B865
email: rgentlem@fhcrc.org