Dear list members,

I would like to reconstruct the analysis presented by Chiaretti et al. in
Blood, Vol. 103(7), 2004. The data set ALL in library(ALL) (found at MBI Lab4
by Sandrine Dudoit et al.) consists of 128 samples, but I read in the
publication that 33 patients were evaluated by gene expression profiling.
(Btw: the source http://bioconductor.org/Docs/Papers/2002/Chiaretti gives
only 30 CEL files?) Which of the 128 samples in ALL are the 33 patients
mentioned in the publication?

Moreover, it would be great to have the original R code behind the
above-mentioned publication. Does it exist somewhere in the Bioconductor
repository?

Thanks in advance,
Heike
--
Dr. Heike Pospisil
Center for Bioinformatics, University of Hamburg
Bundesstrasse 43, 20146 Hamburg, Germany
phone: +49-40-42838-7303 fax: +49-40-42838-7312

Hi,

All packages have maintainers (this is true for data packages too), and if
you want to know some of the intimate details of such a package, the
maintainer is the first person to contact. It is very simple to find out who
that is:

> packageDescription("ALL")$Maintainer
[1] "Xiaochun Li <xiaochun@jimmy.harvard.edu>"

Also note that the source you give for the package is not appropriate. It was
put there for students of that course; more recent versions are available
through the standard channels (and those should be preferred), i.e.
http://www.bioconductor.org/data/experimental.html

Next, you need to be a bit more specific about what you want to do.

If you want to verify the exact outputs, that is in general hard, and
sometimes impossible. My guess is that it is impossible for this paper
(although you should be able to come close). Why is it impossible? First, you
will need access to the original data (which I believe is available, although
it seems that three CEL files have not been put up; I will see about tracking
down the differences). Next, you need access to the right version of the
software used for the preprocessing (in this case the version of dChip that
was used, which is hard to find, as it changed often and that was some years
ago). You might be able to ask the first author for some of the transformed
data and see how things go from there. And so on. For most of the
Bioconductor/R software used you should be able to get old versions, but
since then bugs have been fixed, ideas improved, and so on.
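
Since matching software versions is the hard part, it helps to record the
versions in use whenever you attempt a reproduction. A small sketch, assuming
only functions from base R's utils package; "stats" is just a placeholder for
whichever packages you actually use (for example the ALL data package, if it
is installed):

```r
# Record the R version and the version of a package in use, so that a later
# comparison against the original analysis is at least possible.
r_version <- R.version.string
pkg <- "stats"   # placeholder; substitute e.g. "ALL" if it is installed
pkg_version <- as.character(packageVersion(pkg))

cat("R:", r_version, "\n")
cat(pkg, "version:", pkg_version, "\n")

# sessionInfo() lists the attached packages together with their versions.
print(sessionInfo())
```

This does not recover the versions used for the paper; it only documents your
own setup so that differences can be discussed concretely.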

If instead what you want is to come approximately close, then it is somewhat
easier. Reading the paper should have told you that the analysis was on
patients with T-cell ALL (and that there are two types of ALL, B-cell derived
and T-cell derived). The ALL package then yields:

> table(ALL$BT)

 B B1 B2 B3 B4  T T1 T2 T3 T4
 5 19 36 23 12  5  1 15 10  2

which certainly suggests a starting point, as there are precisely 33 samples
with T-cell derived ALL (5 + 1 + 15 + 10 + 2).
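
A minimal sketch of that starting point (not code from the paper): the counts
are copied from the table above so that the example runs without the data
package, and the guarded block at the end assumes the ALL package and Biobase
are installed. The names bt_counts, bt, is_tcell and tcell are made up for
illustration.

```r
# Rebuild the BT factor from the counts shown above, so the sketch is
# self-contained (with the real data, use ALL$BT directly).
bt_counts <- c(B = 5, B1 = 19, B2 = 36, B3 = 23, B4 = 12,
               T = 5, T1 = 1, T2 = 15, T3 = 10, T4 = 2)
bt <- factor(rep(names(bt_counts), times = bt_counts),
             levels = names(bt_counts))

# T-cell derived samples are those whose stage label starts with "T".
is_tcell <- grepl("^T", as.character(bt))
sum(is_tcell)   # 33

# With the data package installed, the same kind of index selects the samples.
if (requireNamespace("ALL", quietly = TRUE)) {
  library(ALL)
  data(ALL)
  tcell <- ALL[, grepl("^T", as.character(ALL$BT))]
  print(dim(tcell))
}
```

Whether these 33 samples are exactly the 33 patients in the paper is something
only the maintainer or the first author can confirm.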

But, as I said above, different methodologies were used for normalization, so
the actual values will differ (sometimes by a lot) from those used in the
original paper, and hence the answers you get will be different, though
probably not by very much. If they seem to be very different, then the first
author of the original paper is the person to contact.

Best wishes,
Robert

On Apr 15, 2005, at 9:30 AM, Heike Pospisil wrote:
> Dear list member,
>
> I would like to reconstruct the analysis presented by Chiaretti et al.
> in BLOOD Vol. 103(7), 2004. The data set ALL within library(ALL)
> (found at MBI Lab4 by Sandrine Dudoit et al.) consists of 128 samples.
> But I read in the publication that 33 patients were evaluated by gene
> expression profiling. (Btw: the source
> http://bioconductor.org/Docs/Papers/2002/Chiaretti gives only 30
> CEL-files.. ?) Which of the 128 samples in ALL are the 33 mentioned
> patients in the publication?
>
> Moreover, it would be great to have the original R code behind the
> above mentioned publication - does it exist somewhere in the
> Bioconductor repository?
>
> Thanks in advance,
> Heike
> --
> Dr. Heike Pospisil
> Center for Bioinformatics, University of Hamburg
> Bundesstrasse 43, 20146 Hamburg, Germany
> phone: +49-40-42838-7303 fax: +49-40-42838-7312
--
Robert Gentleman
Head, Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
phone: (206) 667-7700   fax: (206) 667-1319   office: M2-B865
email: rgentlem@fhcrc.org