Seungwoo Hwang
Dear all,
I am struggling with an unusual cDNA microarray dataset that was
generated a long time ago by a lab next door.
Their platform was a 24K cDNA array. To enlarge genome coverage, they
prepared two probe libraries and printed them onto two arrays. Let's
call them array A and array B. They then hybridized the two arrays
with the same samples.
First I mapped their GenBank probe IDs to UniGene IDs. This showed
that array series A has ~11,000 unique UniGene probe IDs, array series
B has ~10,000 unique UniGene probe IDs, and their intersection is
~6,000. Thus, after normalizing array series A and B separately, I
have the following data:
Array A series
ID replicate1 replicate2 ... replicateN
A1 logRatioA1.1 logRatioA1.2 ... logRatioA1.N
A2 logRatioA2.1 logRatioA2.2 ... logRatioA2.N
....
A11,000 ....
Array B series
ID replicate1 replicate2 ... replicateN
B1 logRatioB1.1 logRatioB1.2 ... logRatioB1.N
B2 logRatioB2.1 logRatioB2.2 ... logRatioB2.N
....
B10,000 ....
For probes that are present on only one of the two arrays, I think the
analysis is simple. I can run the statistical test on the two datasets
separately and, for each such probe, take the result from the one
dataset that contains it.
For probes that are present on both arrays, I am not sure how to
proceed. Of the two separate test results, one might report a probe
as significant whereas the other might not.
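To make the three groups of probes concrete, here is a minimal sketch of how the UniGene IDs could be split into "only on A", "only on B", and "on both". The function name `partition_ids` and the toy IDs are made up for illustration and are not from the actual data.

```python
def partition_ids(ids_a, ids_b):
    """Split probe IDs into those only on A, only on B, and on both arrays."""
    set_a, set_b = set(ids_a), set(ids_b)
    return {
        "only_a": sorted(set_a - set_b),  # tested with series A alone
        "only_b": sorted(set_b - set_a),  # tested with series B alone
        "both": sorted(set_a & set_b),    # the ambiguous case discussed here
    }

# Toy UniGene-style IDs (hypothetical, for illustration only).
ids_a = ["Hs.1", "Hs.2", "Hs.3"]
ids_b = ["Hs.2", "Hs.3", "Hs.4"]
groups = partition_ids(ids_a, ids_b)
```

With the real data, the "both" group would hold the ~6,000 shared IDs.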
So I came up with this idea. First, I can paste the two log ratio
matrices together as follows:
ID replicate1 replicate2 ... replicateN
A1 logRatioA1.1 logRatioA1.2 ... logRatioA1.N
A2 logRatioA2.1 logRatioA2.2 ... logRatioA2.N
....
A11,000 ....
B1 logRatioB1.1 logRatioB1.2 ... logRatioB1.N
B2 logRatioB2.1 logRatioB2.2 ... logRatioB2.N
....
B10,000 ....
Then, for an ID that occurs in both A and B, take the mean of the two
log ratio values within each replicate. For example, if A1 and B1
correspond to the same ID, then its collapsed log ratio value for
replicate 1 will be (logRatioA1.1 + logRatioB1.1)/2
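The collapsing step could be sketched roughly as below. This is only an illustration under two assumptions: each series is held as a mapping from ID to a list of log ratios, and the replicate columns of A and B are in the same sample order. The name `collapse_by_mean` and the numbers are hypothetical.

```python
def collapse_by_mean(series_a, series_b):
    """Merge two ID -> [log ratios per replicate] mappings.

    IDs found in both series are averaged replicate by replicate;
    IDs found in only one series are carried over unchanged.
    """
    collapsed = {}
    for probe_id in sorted(set(series_a) | set(series_b)):
        # Collect the one or two rows available for this ID.
        rows = [s[probe_id] for s in (series_a, series_b) if probe_id in s]
        # zip(*rows) groups the values belonging to the same replicate.
        collapsed[probe_id] = [sum(vals) / len(vals) for vals in zip(*rows)]
    return collapsed

# Toy log ratios for two replicates (made-up values).
series_a = {"Hs.1": [1.0, 0.5], "Hs.2": [0.25, -0.75]}
series_b = {"Hs.2": [0.75, -0.25], "Hs.3": [2.0, 1.0]}
merged = collapse_by_mean(series_a, series_b)
```

Here "Hs.2" is on both arrays, so its merged row is the per-replicate mean of the A and B rows, while "Hs.1" and "Hs.3" keep their original values.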
The rationale is that, since the two arrays were hybridized with the
same samples and were both normalized, log ratio values are comparable
between series A and B. This means they can be averaged across the two
series, just as we would average duplicate probes within an array.
Is this approach valid? Or is it better to test the two matrices
separately and report the two test results side by side?
Thanks a lot,
Seungwoo
------------------------------------
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center