I have four large tag datasets: A1, A2, B1 and B2. The purpose of the
experiment was to determine differences in gene expression between A
and B.
A1 and B1 were done together as batch 1, and A2 and B2 were done
together as batch 2.
I ran several analyses and am completely puzzled.
First I ran sage.test (Fisher's exact test) on A1, B1 and on A2,
B2. The results were strongly concordant in that there was a lot of
overlap in the significant gene list,
and the same genes were up/down regulated (on the whole).
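For readers who want to see what the per-tag comparison above amounts to, here is a minimal sketch of a two-sided Fisher's exact test for one tag in two libraries. It is a plain-Python illustration, not the sage.test code itself; the function names and the example counts are hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is an assumption about the convention.

```python
from math import lgamma, exp

def log_choose(n, k):
    # log of the binomial coefficient C(n, k), stable for large library sizes
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact test for one tag.

    x1, x2: counts of the tag in library 1 and library 2
    n1, n2: total tag counts (library sizes)
    """
    total = x1 + x2
    lo = max(0, total - n2)   # smallest possible count in library 1
    hi = min(total, n1)       # largest possible count in library 1
    log_denom = log_choose(n1 + n2, total)

    def log_prob(k):
        # hypergeometric log-probability that library 1 holds k of the `total` tags
        return log_choose(n1, k) + log_choose(n2, total - k) - log_denom

    observed = log_prob(x1)
    # sum all outcomes that are no more probable than the observed one
    p = sum(exp(log_prob(k)) for k in range(lo, hi + 1)
            if log_prob(k) <= observed + 1e-7)
    return min(1.0, p)

# Hypothetical counts: 30 vs 10 copies of a tag in two libraries of 50000 tags each
print(fisher_exact_two_sided(30, 50000, 10, 50000))
```

Note that this test conditions on the two library totals and treats the counts as pure sampling noise, so it cannot tell a genotype effect from library-to-library variation.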
Then I ran edgeR on all 4 samples. A large number of genes were
declared significantly differentially expressed, but this set was almost
completely disjoint from the genes "found" by sage.test (fewer than
10 out of 4000 in common). The $r$ values were strongly clustered around 2,
although some were huge. Incidentally, the "exact" component of the
output does not seem to be described in ?edgeR, but I understand it
to be the p-value from the test.
Then I tested for batch effects by using sage.test on A1, A2 and on
B1, B2 and finally on A1 U B1 and A2 U B2. A fairly large number of
genes showed strong batch effects. These overlapped more with the
within-batch genotype sage.test results than with the edgeR results.
Just to make things more confusing, the grad student who ran the
samples used the normal approximation to the Poisson to test genotype
effects within batch. These
were highly concordant between batches as well, but did not match the
sage.test results. I thought the p-values would be similar at least
for genes with large counts, but they were not.
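To make the comparison concrete, here is a minimal sketch of one common form of the normal approximation for comparing two Poisson counts. I do not know exactly which formula the student used, so the pooled-rate z statistic below is an assumption, and the function name and counts are hypothetical.

```python
from math import sqrt, erfc

def poisson_normal_test(x1, n1, x2, n2):
    """Two-sided z-test comparing the rates x1/n1 and x2/n2.

    Assumes x1 and x2 are Poisson counts and uses the pooled rate
    under the null hypothesis of equal rates to estimate the variance.
    """
    total = x1 + x2
    if total == 0:
        return 0.0, 1.0                      # no information for this tag
    pooled_rate = total / (n1 + n2)          # common rate under the null
    diff = x1 / n1 - x2 / n2                 # observed difference in rates
    se = sqrt(pooled_rate * (1.0 / n1 + 1.0 / n2))
    z = diff / se
    p = erfc(abs(z) / sqrt(2.0))             # two-sided normal tail probability
    return z, p

# Hypothetical counts: 30 vs 10 copies of a tag in two libraries of 50000 tags each
z, p = poisson_normal_test(30, 50000, 10, 50000)
print(z, p)
```

With equal library sizes the statistic reduces to (x1 - x2) / sqrt(x1 + x2), which is why I expected the p-values to track the exact test for tags with large counts.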
I am inclined to go with combining the sage.test results, but any
advice would be very welcome.
Thanks,
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348
(Statistics)
University Park, PA 16802-2111