Entering edit mode
Al Ivens
▴
270
@al-ivens-1646
Last seen 10.2 years ago
Hi,
I have a whole pile of 2-colour arrays done with Cy3 genomic DNA, and
Cy5 cDNA. There are 7 "test" samples, each done in triplicate against
the genomic DNA (so 21 slides). I have seen a previous posting from
Jenny about how to analyse these kinds of hybes, and followed her
suggestions.
The scans are unprocessed BlueFuse, but bg has already been
subtracted.
> red_green <-
read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="A
MP
CH1",
+ weights="CONFIDENCE",quality="QUALITY"))
Although boxplots of the various slides for the two channels indicate
a
pretty reasonable degree of similarity within a channel, the two
channels as a whole were quite difft:
> summary(as.vector(log2(red_green$R)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.157 7.381 8.700 9.016 10.600 16.420
> summary(as.vector(log2(red_green$G)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.820 7.666 10.380 10.020 12.470 16.370
Despite this, I proceeded anyway:
> ## as Cy3 is gDNA in the ref channel, don't do NWA
> NBA <- normalizeBetweenArrays(red_green,method="Gquantile")
>
> ## get back the R and G values after Gq normalisation
> NBA.RG.MA <- RG.MA(NBA)
All Cy3 distributions were now identical, as expected, and the Cy5
channel had also been modified:
> summaryas.vectorlog2NBA.RG.MA$G)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.038 7.580 10.400 10.020 12.440 15.600
> summaryas.vectorlog2NBA.RG.MA$R)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8795 7.3940 8.7480 9.0160 10.5700 17.4500
> ## create a MAList to manipulate
> NBA.fake <- NBA
>
> ## replace the M values with the log2(R) values
> NBA.fake$M <- log2NBA.RG.MA$R)
I then did fitting, using a design matrix of the ref="gDNA" kind, with
a
contrast matrix for the cDNAx vs cDNAy comparisons
> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls")
> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts)
> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit)
I then looked at the coefficients of the eBayesed contrast fit object:
> for(i in 1:length(colnames(contrasts)))
+ {
+ cat("Contrast",i," ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n")
+ }
## individual cDNAs (cDNA1, 2 etc)
min 25% median mean 75% max
Contrast 1 5.417 8.538 9.79 9.873 10.99 16.23
Contrast 2 4.426 6.867 8.623 8.798 10.46 15.58
Contrast 3 4.623 7.26 8.606 8.951 10.37 15.92
Contrast 4 5.175 7.626 8.853 9.323 10.72 16.52
Contrast 5 3.928 7.175 8.864 9.073 10.71 15.63
Contrast 6 4.167 7.269 8.264 8.848 10.29 16.34
Contrast 7 3.948 6.381 7.956 8.244 9.824 15.58
## the various comparisons (cDNA2-cDNA1 etc)
Contrast 8 -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715
Contrast 9 -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098
Contrast 10 -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093
Contrast 11 -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631
Contrast 12 -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511
Contrast 13 -6.746 -2.3 -1.536 -1.629 -0.9041 4.385
Contrast 14 -5.327 -0.209 0.1646 0.1526 0.5908 5.481
Contrast 15 -2.688 0.1018 0.4627 0.5249 0.9155 6.033
Contrast 16 -3.462 -0.1604 0.2428 0.275 0.7153 6.032
.......
Many of the medians are well below zero, and of course, the output
from
topTable is almost exclusively negative logFC. This doesnt make sense
biologically. Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA
reference"
biologically speaking (cDNA2-7 are all single gene mutants).
"Unfortunately", the median value for the cDNA1 coeff is quite a bit
larger than the others, so I think this is skewing most of the
contrasts
(i.e. I dont think there is a wholesale reduction of transcription in
the other mutants). What could/should I do about it? I have
contemplated normalizeQuantiles of the red channel after the Gquantile
has been done (i.e. before fitting), but not sure whether that is a
valid thing to do. Would a single channel analysis be more
appropriate
here?
Thoughts/suggestions greatly appreciated.
Cheers,
al
> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils"
[6] "datasets" "methods" "base"
other attached packages:
geneplotter annotate Biobase gplots gdata
gtools
"1.14.0" "1.14.1" "1.14.1" "2.3.2" "2.3.1"
"2.3.1"
lattice MASS statmod sma limma
Hmisc
"0.16-2" "7.2-35" "1.3.0" "0.5.15" "2.10.0"
"3.4-2"
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.