Entering edit mode
On Thu, Oct 21, 2010 at 3:15 AM, Jarek Bryk <bryk@evolbio.mpg.de>
wrote:
> Hi Sean,
>
> and thanks a lot for the answer. I still need some clarification
though.
>
> On 20 Oct 2010, at 18:31, Sean Davis wrote:
> > You definitely DO NOT want to be loess-normalizing CGH data.
There is an
> expected positive correlation between average intensity and log-
ratio;
> loess-normalization will effectively eliminate that signal.
>
> Oops. Thanks. I got that idea from limma user guide (p. 24) - I
assumed two
> color normalization should be similar regardless of whether one
measures
> gene expression or CGH - is this where the trick is? How would you
recommend
> normalizing the data? Just to add some information, my custom design
is
> concentrated on some 1000 genic regions (genes+10kb up and
downstream)
> selected based on our previous experiment. These regions have very
high
> probe density (essentially like a tiling array - my average genomic
probe
> density is 170bp) Some of these regions may be copy-number-variable,
but
> some (majority probably) of them should not be. I have a standard
set of
> Agilent's 11k normalizing probes plus 5k replicated probes on the
arrays as
> well, but most of the genome is not covered with probes at all.
>
>
You will need to look at your data to determine what normalization
will
work. In general and for future reference, when making custom designs
like
this, we always use a backbone of 20-30k genomic probes so that we can
see a
baseline. The main two issues to be concerned with are "centering"
the data
and removal of "wave" artifacts. There is a fair amount of literature
on
the latter and a bit on the former, but median-centering the data will
likely not be too far off as a start. The latter may be difficult to
apply
to your data given the gene-centric nature of your design.
> > This information is stored in the MA$Genes$SystematicName above.
You
> will need to split out the pieces (chromosome, position) and then
use those
> for the locations. You will also probably need to remove control
probes.
>
> OK, I got the MA$genes:
>
> > head(MA$genes)
> Row Col Start
Sequence
> ProbeUID ControlType
> 1 1 1 0
> 0 1
> 2 1 2 0
> 1 1
> 3 1 3 0
> 1 1
> 4 1 4 0
> CCCATTCACTAATCTACATATTATCTCCATCCAACAAAAATTTCTTTCAGTAAGGTGTGG
3
> 0
> 5 1 5 0
> AAACAATCAGGTTTTCACTCTGACAGCCCAATAATGCATTTAGTTTGAAATTACACTTGG
5
> 0
> 6 1 6 0
> CTATTCTGTATCATATAGGGAGGATGCTGTCCTGGAATATCTCAAGATTGCTCAAGAC 7
> 0
> ProbeName GeneName SystematicName
> 1 MmCGHBrightCorner MmCGHBrightCorner MmCGHBrightCorner
> 2 DarkCorner DarkCorner DarkCorner
> 3 DarkCorner DarkCorner DarkCorner
> 4 A_67_P21917383 Msn chrX:93323379-93323438
> 5 A_67_P17120964 chr11:18773745-18773804 chr11:18773745-18773804
> 6 A_67_P21917619 Msn chrX:93350785-93350842
>
> If I understand you correctly, I'd have to divide the content of
> MA$genes$SystematicName so that it becomes something like:
>
> >test<-MA$genes[10:20,] # to avoid control probes in this example
> >head(test)
> Row Col Start
> Sequence ProbeUID ControlType
> 10 1 10 0
> GTTTTAAACCATCTCTGAGGTTAATGGATGGCTTTCTGGGCCAGCAG 15
0
> 11 1 11 0
> ACACAAAAGCAATTCTGGCTAGTCAAGACTTGAAAGAAACATATAAACGCACACACATTG
17
> 0
> 12 1 12 0
> GCTTCTGTGCACTCTTATTGGAAACAAGTTCTTCAGAAGAACAAATGCAGGATAAAACTT
19
> 0
> 13 1 13 0
> AAACATGAATACAGGACGGGGATGACAAATCCACGAGAAACAAAGTTAATTTCACTTC 21
> 0
> 14 1 14 0
> AACAACTATCGAGGTCTAATGAAAAGAGCATGCATTTGTCAAGGAACAGCAGGAAA 23
> 0
> 15 1 15 0
> CTCCCTCATTAGCCCATGTCGTGGAGGGTTGGCTGTGAGAAAAGT 25 0
> ProbeName GeneName SystematicName
> 10 A_67_P11007835 Gm13547 chr2:29617549-29617595
> 11 A_67_P15475991 chr8:49310974-49311033 chr8:49310974-49311033
> 12 A_67_P08003555 Arl15 chr13:114808546-114808605
> 13 A_67_P01452405 Grin2b chr6:136111490-136111547
> 14 A_67_P06614048 AK042807 chr8:102166914-102166969
> 15 A_67_P21149931 Tshz1 chr18:84192155-84192199
>
>
> >test$Chr<-sapply(test$SystematicName,function(x){strsplit(x,":")[[1
]][[1]]}
> >test$Position<-sapply(test$SystematicName,function(x){strsplit(x,":
")[[1]][[2]]}
> # there's probably an easier way of doing this...
> >head(test) # but it gets the job done
> Row Col Start
> Sequence ProbeUID ControlType
> 10 1 10 0
> GTTTTAAACCATCTCTGAGGTTAATGGATGGCTTTCTGGGCCAGCAG 15
0
> 11 1 11 0
> ACACAAAAGCAATTCTGGCTAGTCAAGACTTGAAAGAAACATATAAACGCACACACATTG
17
> 0
> 12 1 12 0
> GCTTCTGTGCACTCTTATTGGAAACAAGTTCTTCAGAAGAACAAATGCAGGATAAAACTT
19
> 0
> 13 1 13 0
> AAACATGAATACAGGACGGGGATGACAAATCCACGAGAAACAAAGTTAATTTCACTTC 21
> 0
> 14 1 14 0
> AACAACTATCGAGGTCTAATGAAAAGAGCATGCATTTGTCAAGGAACAGCAGGAAA 23
> 0
> 15 1 15 0
> CTCCCTCATTAGCCCATGTCGTGGAGGGTTGGCTGTGAGAAAAGT 25 0
> ProbeName GeneName SystematicName
Chr
> Position
> 10 A_67_P11007835 Gm13547 chr2:29617549-29617595
chr2
> 29617549-29617595
> 11 A_67_P15475991 chr8:49310974-49311033 chr8:49310974-49311033
chr8
> 49310974-49311033
> 12 A_67_P08003555 Arl15 chr13:114808546-114808605
chr13
> 114808546-114808605
> 13 A_67_P01452405 Grin2b chr6:136111490-136111547
chr6
> 136111490-136111547
> 14 A_67_P06614048 AK042807 chr8:102166914-102166969
chr8
> 102166914-102166969
> 15 A_67_P21149931 Tshz1 chr18:84192155-84192199
chr18
> 84192155-84192199
>
>
That is most of the way there. The Position column needs to be
numeric, so
you'll want to do another split to get that down to a single number, I
think.
Sean
> > Hope that helps.
>
> Thanks a lot!
> cheers
> jarek
>
> > > sessionInfo()
> > R version 2.12.0 (2010-10-15)
> > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] splines stats graphics grDevices utils datasets
methods
> base
> >
> > other attached packages:
> > [1] snapCGH_1.20.0 DNAcopy_1.24.0 limma_3.6.0 aCGH_1.28.0
> multtest_2.6.0 Biobase_2.10.0
> > [7] survival_2.35-8 cluster_1.13.1 GLAD_2.12.0
> >
> > loaded via a namespace (and not attached):
> > [1] affy_1.28.0 affyio_1.18.0 annotate_1.28.0
> AnnotationDbi_1.12.0
> > [5] DBI_0.2-5 genefilter_1.32.0 grid_2.12.0
> lattice_0.19-13
> > [9] MASS_7.3-8 preprocessCore_1.12.0
RColorBrewer_1.0-2
> RSQLite_0.9-2
> > [13] strucchange_1.4-2 tilingArray_1.28.0 tools_2.12.0
> vsn_3.18.0
> > [17] xtable_1.5-6
>
> --
> Jarek Bryk | www.evolbio.mpg.de/~bryk
> Max Planck Institute for Evolutionary Biology
> August Thienemann Str. 2 | 24306 Plön, Germany
> tel. +49 4522 763 287 | bryk@evolbio.mpg.de
>
>
>
[[alternative HTML version deleted]]