Question

Nimblegen arrays/Limma package: duplicate correlation and other problems

r.athanasiadou ▴ 100

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070326/d95a6cb9/attachment.pl

score 0 · Answer 1 · 2007-03-23

r.athanasiadou ▴ 100

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070323/85af2e30/attachment.pl

score 0 · Answer 2 · 2007-03-26

Jenny Drnevich ★ 2.2k

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070326/a773cf42/attachment.pl

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070327/b6ecda8c/attachment.pl

Reply by r.athanasiadou

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070327/74345121/attachment.pl

Reply by Jenny Drnevich

On Tuesday 27 March 2007 11:13, Jenny Drnevich wrote:
> Hi Niki,
>
> I guess it's not exactly clear in the help page for lmFit, but
> correlation is only used if ndups > 1 or block is not NULL. Since you
> shouldn't be using either of these, correlation won't be used and
> hence the default value doesn't matter. This is all explained better
> in the vignette, under Technical Replication.

And just to be *absolutely* clear, you do not have replication on the array--is that correct? It looked from the "genes" slot that there might be replication, but it wasn't possible to tell. If not, is there a "tiling" component to the array, such as tiling of the promoter regions?

Sean

Reply by Sean Davis

To add to Sean's keen observation... Are these expression arrays or tiling arrays? I've never worked with Nimblegen arrays, but ~390,000 spots seems like a lot for expression arrays. Also, you said this is ChIP-on-chip data, right?

If they are expression arrays with multiple spots, then you will need to use duplicateCorrelation to estimate the spot-replicate correlations. If they are tiling arrays (usually used with ChIP-on-chip data), you will need to analyze them in a completely different way, because probes that are close to each other on the chromosome will not have independent fluorescence values.

Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

Reply by Jenny Drnevich
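
Jenny's point that neighbouring probes on a tiling array do not give independent values can be checked on one's own data. The Python sketch below is an illustration, not part of limma or any Bioconductor package. It assumes probes from a single chromosome with known start coordinates and computes the lag-1 autocorrelation of the M-values taken in genomic order. Independent spots would give a value near 0; a clearly positive value suggests adjacent probes share signal, as overlapping tiling probes would.

```python
import random
from statistics import mean


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of a sequence, e.g. M-values in genomic order."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    dev = [v - m for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant")
    # Covariance of each probe with its right-hand neighbour, scaled by the variance.
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom


def order_by_position(positions, m_values):
    """Reorder M-values by probe coordinate (assumption: one chromosome only)."""
    return [m for _, m in sorted(zip(positions, m_values))]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy "independent spots": pure noise.
    noise = [rng.gauss(0, 1) for _ in range(5000)]
    # Toy "overlapping probes": each value averages 5 neighbouring noise terms,
    # so adjacent values share 4 of their 5 inputs.
    shared = [mean(noise[i:i + 5]) for i in range(len(noise) - 4)]
    print(round(lag1_autocorrelation(noise), 3))   # expected to be near 0
    print(round(lag1_autocorrelation(shared), 3))  # expected to be near 0.8
```

The moving-average toy data only mimics overlapping probes. Real ChIP-on-chip data will also show positive autocorrelation inside enriched regions, so this is a rough diagnostic rather than a formal test.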

Yes, it is a tiling array with no duplicate spots.

I have looked into the packages that deal with tiling arrays, like "chIPchip" (is it available for Windows yet?), but from what I could gather from the way such packages work, they rely on random fractionation of the genome, i.e. sonication. Unfortunately, my experiment required restriction endonuclease digestion to fractionate the genome. This produces specific and predictable short fragments and, ideally, no sequence bias, so I expect a sharp rise and fall of a positive region rather than a normal distribution of the M-values. I don't think that the common algorithms for summarizing the probe data are applicable in my case.

I am thinking of relying on how many probes (out of the total number of probes that should hybridize to each generated genomic fragment) give reproducible and comparable results to summarize my probe-level data.

Niki

Reply by r.athanasiadou
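
Niki's proposed probe-counting summary can be sketched in Python under a few assumptions that are not from the thread. Each probe is assumed to have been assigned already to the restriction fragment it should hybridize to (e.g. from an in-silico digest, not shown; the fragment IDs here are hypothetical). The replicate arrays are assumed to share the same probe order, and "reproducible" is taken to mean that a probe's M-value exceeds one user-chosen threshold in every replicate. For each fragment the function reports the number of supporting probes, the number of probes, and their ratio.

```python
from collections import defaultdict


def fragment_support(fragment_ids, replicate_m_values, threshold):
    """Count, per fragment, the probes above `threshold` in every replicate.

    fragment_ids: hypothetical fragment ID for each probe, in probe order.
    replicate_m_values: list of replicates, each a list of M-values in the same order.
    Returns {fragment_id: (n_supporting, n_probes, fraction_supporting)}.
    """
    if not replicate_m_values:
        raise ValueError("need at least one replicate")
    n_probes = len(fragment_ids)
    for rep in replicate_m_values:
        if len(rep) != n_probes:
            raise ValueError("each replicate needs one M-value per probe")
    totals = defaultdict(int)
    supporting = defaultdict(int)
    for i, frag in enumerate(fragment_ids):
        totals[frag] += 1
        # Assumption: "reproducible" = above the threshold in all replicates.
        if all(rep[i] > threshold for rep in replicate_m_values):
            supporting[frag] += 1
    return {
        frag: (supporting[frag], totals[frag], supporting[frag] / totals[frag])
        for frag in totals
    }


if __name__ == "__main__":
    frags = ["f1", "f1", "f1", "f2", "f2"]
    reps = [[1.2, 0.9, 0.1, 0.2, 1.5],
            [1.0, 1.1, 0.8, -0.3, 0.4]]
    print(fragment_support(frags, reps, threshold=0.5))
    # {'f1': (2, 3, 0.6666666666666666), 'f2': (0, 2, 0.0)}
```

Reporting the fraction alongside the raw counts keeps short fragments (with few probes) from looking as convincing as long ones.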

Niki,

The package ACME (previously known as chIPchip; thank one of my collaborators for the interesting name) is available as a Bioconductor package in the developer section. It does not rely on the specifics of R-devel, so it should run just fine on post-2.4 versions of R (and probably even earlier). It doesn't make particular assumptions about the distribution of the M-values, except that more probes than expected by chance are above a threshold within a window of user-specified size. I would think you could choose the window as your mean fragment size, or slightly larger, so that you have about 10-12 probes in most windows. ACME is very insensitive to the actual distribution (the data need not be centered, or even log-transformed).

In your case, since you are also interested in differences between two conditions, you could try simply subtracting the values for condition1 from condition2 and running ACME on the result. Doing so will likely be noisier than a single array, but you could certainly hope to get something useful.

Sean

Reply by Sean Davis
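
The window idea Sean describes, applied to a condition2 minus condition1 difference, can be illustrated with a rough Python sketch. It is not ACME's code, and several choices are assumptions: the global threshold is the 0.9 quantile of the scores, "more than expected by chance" is judged with an exact binomial upper tail against the genome-wide fraction of probes above that threshold (ACME's own statistic may differ), and positions are sorted on a single chromosome. The window width would be set near the mean fragment size, as Sean suggests.

```python
from math import comb


def upper_binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def difference(cond1, cond2):
    """Probe-wise condition2 minus condition1."""
    if len(cond1) != len(cond2):
        raise ValueError("conditions must cover the same probes")
    return [b - a for a, b in zip(cond1, cond2)]


def window_enrichment(positions, scores, window, quantile=0.9):
    """Flag windows with more above-threshold probes than the global rate predicts.

    positions: probe coordinates on one chromosome, sorted ascending (assumption).
    scores: one value per probe, e.g. the output of difference().
    window: total window width in bases, centred on each probe.
    Returns (threshold, [(position, k_above, n_in_window, p_value), ...]).
    """
    if not scores or len(positions) != len(scores):
        raise ValueError("need one score per probe")
    ordered = sorted(scores)
    threshold = ordered[int(quantile * (len(ordered) - 1))]  # assumed quantile rule
    above = [s > threshold for s in scores]
    p = sum(above) / len(above)  # genome-wide fraction above threshold
    half = window / 2
    results = []
    lo = hi = 0
    for pos in positions:
        # Two pointers: [lo, hi) holds the probes within +/- half of pos.
        while positions[lo] < pos - half:
            lo += 1
        while hi < len(positions) and positions[hi] <= pos + half:
            hi += 1
        k = sum(above[lo:hi])
        n = hi - lo
        results.append((pos, k, n, upper_binomial_tail(k, n, p)))
    return threshold, results


if __name__ == "__main__":
    # Toy data: 100 probes every 10 bp; probes 50-55 are raised in condition 2 only.
    positions = list(range(0, 1000, 10))
    cond1 = [0.5] * 100
    cond2 = [3.5 if 50 <= i <= 55 else 0.5 for i in range(100)]
    threshold, res = window_enrichment(positions, difference(cond1, cond2), window=60)
    print(threshold, res[52][:3])  # 0.0 (520, 6, 7)
```

Because neighbouring windows overlap and nearby probes are not independent, these per-probe p-values are best read as a ranking, not as calibrated error rates.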

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070327/3f18170e/attachment.pl

Reply by r.athanasiadou

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070327/dcc3d68c/attachment.pl

Reply by Jenny Drnevich