Analysing expression with tiling arrays

January Weiner ▴ 370

Dear all,

I have two tiling arrays of a bacterial genome. Unfortunately, I do not have the original files (such as the bpmap/CEL files for Affymetrix tiling chips), just lists of spot intensities in two conditions for each probe (i.e. two values per probe), and a list of gene positions on the genome. Several probes map to each gene. The genome is not publicly available yet.

What would be the best way to tackle this? I thought that I might simply calculate the logFC for each probe, then, for each gene, run a one-sample t-test on the corresponding probe logFC values, and finally correct for multiple testing. Would that make sense?

I looked up the approach described by Toedling and Huber (2008, PLoS Comput Biol, doi:10.1371/journal.pcbi.1000227), but this is not exactly what I had in mind: rather than looking for enriched regions, I'm more interested in focusing on the genes directly, as a bacterial genome is densely packed with probes and genes (I have 10-30 probes per gene).

Best regards,
January

--
Dr. January Weiner 3
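A minimal Python sketch of the per-gene approach described above, assuming the intensities are already normalized between the two arrays, each probe is placed at a single genomic coordinate, and genes are given as (start, end) intervals. The function names (`probe_log_fc`, `assign_probes`, `bh_adjust`, `gene_tests`), the pseudocount of 1 and the minimum of three probes per gene are illustrative choices, not part of the original post; Benjamini-Hochberg is used here as one common multiple-testing correction.

```python
import numpy as np
from scipy import stats

def probe_log_fc(intensity_a, intensity_b, pseudocount=1.0):
    """log2 fold change per probe (condition B over A); the pseudocount avoids log(0)."""
    a = np.asarray(intensity_a, dtype=float)
    b = np.asarray(intensity_b, dtype=float)
    return np.log2(b + pseudocount) - np.log2(a + pseudocount)

def assign_probes(probe_pos, genes):
    """Map each probe coordinate to every gene whose [start, end] interval contains it.
    genes: dict name -> (start, end). Probes outside all genes are simply not used."""
    out = {name: [] for name in genes}
    for i, pos in enumerate(probe_pos):
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                out[name].append(i)
    return out

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same values as R's p.adjust(method = "BH"))."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.minimum(ranked, 1.0)
    return adj

def gene_tests(log_fc, probe_index_by_gene, min_probes=3):
    """Naive one-sample t-test of mean probe logFC = 0 for each gene, plus BH correction."""
    log_fc = np.asarray(log_fc, dtype=float)
    names, t_vals, p_vals = [], [], []
    for name, idx in probe_index_by_gene.items():
        if len(idx) < min_probes:
            continue
        res = stats.ttest_1samp(log_fc[idx], popmean=0.0)
        names.append(name)
        t_vals.append(res.statistic)
        p_vals.append(res.pvalue)
    return names, np.array(t_vals), np.array(p_vals), bh_adjust(p_vals)
```

The double loop in `assign_probes` is quadratic but is easy to read; for a whole bacterial genome, sorting probes and genes by position (or using an interval tree) would be faster.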

Written 14.6 years ago by January Weiner ▴ 370 • updated 14.6 years ago by January Weiner ▴ 240

January,

On Sep/10/10 9:21 AM, January Weiner wrote:
> I thought that I might just calculate the logFC for each probe, and then, for each gene, run a one sample t-test of the corresponding probe logFC values; then correct for multiple testing.

This sounds reasonable; just be aware that the noise in the data from neighbouring probes is likely correlated, so the t-distribution with the 'naive' degrees of freedom will give you optimistic (too small) p-values. You can still use them for ranking / prioritizing genes, and perhaps set the cutoff from known positive and negative control genes.

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
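The caveat about correlated neighbouring probes can be made concrete with a rough correction, swapped in here purely for illustration. If the probe-level noise along the genome is modelled as a first-order autoregressive (AR(1)) process with lag-1 correlation rho, the variance of a gene's mean log-ratio over n consecutive probes is inflated by the factor 1 + 2 * sum over k = 1..n-1 of (1 - k/n) * rho^k relative to independent probes. The sketch divides the naive t-statistic by the square root of that factor and uses n/factor - 1 degrees of freedom. The AR(1) model, the clipping of rho to [0, 0.99], and the function names (`lag1_autocorr`, `ar1_variance_inflation`, `adjusted_ttest`) are all assumptions; rho might be estimated, for instance, from within-gene residuals ordered by genomic position. The sketch also ignores the bias that correlation introduces into the sample variance itself, so the adjusted p-values are indicative only.

```python
import numpy as np
from scipy import stats

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of values ordered by genomic position."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        return 0.0
    return float(np.dot(x[:-1], x[1:]) / denom)

def ar1_variance_inflation(n, rho):
    """Ratio Var(mean) / (sigma^2 / n) for n consecutive AR(1) values with lag-1 correlation rho."""
    k = np.arange(1, n)
    return float(1.0 + 2.0 * np.sum((1.0 - k / n) * rho ** k))

def adjusted_ttest(log_fc, rho):
    """Two-sided one-sample t-test of mean(log_fc) = 0: (naive p, AR(1)-adjusted p)."""
    x = np.asarray(log_fc, dtype=float)
    n = x.size
    rho = min(max(rho, 0.0), 0.99)  # assumption: never let the correction make p-values smaller
    inflation = ar1_variance_inflation(n, rho)
    n_eff = max(n / inflation, 2.0)  # keep at least one degree of freedom
    t_naive = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    t_adj = t_naive / np.sqrt(inflation)
    p_naive = 2.0 * stats.t.sf(abs(t_naive), df=n - 1)
    p_adj = 2.0 * stats.t.sf(abs(t_adj), df=n_eff - 1)
    return float(p_naive), float(p_adj)
```

With rho = 0 the two p-values coincide; as rho grows the adjusted p-value gets larger, which is the direction of the bias described in the reply.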

Wolfgang Huber ★ 13k • 14.6 years ago

> this sounds reasonable, just be aware that the noise in the data from
> neighbouring probes is likely correlated, so that the t-distribution with
> the 'naive' degrees of freedom will give you optimistic (too small)
> p-values. You can still use them for ranking / prioritizing genes, and
> perhaps set the cutoff from known positive and negative control genes.

Thanks for the answer, Wolfgang. I did the simple / naive t-test, and it still gave "reasonable" results (i.e. the same as with a different approach).

Regards,
j.

--
Dr. January Weiner 3
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web: www.mpiib-berlin.mpg.de
Tel: +49-30-28460514
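One way to act on the quoted suggestion of setting the cutoff from known positive and negative control genes is sketched below, assuming you have a list of genes you expect not to change (negative controls) and genes you expect to change (positive controls) between the two conditions. It takes the cutoff on the absolute per-gene statistic (for example the t-statistic) as an upper quantile of the negative controls, so that roughly `max_fpr` of them would be called, and reports what fraction of positive controls pass. The function name `control_based_cutoff` and the 5% default are hypothetical choices, not something proposed in the thread.

```python
import numpy as np

def control_based_cutoff(neg_stats, pos_stats, max_fpr=0.05):
    """Cutoff on |statistic| from control genes.

    Returns (cutoff, fraction of negative controls above it,
    fraction of positive controls above it)."""
    neg = np.abs(np.asarray(neg_stats, dtype=float))
    pos = np.abs(np.asarray(pos_stats, dtype=float))
    # upper quantile of the negative controls (numpy's default linear interpolation)
    cutoff = float(np.quantile(neg, 1.0 - max_fpr))
    fpr = float(np.mean(neg > cutoff))
    tpr = float(np.mean(pos > cutoff))
    return cutoff, fpr, tpr
```

Because this uses only the ranking of the statistic, it stays usable even if the naive p-values are too optimistic, which fits the idea of using them for prioritizing genes.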

January Weiner ▴ 370 • 14.6 years ago

Edwin Groot ▴ 230

On Fri, 10 Sep 2010 09:30:11 +0200, January Weiner <january.weiner at mpiib-berlin.mpg.de> wrote:
> I have two tiling arrays of a bacterial genome. Unfortunately, I do
> not have the original files (like the bpmap / cel files for Affy
> tiling chips), just lists of spot intensities in two conditions for
> each probe (i.e. two values for each probe), and a list of gene
> positions on the genome. Several probes map to each gene. The genome
> is not publicly available yet.

Hello January,

If the tiling array is from Affymetrix, the bpmap files exist. To start with, you should track them down, because they provide the necessary annotation and position information. I am assuming you want to measure RNA transcription using this tiling array platform. That should be a fairly trivial analysis once you get the data into an ExpressionSet object. Is the data from GEO?

Edwin

Dr. Edwin Groot, postdoctoral associate
AG Laux, Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945

Edwin Groot ▴ 230 • 14.6 years ago