Filtering noisy high-throughput microarray data

Weiwei Shi

Dear Listers,

I am currently doing research on microarray data and have two questions. I hope someone here can help.

1. I have a dataset like the one below, in which V1 is the gene ID and V3 ... V13 are the fold changes in expression level for different patients. Some genes are measured by several probes, so the same gene ID appears on several rows. As columns V11 and V13 show, the fold changes from different probes of the same gene can be very different. Is this common in microarray data analysis, and how is it generally dealt with? I do not want to discretize the values with a p-value or any other threshold at this stage.

    V1                  V3         V5         V7         V9          V11          V13
    -2147022884   3.967828   5.010724   3.356568   1.227882     1.481481     1.870871
    -2147022884  -4.031250  -1.441341  -1.036145  -3.583333    -8.953125    -3.201117
    -2147022884  -2.016835  -1.568063  -1.079279  -1.288172   -50.875421   -39.554974

Here is the variance across the probes of this gene:

    > x2.var[2,]
          Group.1        V3        V5        V7        V9       V11       V13
      -2147022884  17.30989  14.15427  6.495755  5.791014  767.9342  510.5714

2. Is there a good reference on this kind of problem, such as online material or a book?

Thanks,

--
Weiwei Shi, Ph.D.
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
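The x2.var table above looks like the output of aggregate(), whose grouping column is named Group.1 by default. The code that produced it was not posted, so this is only a minimal sketch, assuming aggregate() with var() was used: it rebuilds the three posted probe rows and computes, for each gene and each sample column, the variance across that gene's probes. The names x2 and x2.var mirror the post; the rest is illustrative.

```r
# The three probe rows posted above: V1 is the gene ID, V3..V13 are fold changes
x2 <- data.frame(
  V1  = rep(-2147022884, 3),
  V3  = c(3.967828, -4.031250, -2.016835),
  V5  = c(5.010724, -1.441341, -1.568063),
  V7  = c(3.356568, -1.036145, -1.079279),
  V9  = c(1.227882, -3.583333, -1.288172),
  V11 = c(1.481481, -8.953125, -50.875421),
  V13 = c(1.870871, -3.201117, -39.554974)
)

# For each gene (grouped by V1), the variance across its probes,
# computed separately for every sample column
x2.var <- aggregate(x2[, -1], by = list(x2$V1), FUN = var)
x2.var   # reproduces the variances posted above (V3 = 17.30989, ..., V13 = 510.5714)
```

Genes measured by a single probe would get NA here, since var() needs at least two values.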

Francois Pepin

Hi Weiwei,

I've removed R-help from the recipients; this question does not really concern that list (except for the subset who are already on the Bioconductor list).

To answer your first question: yes, it is fairly common. The first step is to ask yourself why the probes give different values. Could some of them be misbehaving in your samples? If you have reason to think that one probe is more representative, you may want to select only that one (for example, by variance). If the probes represent different splice variants, you may want to keep all of them. With results that diverge this much, I do not think averaging them is a good idea.

The strategy we used at the beginning was to keep all probes and see which ones come up in differential expression or other analyses. You can then compare the results to see how the different probes behave and which ones make sense given what you know about your samples.

In our case, we have good reason to think that many probes misbehave, for example from looking at genes whose behavior is known. We often select the most variable probe as the representative one.

I do not have any references at hand for this; maybe other people do.

Francois

On Mon, 2006-09-11 at 12:11 -0400, Weiwei Shi wrote:
> [original question snipped]
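Francois's suggestion of keeping one representative probe per gene, chosen by variance, can be sketched as below. This is a hypothetical helper, not code from the thread: pick_most_variable_probe() and the toy data frame probes are made up for illustration, and "most variable" is read here as the variance of each probe across samples.

```r
# Hypothetical helper: for every gene, keep only the probe (row) whose values
# vary most across samples. Column 1 holds the gene ID, the rest are samples.
pick_most_variable_probe <- function(x) {
  # variance of each probe across the sample columns
  probe_var <- apply(x[, -1, drop = FALSE], 1, var)
  # for each gene, the row index of its most variable probe
  keep <- tapply(seq_len(nrow(x)), x[[1]],
                 function(i) i[which.max(probe_var[i])])
  x[as.integer(keep), , drop = FALSE]
}

# Made-up example: gene A has three probes, gene B has two
probes <- data.frame(
  gene = c("A", "A", "A", "B", "B"),
  s1   = c(1.0,  2.0, 1.5, 0.5,  3.0),
  s2   = c(1.1, -4.0, 1.4, 0.6, -3.0),
  s3   = c(0.9,  6.0, 1.6, 0.4,  3.5)
)

best <- pick_most_variable_probe(probes)
best   # rows 2 (gene A) and 5 (gene B)
```

Applied to the three probes in the question, this rule would keep the third probe, whose extreme V11 and V13 values dominate its variance. Whether that probe is the most informative or the most broken one is exactly the judgement Francois describes.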

Weiwei Shi

Dear Francois and others,

Thank you. I cc'd r-help only because I was trying to get more suggestions, but keeping this on the Bioconductor list is totally fine with me.

I am trying out an idea for a pathway analysis, and the data here are real medical data for a disease with an unclear mechanism. The probes in this case correspond to different splice variants of one gene, so I need to keep all of them in my analysis. At the moment I have no way to evaluate how well the individual probes behave.

By "We often select the most variables as the representative one.", do you mean the most variable samples or the most variable probes?

I agree that using an average is not a good idea; that is why I need a filtering mechanism or some other approach. I believe this is a common situation when dealing with high-throughput data with a lot of noise, which is why my second question asked for a general reference or for people's experience.

Thanks for any other suggestions,

Weiwei Shi

On 9/11/06, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> [reply and quoted question snipped]
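Since the probes are splice variants that all have to be kept, one way to see whether they tell the same story, before deciding how to filter them, is to correlate them across samples. This sketch is a suggestion of ours rather than something proposed in the thread. It uses the three probes from the question and Spearman (rank) correlation, chosen so that a couple of extreme values, such as probe 3's V11 and V13, cannot dominate the result.

```r
# The three probes of gene -2147022884 from the question (rows = probes, columns = samples)
fc <- rbind(
  probe1 = c(3.967828, 5.010724, 3.356568, 1.227882, 1.481481, 1.870871),
  probe2 = c(-4.031250, -1.441341, -1.036145, -3.583333, -8.953125, -3.201117),
  probe3 = c(-2.016835, -1.568063, -1.079279, -1.288172, -50.875421, -39.554974)
)
colnames(fc) <- c("V3", "V5", "V7", "V9", "V11", "V13")

# cor() correlates columns, so transpose to correlate probes across samples.
# Spearman works on ranks, which limits the influence of single extreme values.
probe_cor <- cor(t(fc), method = "spearman")
round(probe_cor, 2)
```

On these six samples, probes 2 and 3 agree fairly well by rank (rho = 5/7, about 0.71), while probe 1 agrees less with either (3/7 and 1/7). Correlation only compares patterns across patients: it ignores the fact that probe 1 is up-regulated in every sample while probes 2 and 3 are down-regulated. With so few samples, any such estimate is very uncertain.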

J.delasHeras@ed.ac.uk

Quoting Weiwei Shi <helprhelp at gmail.com>:
> [original question snipped]

Low-intensity spots can show large variability: when the signal is close to background, a small absolute error becomes a large error in the ratio. Also, if a gene becomes either silenced or activated, you can get very large differences in fold change. I am sure there are other possibilities, but I think you should consider these too.

Jose

--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
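Jose's first point, that low-intensity spots give unstable ratios, comes down to simple arithmetic. The sketch below uses made-up numbers: an unchanged gene (true ratio 1) measured once at high and once at low intensity, with the same hypothetical additive error of 50 units added to one channel.

```r
# Hypothetical intensities for an unchanged gene (both channels equal, true ratio = 1)
true_high <- 2000   # a bright spot
true_low  <- 20     # a faint spot, close to background
noise     <- 50     # the same additive error in one channel, e.g. imperfect background subtraction

ratio_high <- (true_high + noise) / true_high   # 2050 / 2000 = 1.025
ratio_low  <- (true_low + noise) / true_low     # 70 / 20 = 3.5

c(high = ratio_high, low = ratio_low)
```

The same 50-unit error barely moves the ratio of the bright spot but turns the faint one into an apparent 3.5-fold change, which is one reason low-intensity probes are commonly filtered out or down-weighted.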

Lina Hultin-Rosenberg

Hi all!

I am using the heatmap function to generate a heatmap of gene expression values, but I am having problems with some graphical parameters. I would like to reduce the size of the title, since it is currently too large to fit in the plotting area. I tried setting cex.main in the heatmap call, but nothing happened, and I also tried changing the size by calling par after the heatmap call, but that did not work either. There is probably an easy solution, but I cannot figure it out. I would really appreciate some help!

Best regards,
Lina Hultin Rosenberg

Part of the code generating the heatmap:

    library(Biobase)   # exprs()
    library(gplots)    # greenred()
    jpeg(filename = file.name, width = 1000, height = 600)
    heatmap(t(exprs(eset.filtered)), scale = "column", labRow = samplenames.short,
            main = string.main.hc, col = greenred(80), cex.main = 0.8)
    #par(cex.main = 0.8)
    dev.off()

Marcus Gry Björklund

Hello,

I think that if you set par(cex.main = 0.8) before the heatmap call, the size of the title should change accordingly.

Regards,
Marcus

Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
www.arrayadvice.se
Phone (office): +46 8 553 783 44
Fax: +46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B
Web: http://www.biotech.kth.se/molbio/microarray/index.html

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch On Behalf Of Lina Hultin-Rosenberg
Sent: Tuesday, September 12, 2006 10:25 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] heatmap - changing title size

[quoted message snipped]
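Marcus's suggestion can be sketched with base R alone. The matrix m and the title text are placeholders for Lina's eset.filtered and string.main.hc, and pdf() is used instead of jpeg() only so the sketch runs without a bitmap device. As far as we can tell from the stats::heatmap source, the function draws the title itself at a size derived from the cex.main in effect when it is called (scaled up, so 0.8 may not be the final size). That would explain why passing cex.main in the call, or calling par() afterwards, had no visible effect. Treat this as a reading of the source rather than documented behavior.

```r
# Toy matrix standing in for t(exprs(eset.filtered)): 10 samples x 6 genes
set.seed(1)
m <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("sample", 1:10), paste0("gene", 1:6)))

out <- tempfile(fileext = ".pdf")
pdf(out, width = 10, height = 6)   # jpeg(filename = ..., width = 1000, height = 600) is used the same way

# par() settings belong to the open device, so set cex.main after opening it
# but before heatmap() draws the title
par(cex.main = 0.8)
heatmap(m, scale = "column",
        main = "A long title that should now fit in the plotting area")
invisible(dev.off())
```

Because the setting is made on the freshly opened device and the device is closed right after, there is no need to restore par() afterwards.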
