Dear Listers:
I am currently doing research using microarray data. I have two
questions and hope to get some help here:
1. I have a dataset like the following, in which V1 is the gene ID and
V3, V5, ... are the fold changes of expression levels for different patients.
There are multiple probes for one gene, so there are multiple rows.
As you can see in columns V11 and V13, the fold changes are very
different. Is this common in microarray data analysis? How is it
generally dealt with? I don't want to use a p-value or a similar
threshold to discretize them at this step yet.
         V1        V3        V5        V7        V9        V11        V13
-2147022884  3.967828  5.010724  3.356568  1.227882   1.481481   1.870871
-2147022884 -4.031250 -1.441341 -1.036145 -3.583333  -8.953125  -3.201117
-2147022884 -2.016835 -1.568063 -1.079279 -1.288172 -50.875421 -39.554974
here is the variance
> x2.var[2,]
Group.1 V3 V5 V7 V9 V11 V13
-2147022884 17.30989 14.15427 6.495755 5.791014 767.9342 510.5714
2. Are there any good references on this kind of thing, such as online
materials or books?
thanks,
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
Hi Weiwei,
I've removed the R-Help mailing, this question does not really concern
them (except for the subset who's already on the bioc list).
To answer your first question: yes, it is somewhat common. The first
step would be to ask yourself why you are getting different values here.
Could it be that some of the probes are not behaving properly in your
samples? If you have reason to think that one probe is more
representative, then you might want to select only that one (for
example by variance). If they represent different splice variants,
then you might want to keep all of them. With such diverging results,
I do not think that averaging them would be a good idea.
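The point about diverging probes can be checked directly on the three rows posted at the top of the thread. What follows is only a minimal sketch in R, assuming the rows are held in a data frame; the names fc, samples, probe_var, probe_mean and mixed_sign are made up for illustration and do not come from the original posts. It recomputes the per-gene variance with aggregate(), which should reproduce the x2.var row shown earlier, and flags samples in which probes for the same gene point in opposite directions.

```r
# Values copied from the table posted in the first message
# (three probes for one gene, six patients).
fc <- data.frame(
  V1  = rep(-2147022884, 3),
  V3  = c(3.967828, -4.031250, -2.016835),
  V5  = c(5.010724, -1.441341, -1.568063),
  V7  = c(3.356568, -1.036145, -1.079279),
  V9  = c(1.227882, -3.583333, -1.288172),
  V11 = c(1.481481, -8.953125, -50.875421),
  V13 = c(1.870871, -3.201117, -39.554974)
)
samples <- setdiff(names(fc), "V1")

# Per-gene variance across probes; aggregate() names the grouping column Group.1
probe_var <- aggregate(fc[samples], by = list(fc$V1), FUN = var)

# Per-gene mean across probes, for comparison with the individual rows
probe_mean <- aggregate(fc[samples], by = list(fc$V1), FUN = mean)

# For each gene and sample: do the probes disagree in direction?
mixed_sign <- aggregate(fc[samples], by = list(fc$V1),
                        FUN = function(v) any(v > 0) && any(v < 0))

print(probe_var)
print(probe_mean)
print(mixed_sign)
```

In this example every sample is flagged, because the first probe is positive while the other two are negative, so the per-gene mean describes none of the individual probes well.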
The strategy that we used at the beginning was to keep all probes and
see which ones come up during differential expression or other analyses.
Then you can compare the results to see how the different probes are
behaving and which ones make sense based on what you know of your
samples.
In our case, we have good reasons to think that many probes are
misbehaving, for example from looking at genes whose behavior is known.
We often select the most variables as the representative one.
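Reading "the most variables" as "the most variable probe" (which would be consistent with the earlier "for example by variance"), the selection could be sketched as follows. This is an illustration under assumptions, not the code used by the poster: expr is a hypothetical probes-by-samples matrix with made-up values, and gene is a hypothetical vector giving the gene of each probe.

```r
# Hypothetical toy data: 5 probes mapping to 2 genes, measured in 4 samples
expr <- rbind(
  p1 = c( 1.0, 1.1,  0.9, 1.0),
  p2 = c( 0.2, 2.5, -1.8, 3.0),
  p3 = c( 0.5, 0.4,  0.6, 0.5),
  p4 = c(-1.0, 1.0, -2.0, 2.0),
  p5 = c( 0.0, 0.1,  0.0, 0.1)
)
gene <- c("geneA", "geneA", "geneB", "geneB", "geneB")

# Variance of each probe across samples
probe_var <- apply(expr, 1, var)

# For each gene, the name of its probe with the largest variance
best <- sapply(split(names(probe_var), gene),
               function(p) p[which.max(probe_var[p])])

# One row per gene, taken from the selected probe
selected <- expr[best, , drop = FALSE]
rownames(selected) <- names(best)

print(best)
print(selected)
```

Note that this keeps a single row per gene, so it is not appropriate when the probes are meant to distinguish splice variants that all need to be retained.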
I do not have any references handy for this, maybe other people do.
Francois
On Mon, 2006-09-11 at 12:11 -0400, Weiwei Shi wrote:
> [...]
Dear Francois and others:
Thank you. I cc'd r-help because I was just trying to get more
suggestions, but keeping it on Bioconductor is totally fine with me.
I am trying out an idea for pathway analysis, and the data used here are
real medical data for a disease with an unclear mechanism. The probes
here correspond to different splice variants of one gene, so I need to
keep all of them for my analysis. Currently I do not have the knowledge
to evaluate the behavior of the probes.
By "We often select the most variables as the representative one.", do
you mean "select the most samples or most probes"?
I agree with you that using an average is not a good idea. That's why
I need some filtering mechanism or something else. I believe this is a
common situation when dealing with high-throughput data with a lot of
noise. So my second question is about general references or experience.
Thanks for any other suggestions,
On 9/11/06, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
> [...]
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
Quoting Weiwei Shi <helprhelp at gmail.com>:
> [...]
You can have big variability for low intensity spots. If you have a
gene that becomes either silenced or activated, you can get big fold
change differences.
I am sure there are other possibilities, but I think you should
consider these too.
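The low-intensity effect can be illustrated with a small simulation. This is a sketch under loudly stated assumptions rather than a model of any particular platform: the noise is taken to be additive and Gaussian with an arbitrary standard deviation, both channels have the same true intensity (so there is no real change), and all names and numbers are invented for illustration.

```r
set.seed(42)
n <- 1000
noise_sd <- 30   # assumed additive background noise, arbitrary units

# Log2 ratio of two noisy measurements of the same true intensity
log_ratio <- function(true_intensity) {
  a <- pmax(true_intensity + rnorm(n, sd = noise_sd), 1)  # floor at 1 keeps log2 defined
  b <- pmax(true_intensity + rnorm(n, sd = noise_sd), 1)
  log2(a / b)
}

sd_low  <- sd(log_ratio(100))    # spot close to background
sd_high <- sd(log_ratio(5000))   # bright spot

# Spread of the log2 ratios; larger for the low-intensity spot
print(c(low = sd_low, high = sd_high))
```

Under these assumptions the same amount of noise produces a much wider spread of ratios for the dim spot than for the bright one, even though nothing has changed biologically. The other case mentioned above, a gene that is switched off or on, gives large fold changes for a different reason: one of the two intensities is close to zero.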
Jose
--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
Hi all!
I am using the heatmap method to generate a heatmap of gene expression
values but I am having problems with some graphical parameters. I would
like to change the size of the title since it is now too large to fit
into the plotting area. I tried to change cex.main from within the
heatmap call, but nothing happened, and I also tried to change the size
by calling par after the heatmap call, but that didn't work either.
There is probably an easy solution to this but I can't really figure it
out.
I would really appreciate some help!
Best regards,
Lina Hultin Rosenberg
Part of the code generating the heatmap
=========================================================================
jpeg(filename=file.name,width=1000,height=600);
heatmap(t(exprs(eset.filtered)),scale="column",labRow=samplenames.short,
        main=string.main.hc,col=greenred(80),cex.main=0.8);
#par(cex.main=0.8);
dev.off();
Hello.
I think if you set par(cex.main=.8) prior to the heatmap call, the size
of the title should change to the value selected in the par command.
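A minimal sketch of that ordering, under a few assumptions: a random matrix stands in for t(exprs(eset.filtered)), pdf() is swapped in for jpeg() so the example does not depend on JPEG support being compiled in, and heat.colors() replaces greenred() so that no extra package is needed. Since par() settings belong to the currently open device, the par() call goes after the device is opened and before heatmap(). As far as I know, stats::heatmap draws the title at 1.5 times the value of par("cex.main") in effect when it is called, so the value may need to be set somewhat lower than expected.

```r
# Hypothetical stand-in for t(exprs(eset.filtered)): 6 samples x 20 genes
set.seed(1)
m <- matrix(rnorm(120), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("gene", 1:20)))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 6)   # open the device first

par(cex.main = 0.8)                     # then set the title size on that device
title_cex <- par("cex.main")            # recorded only so the setting can be checked

heatmap(m, scale = "column",
        main = "A long title that would otherwise overflow the plotting area",
        col = heat.colors(80))

dev.off()
```

The same ordering should apply with jpeg(filename=file.name, width=1000, height=600) in place of pdf().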
Regards
Marcus
Marcus Gry Björklund
Royal Institute of Technology
AlbaNova University Center
Department of Molecular Biotechnology
106 91 Stockholm, Sweden
www.arrayadvice.se
Phone (office): +46 8 553 783 44
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B
Web: http://www.biotech.kth.se/molbio/microarray/index.html
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Lina Hultin-Rosenberg
Sent: Tuesday, September 12, 2006 10:25 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] heatmap - changing title size
[...]
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor