Hi, my name is Gregory Miles. I'm at Boston University and was given this
address by Dr. Carey at the Harvard Medical School (I went to a seminar of
his last week), and was told that I could ask you my question about
two-color data. On the mouse microarray dataset we have, there are two
colors, and therefore two values that can be below background. When both
values are above background (zero_barcode on our chip), we keep the data,
and when both are below we eliminate the data (they become NA). I imagine
this is a correct approach, but what should be done with data that has
one intensity below background and one above? Would it be best to keep the
good value? Do we eliminate the entire gene from entry into Bioconductor?
Perhaps there is a way to tell Bioconductor that this is the case (by
entering a background value) and let it handle the data? Or is it best to
let Bioconductor treat them as NAs? Any help would be greatly
appreciated.
Thanks!
-Greg Miles
I would not delete data that is below background, even in both
channels, if it is above background on at least one array.
It seems to me that it is important information to know that a gene
does not express under some condition in your experiment. Of course,
the unfortunate side-effect of our liking to use ratios is that
"zero" is not handled well. But a gene that expresses in some
conditions of interest but not in others surely is of primary
interest to your study.
--Naomi
At 11:48 AM 7/18/2006, milesg at bu.edu wrote:
>[...]
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
To those who responded to my last e-mail, thanks for the help. I have
another question. I got my two-color time course data into limma. I have a
targets file with 2 replicates per time point, for time points 1 day,
2 day, 4 day, 7 day, and 14 day. I have told LIMMA that these arrays are
not ALL replicates of one another. Please note that any semicolons coming
up are NOT part of the code.
There is supposed to be a sizeable shift in differential expression from
4 days to 7 days, so I used those points for my comparison. As the LIMMA
manual describes, I assigned my levels variable lev, my factor variable f,
and my design. I set the column names: colnames(design)=lev ; and made my
fit variable: fit=lmFit(MA, design); where MA is the normalized RG. I
continued to follow the manual (the variable names it gave me were X1day,
X2day, etc.): cont=makeContrasts("X7day-X4day", levels=design); I then did
fit2=contrasts.fit(fit, cont) ; then fit2=eBayes(fit2); then I did
selected=p.adjust(fit2$F.p.value, method="BH")<0.05 to get the genes that
change from 4 days to 7 days with strong p-values. Unfortunately, the
results yield only about 30 genes (there should be several hundred), none
of which (by eye) undergoes any significant change in differential
expression from the 4 day point to the 7 day point. Can someone please
help me work out what I may be doing wrong? Any help would be greatly
appreciated. Thanks!
-greg
For this type of problem, it usually helps if you paste your code to
the end of the message.
--Naomi
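For reference, the steps described in the question could be written out as a short script along these lines. This is only a sketch, not the poster's actual code: the matrix M is simulated as a stand-in for the normalized log-ratios (MA$M), the gene names and the size of the simulated day 4 to day 7 shift are made up, and topTable() is used here in place of p.adjust() on fit2$F.p.value as one common way to list genes for a single contrast.

```r
library(limma)

set.seed(1)

# Time points as named in the question, two replicate arrays per time point.
lev <- c("X1day", "X2day", "X4day", "X7day", "X14day")
f <- factor(rep(lev, each = 2), levels = lev)

# Hypothetical stand-in for MA$M: 200 genes x 10 arrays of log-ratios.
M <- matrix(rnorm(200 * 10, sd = 0.3), nrow = 200,
            dimnames = list(paste0("gene", 1:200), NULL))
# Simulated shift between day 4 and day 7 for the first 20 genes (assumption).
M[1:20, f == "X7day"] <- M[1:20, f == "X7day"] + 2

# One design column per time point, named after the levels.
design <- model.matrix(~ 0 + f)
colnames(design) <- lev

fit <- lmFit(M, design)
cont <- makeContrasts(X7day - X4day, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Table of all genes for the single contrast, with BH-adjusted p-values.
tab <- topTable(fit2, coef = 1, number = Inf, adjust.method = "BH")
selected <- rownames(tab)[tab$adj.P.Val < 0.05]
head(tab)
```

Pasting a script in this form, together with the targets file and the design matrix, makes it much easier for others to see where an analysis may have gone astray.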
At 02:16 PM 7/19/2006, milesg at bu.edu wrote:
>[...]
Hi Miles,
milesg at bu.edu wrote:
> [...]
Probably the easiest way to handle such things is to use the limma
package and, when you do background correction, use the 'normexp' method,
which ensures that none of the background-corrected values will be below
zero.
This is probably not critical for those genes where both channels are
below zero (since you probably want to ignore those anyway), but you
certainly wouldn't want to ignore a gene where one sample is below zero
and the other is (possibly) a large value.
If you want to use limma, I would strongly suggest perusing the user's
guide (load limma, then type limmaUsersGuide() at the R prompt). The
learning curve can be steep, especially if you don't have a statistical
background.
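As a rough illustration of the 'normexp' suggestion, the sketch below builds a small simulated RGList and background-corrects it with limma's backgroundCorrect(). The intensities, the number of spots, and the offset of 50 are all assumptions chosen for the example; with real data RG would come from read.maimages().

```r
library(limma)

set.seed(2)
n_spots <- 1000
n_arrays <- 2

# Simulated foreground: true background + exponential signal + noise (assumption).
sim_channel <- function() {
  matrix(100 + rexp(n_spots * n_arrays, rate = 1 / 200) +
           rnorm(n_spots * n_arrays, sd = 30), nrow = n_spots)
}
# Simulated local background estimates (assumption).
sim_background <- function() {
  matrix(rnorm(n_spots * n_arrays, mean = 100, sd = 20), nrow = n_spots)
}

RG <- new("RGList", list(R = sim_channel(), G = sim_channel(),
                         Rb = sim_background(), Gb = sim_background()))

# Plain subtraction leaves some spots at or below zero in the red channel.
sum(RG$R - RG$Rb <= 0)

# normexp correction keeps every corrected intensity positive, so no spot
# has to be set to NA before taking logs.
RGb <- backgroundCorrect(RG, method = "normexp", offset = 50)
M <- log2(RGb$R) - log2(RGb$G)
summary(as.vector(M))
```

The offset is added to the corrected intensities and damps the variability of log-ratios for low-intensity spots; its value here is only an example.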
HTH,
Jim
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.
I would not do this. Use less background correction (e.g. don't
background correct, or subtract 1/2 of the background), or set the
channels that are below background to some low value (e.g. 1) so that
logs can be used.
--Naomi
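The alternatives listed above can be compared on a toy example in base R. The intensities below are invented for illustration; only the arithmetic is the point.

```r
# Hypothetical foreground and background intensities for four spots in one channel.
fg <- c(1500, 220, 95, 60)
bg <- c(100, 100, 100, 100)

full    <- fg - bg           # full subtraction: last two spots go below zero
half    <- fg - bg / 2       # subtract only half of the background
floored <- pmax(fg - bg, 1)  # full subtraction, then floor at a low value (1)

full
half
floored

# A spot that is well above background in red but below it in green
# still gives a finite (and large) log-ratio once the floor is applied.
red   <- pmax(1500 - 100, 1)
green <- pmax(60 - 100, 1)
log2(red / green)
```

Setting the below-background channel to NA instead would discard exactly this kind of spot.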
At 09:48 AM 7/19/2006, you wrote:
>Thanks for your quick response. I will not delete the gene completely (if
>you delete genes then LIMMA doesn't know how to handle gene lists with
>different orders), but although it is helpful to keep genes that may have
>information in one array, I do think it may be necessary to "NA" the
>below-background values and keep the above-background ones. Thus you
>still have the good values but have eliminated possible bad ones. What do
>you think of this?
>-greg
Just to add a bit here, with many image analysis options, there are other
measures of the "quality" of a spot besides intensity. Since limma will
allow you to incorporate this information into an analysis, you might
think about whether there is some quantity reported by your image analysis
software that might be useful in this regard.
I agree with Naomi that excluding genes based on low-intensity spots in
some subset of the arrays may be discarding some of the most interesting
data.
Sean
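One way such a quality measure could enter the analysis is as spot weights passed to lmFit(). The sketch below is an assumption-laden illustration: the quality scores are random numbers standing in for whatever the image analysis software reports, and the cutoff of 0.2 and the down-weight of 0.1 are arbitrary.

```r
library(limma)

set.seed(3)

# Hypothetical log-ratios: 50 spots x 4 replicate arrays.
M <- matrix(rnorm(50 * 4), nrow = 50)

# Hypothetical per-spot quality score in [0, 1] from the image analysis software.
quality <- matrix(runif(50 * 4), nrow = 50)

# Down-weight low-quality spots instead of removing them.
w <- ifelse(quality < 0.2, 0.1, 1)

# Simple design: every array measures the same treatment-vs-reference log-ratio.
design <- matrix(1, nrow = 4, ncol = 1, dimnames = list(NULL, "TreatVsRef"))

fit <- eBayes(lmFit(M, design, weights = w))
head(fit$coefficients)
```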
On 7/19/06 8:10, "Naomi Altman" <naomi at stat.psu.edu> wrote:
> [...]
Hi Miles,
We are all trying to say: Why do you think a channel with a near- or
below-background value is "bad" and needs to be removed? If the gene is
not expressed in one treatment, then it should have a 'zero' value, which
due to array technology will be some positive number near background
fluorescence. By removing the value completely, you are saying in the
analysis "there is no information for that sample on that array", but you
do have information on that sample: it was below the detectable level.
Even though the number that comes out of the background correction,
either 0.5 or 1 as suggested, is not entirely accurate, it is accurate
relative to numbers a good deal higher. Conversely, you should not throw
away saturated values either, because even though you don't know exactly
how large they were, you do know they were large. If both channels of a
spot are near/below background on every single array in your experiment,
then you can remove the entire gene/spot from the analysis.
Cheers,
Jenny
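The filtering rule in the last sentence can be sketched in base R as follows. The intensities and the constant background of 100 are made up; the point is that a spot is dropped only when both channels are at or below background on every array.

```r
# Hypothetical foreground intensities: rows are spots, columns are arrays.
R <- matrix(c(900, 80, 70,
              850, 60, 400), nrow = 3,
            dimnames = list(c("spotA", "spotB", "spotC"), c("array1", "array2")))
G <- matrix(c(90, 75, 65,
              700, 85, 95), nrow = 3, dimnames = dimnames(R))

# Hypothetical background estimates, constant here for simplicity.
Rb <- matrix(100, nrow = 3, ncol = 2)
Gb <- matrix(100, nrow = 3, ncol = 2)

# A spot/array is informative if at least one channel is above background.
above <- (R > Rb) | (G > Gb)

# Keep a spot if it is informative on at least one array.
keep <- rowSums(above) > 0
keep
```

Here spotB is below background in both channels on both arrays and is the only one removed; spotC is kept because its red channel is clearly above background on the second array.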
At 09:10 AM 7/19/2006, Naomi Altman wrote:
>[...]
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu
Quoting milesg at bu.edu:
> [...]
> I imagine this is a correct approach, but what should be done regarding
> the data that has one intensity below background and one above. Would it
> be best to keep the good value? Do we eliminate the entire gene from
> entry into bioconductor?
If you eliminate a gene because it has no signal in one channel, you
may be eliminating some of the most interesting genes! It depends on
the biology of your experiments. You should always think about the
experiment underneath, not just about numbers :-)
A gene with no signal in one channel, but ok signal on the other, may
be a gene that becomes silenced after your treatment, or switched on.
In my particular case *those* are the genes that I am after, so I keep
them and cherish them ;-)
However, when there's no signal in both channels, on all your
experiments, it sounds reasonable to eliminate them.
Jose
--
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK