Hello Users,

I have a question regarding the usage of the backgroundCorrect function
in limma.

When I do the following with offset 50, I get 2900 differentially
expressed genes:
RG.b <- backgroundCorrect(RG, method = "normexp", offset = 50)

whereas when I do the following with offset 1, I get 1300
differentially expressed genes:
RG.b <- backgroundCorrect(RG, method = "normexp", offset = 1)

Please advise which offset value I should use. Why does the offset
value make so much difference?

I am using this for two-channel data, read in as "genepix".

I greatly appreciate your help.

Prasad
Hi Prasad,
On 8/26/2011 11:00 AM, Prasad Siddavatam wrote:
> Hello Users,
>
> I have a question regarding the usage of backgroundCorrect function in LIMMA.
>
> when I do the following with offset 50, I am getting 2900 differentially
> expressed genes
> RG.b<- backgroundCorrect(RG, method = "normexp", offset = 50);
>
> where as, when I do the following with offset 1,
> I am getting 1300 differentially expressed genes
> RG.b<- backgroundCorrect(RG, method = "normexp", offset = 1);
>
> Please advise which offset value to be used? Why is offset value making
> so much difference?
I can't advise you on the offset to use; that is up to you as the data
analyst. But I can explain why you get more genes with a larger offset.

When you do a local background correction, the ratios for fairly dim
spots (those not much brighter than background) can become unstable
because the numerators and/or denominators get small. This gives the
characteristic spreading of the MA plot at low intensities after
background correction.

An extreme example: suppose the R and G channels are nearly identical
(say, 200 and 205), so the uncorrected R/G ratio is close to 1. If the
Rb and Gb values are, say, 190 and 185, then the background-corrected
intensities are 10 and 20 and the ratio becomes 0.5, a two-fold
difference! Adding an offset of 50 to each channel dampens the ratio to
60/70 = 0.86, which is likely closer to the truth.
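The arithmetic in this example can be checked directly. This is a minimal sketch in base R using the numbers above; the variable names are only illustrative, and the ratio is taken as R/G throughout (the reciprocal G/R of the corrected values would be 2).

```r
# Foreground and local background values from the example
R  <- 200; G  <- 205
Rb <- 190; Gb <- 185
offset <- 50

raw_ratio    <- R / G                                   # close to 1
bc_ratio     <- (R - Rb) / (G - Gb)                     # 10/20: two-fold difference
offset_ratio <- (R - Rb + offset) / (G - Gb + offset)   # 60/70: dampened

round(c(raw = raw_ratio, corrected = bc_ratio, with_offset = offset_ratio), 2)
```

The same dampening applies to every spot, but it matters most where the corrected intensities are small compared with the offset.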
If you make MA plots before background correction and then after, both
with and without the offset, you will see what I mean.
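For readers without data at hand, the comparison can be sketched on simulated spots. Everything here is an assumption made for illustration: the signal distribution, the background level, the noise, and a simple subtract-and-floor correction used as a stand-in for normexp so that the example runs in base R without limma.

```r
set.seed(42)
n      <- 5000
signal <- rexp(n, rate = 1 / 300)        # true signal, equal in both channels (no DE)
bg     <- 150                            # assumed true background level
noise  <- function() rnorm(n, sd = 20)   # assumed additive measurement noise

R  <- signal + bg + noise(); G  <- signal + bg + noise()
Rb <- bg + noise();          Gb <- bg + noise()

# Subtract background, floor at 0.5 so logs are defined, then add the offset
correct <- function(fg, b, offset) pmax(fg - b, 0.5) + offset
ma <- function(r, g) data.frame(M = log2(r / g), A = 0.5 * log2(r * g))

panels <- list(
  "No correction"        = ma(R, G),
  "Corrected, offset 0"  = ma(correct(R, Rb, 0),  correct(G, Gb, 0)),
  "Corrected, offset 50" = ma(correct(R, Rb, 50), correct(G, Gb, 50))
)

# Spread of M among spots that are close to background
dim_spots <- signal < 50
spread <- sapply(panels, function(d) sd(d$M[dim_spots]))
spread

op <- par(mfrow = c(1, 3))
for (nm in names(panels)) {
  plot(panels[[nm]]$A, panels[[nm]]$M, pch = ".", main = nm, xlab = "A", ylab = "M")
}
par(op)
```

With real data the same three panels could be drawn from the uncorrected RGList and from the output of backgroundCorrect() at each offset.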
Best,
Jim
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues
Hi Jim,

Thank you very much for your explanation. Now I understand the reason.

But when you say it is up to the analyst to decide on the offset value,
is that decision based on the number of genes I expect to be
differentially expressed, or on some other criterion?

This is critical for me because I am using several types of arrays
(Agilent, cDNA).

I appreciate your help.

Prasad
Hi Prasad,
On 8/26/2011 4:30 PM, Prasad Siddavatam wrote:
> Hi Jim,
>
> Thank you very much for you explanation. Now I can understand the reason.
>
> But when you said its up to the analyst to decide on the offset value.
> Is this statement based on the number of genes I am
> expecting (to be differentially expressed) or on some other criteria.
Yes ;-D.
Seriously though, this is where knowledge of the experiment, exploratory
data analysis, etc., come into play. You will likely have to make some
assumptions based on what your collaborators say about their
expectations, what the data look like, and so on.
It's not easy, and you never know if you made the correct assumptions.
All you can do is realize what assumptions you have made, and have a
reasonable rationale for why you made them.
Best,
Jim
Dear Prasad,
The offset added to the data is used to strike a good balance between
precision and bias in the analysis. A larger offset gives higher
precision (smaller variation between replicates), but it also
introduces more bias (e.g. dampened fold changes). The paper below
gives a systematic evaluation of the impact of different offsets on
precision, bias and false discovery rate for Illumina BeadChip data,
but it should be useful for other platforms as well.
http://www.ncbi.nlm.nih.gov/pubmed/20929874
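The trade-off can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the paper's analysis: one dim spot with a true two-fold change, additive Gaussian noise on already background-subtracted intensities, and a floor at 0.5 so that logs are defined. All values are hypothetical.

```r
set.seed(7)
n_rep    <- 10000          # simulated replicate measurements of one spot
true_R   <- 80             # assumed true intensities: a two-fold change,
true_G   <- 40             # so the true log2 ratio is 1
noise_sd <- 15             # assumed additive noise
offsets  <- c(0, 1, 50, 200)

# Background-subtracted intensities with noise, floored at 0.5
R <- pmax(true_R + rnorm(n_rep, sd = noise_sd), 0.5)
G <- pmax(true_G + rnorm(n_rep, sd = noise_sd), 0.5)

# For each offset: average log-ratio (bias relative to 1) and its spread (precision)
res <- t(sapply(offsets, function(k) {
  M <- log2((R + k) / (G + k))
  c(offset = k, mean_M = mean(M), sd_M = sd(M))
}))
res
```

As the offset grows, sd_M shrinks (better precision) while mean_M moves away from the true value of 1 towards 0 (more bias), which is the balance described above.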
Cheers,
Wei
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:6}}
Hi Jim,

I really appreciate your help.

But I have a problem here. The datasets were downloaded from NCBI, and
I can't get many details about the experiments, hence the trouble. Even
the original publications don't say much about the experiments.

Prasad
Dear Dr. Smyth,

Thank you very much for your detailed response. I am going to read the
references. I understand why I am getting more differentially expressed
genes with a bigger offset value: basically, we are bringing the
variance close to zero (with more uniformity across probes).

Are you suggesting that I delete probes before doing the background
correction? If yes, is there any limit on the maximum number of probes
that can be deleted?

For the variance stabilization, approximately how large does
fit$df.prior need to be before a good variance stabilization can be
considered achieved?

I appreciate your time and help.

Prasad
Dear Prasad,
> Date: Mon, 29 Aug 2011 02:17:01 +0000
> From: Prasad Siddavatam <siddavatam at gmail.com>
> To: <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] backgroundCorrect offset value
>
> Dear Dr. Smyth,
>
> Thank you very much for your detailed response. I am going to read the
> references. I understand why I am getting more differential genes with bigger
> offset value. Basically we are bringing the variance close to zero (with more
> uniformity across probes).
>
> Are you suggesting to delete the probes before doing the background correct?
No, I'm not. All probes should be retained for background correction.
Non-expressed probes should be filtered before using eBayes().
> If yes, is there any limit on the maximum number of probes to be deleted?
>
> In the variance stabilization, which maximum value (approximately) of
> fit$df.prior is considered high enough to call a good variance stabilization is
> achieved.
There is no maximum value. Higher is better.
I think you're worrying about this more than is necessary. A decent
value like offset=50 will give good results in a wide variety of
situations. You can even use
fit <- eBayes(fit, trend=TRUE)
which makes uniformity of the variance less important. Again, use
plotSA(fit)
to see what this does.
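Put together, the order of operations described in this message (keep all probes for background correction, filter non-expressed probes, then eBayes) might look roughly like the sketch below. It is only an illustration: simulated log-ratios stand in for background-corrected and normalized data, and the cutoff of 7 on average log-intensity is a hypothetical choice, not a recommended value. The limma code runs only if the package is installed.

```r
set.seed(1)
n_probes <- 2000
n_arrays <- 4

# Simulated average log-intensities (A) and log-ratios (M); noise is made
# larger at low intensity to mimic dim spots (an assumption of this sketch)
A_probe <- runif(n_probes, min = 5, max = 14)
A <- matrix(A_probe, n_probes, n_arrays)
M <- matrix(rnorm(n_probes * n_arrays, sd = 0.2 + 1.5 / (A_probe - 4)),
            n_probes, n_arrays)

if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)
  MA  <- new("MAList", list(M = M, A = A))
  fit <- lmFit(MA)                           # all probes are still present here

  keep <- fit$Amean > 7                      # hypothetical "expressed" cutoff
  fit  <- eBayes(fit[keep, ], trend = TRUE)  # filter first, then eBayes

  print(fit$df.prior)                        # higher is better
  plotSA(fit)                                # residual SD against average intensity
}
```

Rerunning the last steps with different offsets in backgroundCorrect() on real data, and comparing fit$df.prior and the plotSA() output, is one way to see how much the choice of offset matters for a given dataset.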
Best wishes
Gordon