Hi Gordon,
The use of controlProbe data has not been well developed yet. Actually, we
would like to hear opinions from you and other developers about how to
use control probe information.
The Control_Probe_profile.txt file can also be read by the lumiR function.
But the user has to manually extract the exprs slot and add it into the
controlProbe slot. We will develop this part further.
Thanks!
Pan
On 10/27/07 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
<bioconductor-request at stat.math.ethz.ch> wrote:
> Message: 24
> Date: Sat, 27 Oct 2007 15:12:04 +1000
> From: Gordon Smyth <smyth at wehi.edu.au>
> Subject: [BioC] lumi: how is the controlData to be read and used?
> To: "BioC Mailing List" <bioconductor at stat.math.ethz.ch>
> Cc: Wei Shi <shi at wehi.edu.au>, Belinda Phipson <phipson at wehi.edu.au>
> Message-ID: <6.2.5.6.1.20071027143740.02d4ab40 at wehi.edu.au>
> Content-Type: text/plain; charset="us-ascii"; format=flowed
>
> The lumi package functions lumiB() and bgAdjust() mention the fact
> that control data can be used to background correct Illumina data.
> There is however no documentation regarding what the control data
> should contain, how it should be read in, or exactly how it is used.
>
> I have summary probe profile data output from BeadStudio (not
> normalized or background corrected):
>
> Sample_Probe_Profile.txt
>
> I also have summary control probe data. This includes both positive
> and negative control probes:
>
> Control_Probe_Profile.txt
>
> How is it recommended that I read and use the control data, prior to
> using lumiT(method="vst")? A complete code example would be helpful.
>
> Gordon
>
At 08:11 AM 28/10/2007, Pan Du wrote:
>Hi Gordon,
>
>The use of controlProbe data has not been well developed yet. Actually, we
>would like to hear opinions from you and other developers about how to
>use control probe information.

I am glad that you are open to opinions, but it is disconcerting that
you are not prepared to make any recommendations about background
correction.

This leaves me wondering how to use the lumi package at the moment.
The results from vst do change somewhat depending on how the data has
been background corrected. If you're not making any recommendations
about background correction, does this mean that you're not yet ready
to recommend vst either?

Are you saying that background correction makes so little difference
that vst can be recommended regardless of the background correction
method?

Or are you saying that the pre-processing methods of the lumi package
are in general not yet fully developed or tested, so that we should
not view them as recommendations?

>The Control_Probe_profile.txt file can also be read by the lumiR function.
>But the user has to manually extract the exprs slot and add it into the
>controlProbe slot. We will develop this part further.

No, this does not work, for several reasons.

Firstly, there is no slot called "controlProbe". I think you mean
"controlData".

Secondly, there is no slot called "exprs". I think you mean the
exprs() extractor function.

Thirdly, exprs(x) is a matrix, whereas the controlData slot has to be
a data.frame.

Fourthly, and most serious, when lumiR() reads a control probe
profile, exprs(x) loses any information on which probes are negative
controls, making it completely useless for background correction. The
row names are just probeID numbers:

> x <- lumiR(file.path("data","Actn3_Control_Probe_Profile.txt"))
Warning message:
In lumiR(file.path("data", "Actn3_Control_Probe_Profile.txt")) :
  Duplicated IDs found and were merged!
> rownames(exprs(x))[1:5]
[1] "610064"    "100610064" "100580056" "580056"    "360035"

As I said, a complete code example would be helpful!
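
A partial sketch of what such an example might involve, written in base R only and not using any lumi function: it reads a control probe profile with read.delim() so that the control-type column survives, and shows the matrix versus data.frame distinction raised above. The file contents, the AVG_Signal column names, and the two-array layout are invented for illustration, and the TargetID and ProbeID column names are assumptions about the export format.

```r
# Write a tiny, made-up control probe profile to a temporary file.
# A real BeadStudio export will differ; these columns are assumptions.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "TargetID\tProbeID\tA.AVG_Signal\tB.AVG_Signal",
  "negative\t1001\t90.0\t88.0",
  "negative\t1002\t94.0\t92.0",
  "housekeeping\t2001\t16000.0\t14000.0"
), path)

# read.delim() keeps every column, so the control type is not lost.
ctrl <- read.delim(path, stringsAsFactors = FALSE)

# A numeric matrix of intensities, comparable in shape to exprs() output.
m <- as.matrix(ctrl[, c("A.AVG_Signal", "B.AVG_Signal")])

# A data.frame that carries the control type alongside the intensities.
cd <- data.frame(TargetID = ctrl$TargetID, m, stringsAsFactors = FALSE)

print(is.matrix(m))
print(is.data.frame(cd))
print(cd[cd$TargetID == "negative", ])
```

Whether a data.frame of this shape is what lumiB() expects in the controlData slot is exactly the open question in this thread, so treat the layout as a starting point only.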
Gordon
>Thanks!
>Pan
>
>
>>
>> The use of controlProbe data has not been well developed yet. Actually, we
>> would like to hear opinions from you and other developers about how to
>> use control probe information.
>
> I am glad that you are open to opinions, but it is disconcerting that
> you are not prepared to make any recommendations about background
> correction.
>
> This leaves me wondering how to use the lumi package at the moment.
> The results from vst do change somewhat depending on how the data has
> been background corrected. If you're not making any recommendations
> about background correction, does this mean that you're not yet ready
> to recommend vst either?
>
> Are you saying that background correction makes so little difference
> that vst can be recommended regardless of the background correction
> method?
>
> Or are you saying that the pre-processing methods of the lumi package
> are in general not yet fully developed or tested, so that we should
> not view them as recommendations?

What I meant by the use of control probe data is using control probe
information for quality control. For the background adjustment part,
currently, we believe the BeadStudio recommended method works well. Of
course further improvement is possible. Contributions in this part are
very welcome.

>> The Control_Probe_profile.txt file can also be read by the lumiR function.
>> But the user has to manually extract the exprs slot and add it into the
>> controlProbe slot. We will develop this part further.
>
> No, this does not work, for several reasons.
>
> Firstly, there is no slot called "controlProbe". I think you mean
> "controlData".
>
> Secondly, there is no slot called "exprs". I think you mean the
> exprs() extractor function.
>
> Thirdly, exprs(x) is a matrix, whereas the controlData slot has to be
> a data.frame.
>
> Fourthly, and most serious, when lumiR() reads a control probe
> profile, exprs(x) loses any information on which probes are negative
> controls, making it completely useless for background correction. The
> row names are just probeID numbers:

We will work on this part later. Because we are busy with other projects,
it may take several weeks. Thanks!
Pan
At 10:17 PM 28/10/2007, Pan Du wrote:
>What I meant by the use of control probe data is using control probe
>information for quality control. For the background adjustment part,
>currently, we believe the BeadStudio recommended method works well. Of
>course further improvement is possible. Contributions in this part are
>very welcome.

OK, good, now we're getting somewhere. You're recommending
BeadStudio's global background correction. Let me now rephrase my
original question. Suppose that I have BeadStudio output data which
is not background corrected. How can I use R to reproduce the
background correction that BeadStudio would have done?

This is a very important question, because most Bioconductor users of
the lumi package will I guess have Illumina output data which is not
normalized and not background corrected. And we will not necessarily
want to go back to BeadStudio to background correct.

I have summary probe profile data output from BeadStudio which is not
background corrected. Let me repeat, it is not background corrected.

Sample_Probe_Profile.txt

I also have control probe summary profiles and control gene summary
profiles. This includes both positive and negative control probes:

Control_Probe_Profile.txt
Control_Gene_Profile.txt

I should surely be able to reproduce BeadStudio's background
correction. Here is my best effort using the lumi package. Is this
what you recommend?

library(lumi)
x <- lumiR("Sample_Probe_Profile.txt")
controlgp <- lumiR("Control_Gene_Profile.txt")
x@controlData <- as.data.frame(exprs(controlgp))
xb <- lumiB(x, method="bgAdjust")
y <- lumiT(xb, method="vst")
y <- lumiN(y, method="quantile")

As you can see from the results below, lumiB() simply subtracted the
negative control expression value from the expression values for each
array.
Best wishes
Gordon

> exprs(controlgp)[,1:4]
                    1957998084_A 1957998084_B 1957998084_C 1957998084_D
biotin                   11508.6      10857.9      10641.8      10536.3
cy3_hyb                  20252.0      19227.1      18964.8      19457.2
high_stringency_hyb      47593.1      43267.2      43966.6      43207.8
housekeeping             16185.3      14039.6      13277.5      13280.2
labeling                    85.2         89.5         77.4         80.7
low_stringency_hyb       17650.5      16441.4      16330.1      16844.8
negative                    92.0         90.0         83.2         88.1

> summary(exprs(x)[,1:4])
   1957998084_A      1957998084_B      1957998084_C      1957998084_D
 Min.   :    52.9  Min.   :    50.2  Min.   :    48.6  Min.   :    54.1
 1st Qu.:    86.6  1st Qu.:    84.3  1st Qu.:    78.2  1st Qu.:    82.3
 Median :    99.0  Median :    96.6  Median :    88.7  Median :    93.9
 Mean   :   511.4  Mean   :   501.0  Mean   :   400.3  Mean   :   448.0
 3rd Qu.:   163.9  3rd Qu.:   159.3  3rd Qu.:   138.3  3rd Qu.:   148.9
 Max.   : 59875.4  Max.   : 57223.1  Max.   : 50414.0  Max.   : 49213.6

> summary(exprs(xb)[,1:4])
   1957998084_A       1957998084_B       1957998084_C       1957998084_D
 Min.   :   -39.09  Min.   :   -39.83  Min.   :   -34.64  Min.   :   -34.08
 1st Qu.:    -5.40  1st Qu.:    -5.73  1st Qu.:    -5.01  1st Qu.:    -5.80
 Median :     7.05  Median :     6.65  Median :     5.48  Median :     5.76
 Mean   :   419.47  Mean   :   411.01  Mean   :   317.04  Mean   :   359.90
 3rd Qu.:    71.95  3rd Qu.:    69.27  3rd Qu.:    55.08  3rd Qu.:    60.77
 Max.   : 59783.48  Max.   : 57133.12  Max.   : 50330.79  Max.   : 49125.42

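
The per-array subtraction described in this message can be checked by hand in base R. This sketch copies the rounded "negative" row and the rounded per-array minimum and maximum from the printed output, so its results agree with the summary(exprs(xb)) figures only to within roughly 0.1 (display rounding). It illustrates the arithmetic and is not lumi's own implementation of bgAdjust.

```r
arrays <- c("1957998084_A", "1957998084_B", "1957998084_C", "1957998084_D")

# Negative-control value per array, copied from the printed control profile.
negative <- c(92.0, 90.0, 83.2, 88.1)
names(negative) <- arrays

# Two rows of raw intensities: the printed minimum and maximum per array.
raw <- rbind(min = c(52.9, 50.2, 48.6, 54.1),
             max = c(59875.4, 57223.1, 50414.0, 49213.6))
colnames(raw) <- arrays

# Subtract each array's negative-control value from that array's column.
corrected <- sweep(raw, 2, negative, FUN = "-")
print(round(corrected, 2))
```

sweep() with margin 2 applies one value per column, which matches the description of one background value being subtracted per array.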
Hi Gordon,
Sorry for replying late. I think that should work because the
Control_Gene_Profile.txt file basically averages the negative control
probes. As described in the BeadStudio manual, its background adjustment
basically subtracts the mean of the negative control probes. But I am not
sure whether BeadStudio does outlier removal or not. Anyway, the results
should be close.
Also, I will update the lumiR function (or write a new function) to read
the Control_Probe_Profile.txt because the negative control probes have the
same probe Ids. Thanks!
Pan
At 10:00 AM 30/10/2007, Pan Du wrote:
>Hi Gordon,
>
>Sorry for replying late. I think that should work because the
>Control_Gene_Profile.txt file basically averages the negative control
>probes. As described in the BeadStudio manual, its background adjustment
>basically subtracts the mean of the negative control probes. But I am not
>sure whether BeadStudio does outlier removal or not. Anyway, the results
>should be close.

Thanks.

>Also, I will update the lumiR function (or write a new function) to read
>the Control_Probe_Profile.txt because the negative control probes have the
>same probe Ids.

Actually, the ProbeIDs are all different for the negative controls.
It is the TargetIDs which are the same.
Repetition of ProbeIDs only occurs when the same probe can be
classified as more than one type of control (for example mouse probe
60019 is both a cy3_hyb control and a low_stringency_hyb control).
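
To make the distinction concrete, here is a small base R sketch with invented values. The negative controls have distinct ProbeIDs under one shared TargetID, while probe 60019 appears under two control types. Averaging the negative-control probes per array, which is how Pan describes Control_Gene_Profile.txt, then reduces to a column mean. The array columns A and B, the negative-control ProbeIDs, and every intensity are assumptions; no outlier removal is attempted.

```r
# Made-up control probe table; only the two control types of probe 60019
# come from the thread, everything else is illustrative.
ctrl <- data.frame(
  TargetID = c("negative", "negative", "negative",
               "cy3_hyb", "low_stringency_hyb"),
  ProbeID  = c(1001, 1002, 1003, 60019, 60019),
  A = c(90, 94, 92, 20000, 20000),
  B = c(88, 92, 90, 19000, 19000),
  stringsAsFactors = FALSE
)

# ProbeIDs that occur under more than one control type.
dup_probes <- unique(ctrl$ProbeID[duplicated(ctrl$ProbeID)])
print(dup_probes)

# Negative controls: distinct ProbeIDs, one shared TargetID.
neg <- ctrl[ctrl$TargetID == "negative", ]
print(anyDuplicated(neg$ProbeID) == 0)

# Mean of the negative-control probes for each array.
neg_mean <- colMeans(neg[, c("A", "B")])
print(neg_mean)
```

Grouping by TargetID rather than ProbeID is the design point here: merging rows on ProbeID would collapse the two roles of a shared probe and would not identify the negative controls at all.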
Best wishes
Gordon
>Thanks!
>Pan