Dear Bioconductor Users,
I am currently analysing some Illumina expression data (HumanWG-6_V3,
GenomeStudio v1.6), and have noticed a discrepancy when comparing the
same data processed with Lumi and with Limma.
I initially processed the data using Limma as described in the Limma
User's Guide (p89).
expressed = apply(x$other$Detection < 0.05, 1, any)
y = y[expressed,] ## y now contains only the probes expressed in at
                  ## least one sample (any), not in all samples
= ~8500 probes
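As a small check of what this filter selects, the sketch below applies the same `apply(..., 1, any)` call to the first five probes and first five arrays of the Limma detection table quoted further down in this post, and contrasts it with `all`. The object name `det` is made up for illustration; only the numbers come from the post.

```r
# First five probes x first five arrays, copied from the Limma detection
# table in this post. `det` is an illustrative name, not a limma object.
det <- matrix(
  c(0.27536, 0.51647, 0.06983, 0.89065, 0.46245,
    0.97233, 0.97892, 0.98682, 0.98814, 0.98814,
    0.89196, 0.72727, 0.86825, 0.96706, 0.88669,
    0.71014, 0.02899, 0.39921, 0.14361, 0.53491,
    0.85375, 0.60079, 0.88274, 0.94071, 0.40711),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("6450255", "2570615", "6370619", "2600039", "2650615"),
    paste0("4457260019_", LETTERS[1:5])))

# Detected (p < 0.05) in at least one array: this is what `any` computes
in_any <- apply(det < 0.05, 1, any)

# Detected in every array: a stricter filter, shown only for contrast
in_all <- apply(det < 0.05, 1, all)

names(which(in_any))  # "2600039"
sum(in_all)           # 0
```

On this subset only probe 2600039 passes, and only because of a single array (0.02899), so the `any` form keeps probes detected in one or more samples.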
However with lumi:
raw = lumiR("file.txt")
norm = lumiExpresso(raw, QC.evaluation=TRUE)  ## normalise the raw object
x = exprs(norm)
presentCount = detectionCall(norm)  ## takes the LumiBatch, not the matrix
y = x[presentCount > 0,]
= ~20,000 probes
When I compared the actual detection values in the Limma and Lumi
objects, I found that they are different. The detection p-values in
the Limma object are the same as those in the text file exported
directly from GenomeStudio, whereas the values in the Lumi object are
the result of subtracting the raw values from 1.
Examples of the 3 different detection tables can be seen below.
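The relationship can be checked directly on the numbers quoted in this post. The sketch below uses probe 6450255 on arrays 4457260019_A to 4457260019_E; the variable names are illustrative only.

```r
# Probe 6450255, arrays 4457260019_A..E, copied from the tables in this post
raw_6450255  <- c(0.27536, 0.51647, 0.06983, 0.89065, 0.46245)  # GenomeStudio / Limma
lumi_6450255 <- c(0.72464, 0.48353, 0.93017, 0.10935, 0.53755)  # Lumi object

# all.equal() compares with a numerical tolerance, which avoids
# spurious mismatches from floating-point rounding
is_complement <- isTRUE(all.equal(1 - raw_6450255, lumi_6450255))
is_complement  # TRUE
```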
This appears to be the relevant lumiR code:
if (length(grep("Detection Pval", header, ignore.case = TRUE)) == 0) {
    detection <- 1 - detection
}
In the GenomeStudio file that I am working with, the detection p-value
column is labelled "Detection-4457260019_A", so the function does not
find the "Detection Pval" string and therefore performs the conversion
of the detection values. I confirmed that this is the case by renaming
my detection headings to "Detection Pval-4457260019_A", which resulted
in the detection values not being converted and thus remaining equal
to the "raw" and Limma detection values.
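The effect of the header name on that test can be reproduced in base R without loading lumi. In the sketch below, `would_flip` is a hypothetical helper that applies the quoted condition to a single column name; the real lumiR code tests the whole header, so this is an illustration of the pattern match, not a copy of the package's logic.

```r
# Column name as it appears in the GenomeStudio file described in this post
original <- "Detection-4457260019_A"

# The manual rename described above, done with sub()
renamed <- sub("^Detection-", "Detection Pval-", original)

# Hypothetical helper mirroring the quoted condition for one header string
would_flip <- function(header) {
  length(grep("Detection Pval", header, ignore.case = TRUE)) == 0
}

would_flip(original)  # TRUE: values would be replaced by 1 - detection
would_flip(renamed)   # FALSE: values would be left as they are
```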
According to the GenomeStudio v1.0 manual regarding detection
p-values:
If the Z score for the probe intensity is smaller than the lowest
negative control Z score, the function returns a 0 and the
p-value is 1.
If the Z score for the probe intensity falls within the range of the Z
scores of the negative controls, R is the rank of the Z score of the
probe, and the p-value is in the range of 0 to 1.
If the Z score for the probe intensity is greater than the largest
negative control Z score, the function returns a 1 and the
p-value is 0.
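The three cases can be sketched as a rank-based calculation. This is only my reading of the manual text quoted above, not Illumina's implementation: I am assuming that the "function" returns the fraction of negative controls whose Z score lies below the probe's, and that the p-value is 1 minus that fraction. The helper name and the negative-control Z scores are made up for illustration.

```r
# Hypothetical helper; not part of GenomeStudio, lumi or limma.
# Assumption: p-value = 1 - (fraction of negative controls below the probe).
detection_pval <- function(z, neg_z) {
  frac_below <- mean(neg_z < z)  # 0 if below all controls, 1 if above all
  1 - frac_below
}

# Made-up negative-control Z scores, for illustration only
neg_z <- c(-1.2, -0.4, 0.1, 0.8, 1.5)

p_low  <- detection_pval(-2.0, neg_z)  # below every control: p-value is 1
p_mid  <- detection_pval( 0.5, neg_z)  # inside the control range: 0 < p < 1
p_high <- detection_pval( 3.0, neg_z)  # above every control: p-value is 0
```

Under this reading a strongly expressed probe gets a p-value near 0, which is the orientation seen in the raw GenomeStudio values.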
This suggests that the detection p-value for an expressed probe should
be close to 0 in data generated by current releases of GenomeStudio. I
know that in some older versions of BeadStudio the detection value for
expressed probes was actually close to 1, and that Lumi was built to
take this into account; however, I do not see any reason why the
detection values for our data should be converted, as they were
generated by a relatively new version of GenomeStudio. I suspect that
Illumina has changed its column naming scheme and that this has not
yet been reflected in Lumi. This error can have a significant impact
on people's results, and I felt it was necessary to bring it to the
group's attention.
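To illustrate the impact, the sketch below applies the same "p < 0.05 in any sample" filter to two probes from this post, once on the raw GenomeStudio values and once on their complements (1 minus the raw value, which is what ends up in the Lumi object). The 0.05 cutoff is the one from the Limma filter above and is used here only for comparison; the object names are illustrative.

```r
# Two probes, arrays 4457260019_A..E, raw values copied from this post
raw <- rbind(
  "2570615" = c(0.97233, 0.97892, 0.98682, 0.98814, 0.98814),
  "2600039" = c(0.71014, 0.02899, 0.39921, 0.14361, 0.53491))
colnames(raw) <- paste0("4457260019_", LETTERS[1:5])

# Values after the 1 - detection conversion
flipped <- 1 - raw

keep_raw     <- apply(raw < 0.05, 1, any)
keep_flipped <- apply(flipped < 0.05, 1, any)

keep_raw      # 2570615 FALSE, 2600039 TRUE
keep_flipped  # 2570615 TRUE,  2600039 FALSE
```

The two orientations select opposite probes here, which is consistent with the very different probe counts I obtained from the two packages.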
## Detection p-values as seen in the Limma object
$other
$Detection
        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255      0.27536      0.51647      0.06983      0.89065      0.46245
2570615      0.97233      0.97892      0.98682      0.98814      0.98814
6370619      0.89196      0.72727      0.86825      0.96706      0.88669
2600039      0.71014      0.02899      0.39921      0.14361      0.53491
2650615      0.85375      0.60079      0.88274      0.94071      0.40711
        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_D
6450255      0.42161      0.55072      0.65613      0.22398      0.29117
2570615      0.97628      0.77339      0.98155      0.98287      0.98946
6370619      0.84848      0.85507      0.94993      0.98287      0.91963
2600039      0.21476      0.54414      0.46377      0.45982      0.32016
2650615      0.38603      0.92754      0.57312      0.57181      0.64559
        4463361183_E 4463361183_F 5511070019_A 5511070019_B 5511070021_A
6450255      0.42951      0.35705      0.25823      0.23979      0.31094
2570615      0.97892      0.99209      0.97760      0.99341      0.95652
6370619      0.77339      0.89855      0.78920      0.75362      0.43478
2600039      0.35968      0.23979      0.17391      0.40975      0.72596
2650615      0.49407      0.16996      0.57312      0.52306      0.46113
        5511070021_B 5511070021_C 5511070021_D 5511070021_E 5511070021_F
6450255      0.82213      0.48353      0.37681      0.26482      0.27536
2570615      0.98024      0.97497      0.95784      0.93412      0.96970
6370619      0.48748      0.59947      0.48880      0.50988      0.82213
2600039      0.20422      0.24769      0.49144      0.57049      0.43742
2650615      0.52306      0.37418      0.33202      0.39789      0.60079
49582 more rows ...
## Detection p-values as seen in the Lumi object
> detect[1:5,]
        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255      0.72464      0.48353      0.93017      0.10935      0.53755
2570615      0.02767      0.02108      0.01318      0.01186      0.01186
6370619      0.10804      0.27273      0.13175      0.03294      0.11331
2600039      0.28986      0.97101      0.60079      0.85639      0.46509
2650615      0.14625      0.39921      0.11726      0.05929      0.59289
        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_E
6450255      0.57839      0.44928      0.34387      0.77602      0.57049
2570615      0.02372      0.22661      0.01845      0.01713      0.02108
6370619      0.15152      0.14493      0.05007      0.01713      0.22661
2600039      0.78524      0.45586      0.53623      0.54018      0.64032
2650615      0.61397      0.07246      0.42688      0.42819      0.50593
        4463361183_F 5511070019_A 5511070021_A 5511070021_B 5511070021_C
6450255      0.64295      0.74177      0.68906      0.17787      0.51647
2570615      0.00791      0.02240      0.04348      0.01976      0.02503
6370619      0.10145      0.21080      0.56522      0.51252      0.40053
2600039      0.76021      0.82609      0.27404      0.79578      0.75231
2650615      0.83004      0.42688      0.53887      0.47694      0.62582
        5511070021_D 5511070021_E 5511070021_F
6450255      0.62319      0.73518      0.72464
2570615      0.04216      0.06588      0.03030
6370619      0.51120      0.49012      0.17787
2600039      0.50856      0.42951      0.56258
2650615      0.66798      0.60211      0.39921
## Detection p-values read as a text file from GenomeStudio output
> raw.detect[1:5,]
        Detection.4457260019_A Detection.4457260019_B Detection.4457260019_C
6450255                0.27536                0.51647                0.06983
2570615                0.97233                0.97892                0.98682
6370619                0.89196                0.72727                0.86825
2600039                0.71014                0.02899                0.39921
2650615                0.85375                0.60079                0.88274
        Detection.4457260019_D Detection.4457260019_E Detection.4457260019_F
6450255                0.89065                0.46245                0.42161
2570615                0.98814                0.98814                0.97628
6370619                0.96706                0.88669                0.84848
2600039                0.14361                0.53491                0.21476
2650615                0.94071                0.40711                0.38603
        Detection.4463361183_A Detection.4463361183_B Detection.4463361183_C
6450255                0.55072                0.65613                0.22398
2570615                0.77339                0.98155                0.98287
6370619                0.85507                0.94993                0.98287
2600039                0.54414                0.46377                0.45982
2650615                0.92754                0.57312                0.57181
        Detection.4463361183_D Detection.4463361183_E Detection.4463361183_F
6450255                0.29117                0.42951                0.35705
2570615                0.98946                0.97892                0.99209
6370619                0.91963                0.77339                0.89855
2600039                0.32016                0.35968                0.23979
2650615                0.64559                0.49407                0.16996
        Detection.5511070019_A Detection.5511070019_B Detection.5511070021_A
6450255                0.25823                0.23979                0.31094
2570615                0.97760                0.99341                0.95652
6370619                0.78920                0.75362                0.43478
2600039                0.17391                0.40975                0.72596
2650615                0.57312                0.52306                0.46113
        Detection.5511070021_B Detection.5511070021_C Detection.5511070021_D
6450255                0.82213                0.48353                0.37681
2570615                0.98024                0.97497                0.95784
6370619                0.48748                0.59947                0.48880
2600039                0.20422                0.24769                0.49144
2650615                0.52306                0.37418                0.33202
        Detection.5511070021_E Detection.5511070021_F
6450255                0.26482                0.27536
2570615                0.93412                0.96970
6370619                0.50988                0.82213
2600039                0.57049                0.43742
2650615                0.39789                0.60079
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lumi_2.2.1 Biobase_2.10.0 limma_3.6.9
loaded via a namespace (and not attached):
[1] affy_1.28.0 affyio_1.18.0 annotate_1.28.0
[4] AnnotationDbi_1.12.0 DBI_0.2-5 grid_2.12.0
[7] hdrcde_2.15 KernSmooth_2.23-4 lattice_0.19-13
[10] MASS_7.3-8 Matrix_0.999375-44 methylumi_1.6.1
[13] mgcv_1.7-0 nlme_3.1-97 preprocessCore_1.12.0
[16] RSQLite_0.9-2 xtable_1.5-6
Jovana Maksimovic B.Sc (Hons) / B.Binf
Bioinformatics Officer
Bioinformatics, Enabling Facilities
Murdoch Childrens Research Institute
The Royal Children's Hospital
Flemington Road Parkville Victoria 3052 Australia
E jovana.maksimovic at mcri.edu.au
www.mcri.edu.au