Possible issue with detection p-values in Lumi package
@jovana-maksimovic-4422
Last seen 12 weeks ago
Australia
Dear Bioconductor Users,

I am currently analysing some Illumina expression data (HumanWG-6_V3, GenomeStudio v1.6) and have noticed an issue when comparing data processed using both Lumi and Limma.

I initially processed the data using Limma as described in the Limma User's Guide (p89):

    expressed = apply(x$other$Detection < 0.05, 1, any)
    y = y[expressed,]  ## y now contains only the probes expressed in ALL samples

This leaves ~8500 probes. However, with lumi:

    raw = lumiR("file.txt")
    norm = lumiExpresso(raw, QC.evaluation=TRUE)
    x = exprs(norm)
    presentCount = detectionCall(x)
    y = x[presentCount > 0,]

This leaves ~20,000 probes.

When I compared the actual detection values in the Limma and Lumi objects, I found that they are different. The detection p-values in the Limma object are the same as those in the text file output directly by GenomeStudio, whereas the values in the Lumi object are the result of subtracting the raw values from 1. Examples of the 3 different detection tables can be seen below.

This appears to be the relevant lumiR code:

    if (length(grep("Detection Pval", header, ignore.case = TRUE)) == 0) {
        detection <- 1 - detection
    }

In the GenomeStudio file that I am working with, the detection p-value column is labelled "Detection-4457260019_A", so the function does not find the "Detection Pval" string and therefore converts the detection values. I confirmed that this is the case by renaming my detection headings to "Detection Pval-4457260019_A", which resulted in the detection values not being converted and thus remaining equal to the "raw" and Limma detection values.

According to the GenomeStudio v1.0 manual regarding detection p-values:

    If the Z score for the probe intensity is smaller than the lowest
    negative control Z score, the function returns a 0 and the
    p-value is 1.
    If the Z score for the probe intensity falls within the range of the Z
    scores of the negative controls, R is the rank of the Z score of the
    probe, and the p-value is in the range of 0 to 1.
    If the Z score for the probe intensity is greater than the largest
    negative control Z score, the function returns a 1 and the
    p-value is 0.

This suggests that the detection p-value for an expressed probe should be close to 0 in data generated by current releases of GenomeStudio. I know that with some older versions of BeadStudio the detection value for expressed probes was actually close to 1, and Lumi was built to take this into account; however, I do not see any reason why the detection values for our data should be converted, as they were generated by a relatively new version of GenomeStudio. I propose that Illumina has perhaps changed their column naming system and that this has not yet been reflected in Lumi. This error can have a significant impact on people's results, and I felt it was necessary to bring it to the group's attention.

## Detection p-values as seen in the Limma object
$other
$Detection
        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255      0.27536      0.51647      0.06983      0.89065      0.46245
2570615      0.97233      0.97892      0.98682      0.98814      0.98814
6370619      0.89196      0.72727      0.86825      0.96706      0.88669
2600039      0.71014      0.02899      0.39921      0.14361      0.53491
2650615      0.85375      0.60079      0.88274      0.94071      0.40711
        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_D
6450255      0.42161      0.55072      0.65613      0.22398      0.29117
2570615      0.97628      0.77339      0.98155      0.98287      0.98946
6370619      0.84848      0.85507      0.94993      0.98287      0.91963
2600039      0.21476      0.54414      0.46377      0.45982      0.32016
2650615      0.38603      0.92754      0.57312      0.57181      0.64559
        4463361183_E 4463361183_F 5511070019_A 5511070019_B 5511070021_A
6450255      0.42951      0.35705      0.25823      0.23979      0.31094
2570615      0.97892      0.99209      0.97760      0.99341      0.95652
6370619      0.77339      0.89855      0.78920      0.75362      0.43478
2600039      0.35968      0.23979      0.17391      0.40975      0.72596
2650615      0.49407      0.16996      0.57312      0.52306      0.46113
        5511070021_B 5511070021_C 5511070021_D 5511070021_E 5511070021_F
6450255      0.82213      0.48353      0.37681      0.26482      0.27536
2570615      0.98024      0.97497      0.95784      0.93412      0.96970
6370619      0.48748      0.59947      0.48880      0.50988      0.82213
2600039      0.20422      0.24769      0.49144      0.57049      0.43742
2650615      0.52306      0.37418      0.33202      0.39789      0.60079
49582 more rows ...

## Detection p-values as seen in the Lumi object
> detect[1:5,]
        4457260019_A 4457260019_B 4457260019_C 4457260019_D 4457260019_E
6450255      0.72464      0.48353      0.93017      0.10935      0.53755
2570615      0.02767      0.02108      0.01318      0.01186      0.01186
6370619      0.10804      0.27273      0.13175      0.03294      0.11331
2600039      0.28986      0.97101      0.60079      0.85639      0.46509
2650615      0.14625      0.39921      0.11726      0.05929      0.59289
        4457260019_F 4463361183_A 4463361183_B 4463361183_C 4463361183_E
6450255      0.57839      0.44928      0.34387      0.77602      0.57049
2570615      0.02372      0.22661      0.01845      0.01713      0.02108
6370619      0.15152      0.14493      0.05007      0.01713      0.22661
2600039      0.78524      0.45586      0.53623      0.54018      0.64032
2650615      0.61397      0.07246      0.42688      0.42819      0.50593
        4463361183_F 5511070019_A 5511070021_A 5511070021_B 5511070021_C
6450255      0.64295      0.74177      0.68906      0.17787      0.51647
2570615      0.00791      0.02240      0.04348      0.01976      0.02503
6370619      0.10145      0.21080      0.56522      0.51252      0.40053
2600039      0.76021      0.82609      0.27404      0.79578      0.75231
2650615      0.83004      0.42688      0.53887      0.47694      0.62582
        5511070021_D 5511070021_E 5511070021_F
6450255      0.62319      0.73518      0.72464
2570615      0.04216      0.06588      0.03030
6370619      0.51120      0.49012      0.17787
2600039      0.50856      0.42951      0.56258
2650615      0.66798      0.60211      0.39921

## Detection p-values read as a text file from GenomeStudio output
> raw.detect[1:5,]
        Detection.4457260019_A Detection.4457260019_B Detection.4457260019_C
6450255                0.27536                0.51647                0.06983
2570615                0.97233                0.97892                0.98682
6370619                0.89196                0.72727                0.86825
2600039                0.71014                0.02899                0.39921
2650615                0.85375                0.60079                0.88274
        Detection.4457260019_D Detection.4457260019_E Detection.4457260019_F
6450255                0.89065                0.46245                0.42161
2570615                0.98814                0.98814                0.97628
6370619                0.96706                0.88669                0.84848
2600039                0.14361                0.53491                0.21476
2650615                0.94071                0.40711                0.38603
        Detection.4463361183_A Detection.4463361183_B Detection.4463361183_C
6450255                0.55072                0.65613                0.22398
2570615                0.77339                0.98155                0.98287
6370619                0.85507                0.94993                0.98287
2600039                0.54414                0.46377                0.45982
2650615                0.92754                0.57312                0.57181
        Detection.4463361183_D Detection.4463361183_E Detection.4463361183_F
6450255                0.29117                0.42951                0.35705
2570615                0.98946                0.97892                0.99209
6370619                0.91963                0.77339                0.89855
2600039                0.32016                0.35968                0.23979
2650615                0.64559                0.49407                0.16996
        Detection.5511070019_A Detection.5511070019_B Detection.5511070021_A
6450255                0.25823                0.23979                0.31094
2570615                0.97760                0.99341                0.95652
6370619                0.78920                0.75362                0.43478
2600039                0.17391                0.40975                0.72596
2650615                0.57312                0.52306                0.46113
        Detection.5511070021_B Detection.5511070021_C Detection.5511070021_D
6450255                0.82213                0.48353                0.37681
2570615                0.98024                0.97497                0.95784
6370619                0.48748                0.59947                0.48880
2600039                0.20422                0.24769                0.49144
2650615                0.52306                0.37418                0.33202
        Detection.5511070021_E Detection.5511070021_F
6450255                0.26482                0.27536
2570615                0.93412                0.96970
6370619                0.50988                0.82213
2600039                0.57049                0.43742
2650615                0.39789                0.60079

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lumi_2.2.1     Biobase_2.10.0 limma_3.6.9

loaded via a namespace (and not attached):
 [1] affy_1.28.0           affyio_1.18.0         annotate_1.28.0
 [4] AnnotationDbi_1.12.0  DBI_0.2-5             grid_2.12.0
 [7] hdrcde_2.15           KernSmooth_2.23-4     lattice_0.19-13
[10] MASS_7.3-8            Matrix_0.999375-44    methylumi_1.6.1
[13] mgcv_1.7-0            nlme_3.1-97           preprocessCore_1.12.0
[16] RSQLite_0.9-2         xtable_1.5-6

Jovana Maksimovic B.Sc (Hons) / B.Binf
Bioinformatics Officer
Bioinformatics, Enabling Facilities
Murdoch Childrens Research Institute
The Royal Children's Hospital
Flemington Road Parkville Victoria 3052 Australia
E jovana.maksimovic at mcri.edu.au
www.mcri.edu.au
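
The header check described in the post can be reproduced with base R alone. What follows is only a minimal sketch: `would_flip` is a hypothetical helper that wraps the condition quoted from lumiR (it is not part of the lumi package), the two header strings are the ones given in the post, and the three p-values are the first probe's values for the first three arrays in the raw GenomeStudio table.

```r
# Column headers as described in the post: the GenomeStudio label and the
# manually renamed variant.
header_new     <- "Detection-4457260019_A"
header_renamed <- "Detection Pval-4457260019_A"

# Hypothetical helper wrapping the condition quoted from lumiR.
# Returns TRUE when the detection values would be replaced by 1 - detection.
would_flip <- function(header) {
  length(grep("Detection Pval", header, ignore.case = TRUE)) == 0
}

flip_new     <- would_flip(header_new)      # no "Detection Pval" in the label
flip_renamed <- would_flip(header_renamed)  # label contains "Detection Pval"

# Probe 6450255, arrays 4457260019_A to _C, from the raw GenomeStudio table.
raw_p  <- c(0.27536, 0.51647, 0.06983)
lumi_p <- if (flip_new) 1 - raw_p else raw_p

print(flip_new)
print(flip_renamed)
print(lumi_p)
```

With the "Detection-" label the values are flipped, and the flipped values agree with the first row of the Lumi table shown in the post; with the renamed label they would be left unchanged.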
probe limma lumi • 2.0k views
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 3 months ago
Australia/Melbourne/Olivia Newton-John …
Dear Jovana,

Thanks for the very detailed report of the problems you encountered while processing your BeadChip data.

The Detection / Detection Pval values output by Illumina BeadStudio/GenomeStudio are always confusing: different versions of BeadStudio can give detection values in opposite directions. For the Illumina dataset used in the Limma User's Guide, an expressed probe has a small detection value (close to 0). This is why probes with a detection value less than 0.05 were selected as the expressed probes.

You should always check the direction of the detection values in order to filter out non-expressed probes correctly. You can use one of the arrays in your data to check this: the detection values of the probes with the largest (or the smallest) intensities in the array should tell you the direction.

BTW, the command

    expressed = apply(x$other$Detection < 0.05, 1, any)

tells you which probes are expressed in at least ONE array (if expressed probes have detection values close to 0). It does not give you the probes that are expressed in ALL arrays. The purpose of probe filtering is to remove those probes that are not expressed in any of the arrays, so as to improve the power to detect differentially expressed genes.

Hope this helps.

Cheers,
Wei

On Jan 5, 2011, at 4:55 PM, Jovana Maksimovic wrote:
> [...]
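
The two points in this reply, checking the direction of the detection values and the difference between `any` and `all` in the filter, can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: the data are synthetic (random intensities, with detection values simulated so that bright probes score near 0), and `direction_of_detection` is a hypothetical helper, not a limma or lumi function.

```r
set.seed(1)

# Synthetic example only: 1000 probes on 4 arrays, NOT real BeadChip data.
n_probes  <- 1000
n_arrays  <- 4
intensity <- matrix(rexp(n_probes * n_arrays, rate = 1 / 200),
                    nrow = n_probes, ncol = n_arrays)

# Simulated p-value-style detection: the brightest probe in each array gets 0.
detection <- 1 - apply(intensity, 2, rank) / n_probes

# Hypothetical helper: inspect the detection values of the brightest probes
# in one array to decide which direction the values run.
direction_of_detection <- function(intens, detect, n_top = 10) {
  top <- order(intens, decreasing = TRUE)[seq_len(n_top)]
  if (median(detect[top]) < 0.5) {
    "small value = expressed"
  } else {
    "large value = expressed"
  }
}

direction <- direction_of_detection(intensity[, 1], detection[, 1])
print(direction)

# 'any' keeps probes detected in at least ONE array;
# 'all' keeps only probes detected in EVERY array.
expressed_any <- apply(detection < 0.05, 1, any)
expressed_all <- apply(detection < 0.05, 1, all)
print(c(any = sum(expressed_any), all = sum(expressed_all)))
```

Because the intensities in this toy data are independent across arrays, the `all` filter keeps far fewer probes than the `any` filter; on real data the gap depends on how consistently probes are expressed across samples.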