rtracklayer import.gff3 mangling scores
Tim Rayner ▴ 270
@tim-rayner-2913
Last seen 10.2 years ago
Hi,

I've just run into what I think is a bug in the rtracklayer import.gff3 function (v1.16.1). If I import a GFF3 containing scores while stringsAsFactors=TRUE, the resulting scores are mangled. I haven't confirmed it, but I suspect the values are being converted to a factor upon import and then coerced to numeric (giving the factor level, not the original value). If I use options(stringsAsFactors=FALSE) the values remain intact.

Best regards,

Tim Rayner

--
Bioinformatician
Smith Lab, CIMR
University of Cambridge
United Kingdom

Example GFF3 content:

##gff-version 3
##date 2012-07-13
chr1 rtracklayer snp 189807684 189807684 0.20294398632582 * . ID=rs955894;name=rs955894
chr1 rtracklayer snp 198484784 198484784 0.269327708380075 * . ID=rs16843226;name=rs16843226
chr1 rtracklayer snp 237405093 237405093 0.379417274542624 * . ID=rs679735;name=rs679735
chr1 rtracklayer snp 80235819 80235819 0.418346673826376 * . ID=rs12022561;name=rs12022561
chr1 rtracklayer snp 84875173 84875173 0.302119655250906 * . ID=rs6576700;name=rs6576700
chr1 rtracklayer snp 112793146 112793146 0.390270490589027 * . ID=rs11102440;name=rs11102440
chr1 rtracklayer snp 244187847 244187847 0.249206080122631 * . ID=rs1000451;name=rs1000451
chr1 rtracklayer snp 8612104 8612104 0.583436890885292 * . ID=rs6577499;name=rs6577499

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.16.1 GenomicRanges_1.8.6 IRanges_1.14.3
[4] BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0 RCurl_1.91-1
[5] Rsamtools_1.8.5 stats4_2.15.1 tools_2.15.1 XML_3.9-4
[9] zlibbioc_1.2.0
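The suspected mechanism can be reproduced in base R without rtracklayer. The sketch below is only an illustration of the general factor-to-numeric pitfall, not the actual import.gff3 code path (which the post does not show). The variable names are made up for the example, and the score strings are the first five from the GFF3 above.

```r
# Score column as it might look after being read as strings and then
# turned into a factor (what stringsAsFactors=TRUE does to character columns).
scores_chr <- c("0.20294398632582", "0.269327708380075", "0.379417274542624",
                "0.418346673826376", "0.302119655250906")
scores_fac <- factor(scores_chr)

# Pitfall: as.numeric() on a factor returns the integer level codes
# (positions in the sorted level set), not the values the labels spell out.
mangled <- as.numeric(scores_fac)

# Safe conversion: go through character first.
intact <- as.numeric(as.character(scores_fac))

print(mangled)  # 1 2 4 5 3
print(intact)   # the original scores
```

If the imported scores come back as small whole numbers like these, that would be consistent with the factor explanation. Converting via as.character() first, or setting options(stringsAsFactors=FALSE) as described above, avoids the problem.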
SNP rtracklayer • 976 views
@michael-lawrence-3846
Last seen 2.9 years ago
United States
Hi Tim,

Good catch. Added a test to catch that in the future. Fixed in 1.16.3 (and devel).

Michael

On Mon, Jul 16, 2012 at 6:25 AM, Tim Rayner <tfrayner@gmail.com> wrote:
> Hi,
>
> I've just run into what I think is a bug in the rtracklayer
> import.gff3 function (v1.16.1). If I import a GFF3 containing scores
> while stringsAsFactors=TRUE, the resulting scores are mangled. I
> haven't confirmed it, but I suspect the values are being converted to
> a factor upon import and then coerced to numeric (giving the factor
> level, not the original value). If I use
> options(stringsAsFactors=FALSE) the values remain intact.