I've been having trouble using the function crlmmIllumina to preprocess and genotype an Illumina QC Array-24 chip with the CRLMM algorithm. According to the current crlmm documentation, if there is no Bioconductor annotation package for your array, you should still be able to import the data with a properly formatted anno data.frame.
From the help page for genotype.Illumina():
"In general, a chip specific annotation package is required to use the genotype.Illumina function. If this is not available (newer chip types or custom chips often don't have a chip-specific package available on Bioconductor), consider using cdfName='nopackage' and specifying anno and genome, which runs 'krlmm' on the samples available. Here anno is a data.frame read in from the relevant chip-specific manifest, which must have additional columns 'isSnp' which is a logical that indicates whether a probe is polymorphic or not, 'position', 'chromosome' and 'featureNames' that give the location on the chromosome and SNP name."
I've prepared my anno data.frame:
> head(manifest)
chromosome position featureNames isSnp IlmnID Name IlmnStrand SNP AddressA_ID AlleleA_ProbeSeq AddressB_ID
1 1 159174749 1:159174749-C-T TRUE 1:159174749-C-T-0_B_F_2304232049 1:159174749-C-T BOT [T/C] 65600245 NA 0
2 1 159174749 1:159175193-A-G TRUE 1:159175193-A-G-0_B_R_2304232052 1:159175193-A-G BOT [T/C] 13658935 NA 0
3 1 159174749 1:159175211-C-T TRUE 1:159175211-C-T-0_T_R_2304232054 1:159175211-C-T TOP [A/G] 14755267 NA 0
4 1 159174749 1:159175253-G-A TRUE 1:159175253-G-A-0_T_F_2304232055 1:159175253-G-A TOP [A/G] 78702422 NA 0
5 1 159174749 1:159175495-G-A TRUE 1:159175495-G-A-0_T_F_2304232061 1:159175495-G-A TOP [A/G] 73657552 NA 0
6 1 159174749 1:159175540-TC TRUE 1:159175540-TC-0_T_R_2299219123 1:159175540-TC TOP [A/G] 19715188 NA 0
AlleleB_ProbeSeq Chr MapInfo Ploidy Species CustomerStrand IlmnStrand_1 IllumicodeSeq TopGenomicSeq
1 NA 1 159204959 diploid Homo sapiens BOT BOT NA NA
2 NA 1 159205403 diploid Homo sapiens TOP BOT NA NA
3 NA 1 159205421 diploid Homo sapiens BOT TOP NA NA
4 NA 1 159205463 diploid Homo sapiens TOP TOP NA NA
5 NA 1 159205705 diploid Homo sapiens TOP TOP NA NA
6 NA 1 159205750 diploid Homo sapiens BOT TOP NA NA
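As a sanity check against the column requirements quoted from the help page, something like the following sketch can be run first. The helper name missing_anno_cols and the toy data.frame are hypothetical and not part of crlmm; the toy rows just reuse the first two entries of the output above.

```r
# Hypothetical helper (not part of crlmm): report which of the columns
# named in the genotype.Illumina() help page are absent from anno.
required_cols <- c("isSnp", "position", "chromosome", "featureNames")

missing_anno_cols <- function(anno, required = required_cols) {
  stopifnot(is.data.frame(anno))
  setdiff(required, colnames(anno))
}

# Toy stand-in for the manifest, built from the first two rows shown above.
toy_manifest <- data.frame(
  chromosome   = c("1", "1"),
  position     = c(159174749L, 159174749L),
  featureNames = c("1:159174749-C-T", "1:159175193-A-G"),
  isSnp        = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Character vector of missing column names (empty if all are present).
missing_cols <- missing_anno_cols(toy_manifest)

# The help page describes isSnp as a logical, so check its type too.
is_logical_snp <- is.logical(toy_manifest$isSnp)
```

An empty missing_cols only shows that the column names match the help page; it says nothing about whether genotype.Illumina() accepts the object.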
I then call genotype.Illumina() as follows:
crlmmResult <- genotype.Illumina(path = "../path/to/idat_files",
                                 arrayNames = NULL,
                                 sep = "_",
                                 highDensity = FALSE,
                                 fileExt = list(green = "Grn.idat", red = "Red.idat"),
                                 cdfName = "nopackage",
                                 call.method = "krlmm",
                                 anno = manifest,
                                 genome = "hg19")
Despite the documentation stating that anno should be a:
"data.frame containing SNP annotation information from manifest and additional columns 'isSnp', 'position', 'chromosome' and 'featureNames'. For use when cdfName='nopackage'"
the call still throws the following error:
Instantiate CNSet container.
Initializing container for genotyping and copy number estimation
Processing sample stratum 1 of 1
Error in colnames(anno@data) :
trying to get slot "data" from an object (class "data.frame") that is not an S4 object
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[8] ggplot2_3.2.1 tidyverse_1.2.1 crlmm_1.43.0 preprocessCore_1.47.1 oligoClasses_1.47.0 ff_2.2-14 bit_1.1-14
[15] Biobase_2.45.1 BiocGenerics_0.31.6
loaded via a namespace (and not attached):
[1] nlme_3.1-139 bitops_1.0-6 matrixStats_0.55.0 lubridate_1.7.4 httr_1.4.1
[6] GenomeInfoDb_1.21.2 tools_3.6.0 backports_1.1.5 utf8_1.1.4 R6_2.4.0
[11] affyio_1.55.0 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1 withr_2.1.2
[16] tidyselect_0.2.5 base64_2.0 compiler_3.6.0 cli_1.1.0 rvest_0.3.4
[21] xml2_1.2.2 DelayedArray_0.11.8 scales_1.0.0 mvtnorm_1.0-11 askpass_1.1
[26] illuminaio_0.27.1 XVector_0.25.0 pkgconfig_2.0.3 limma_3.41.18 rlang_0.4.0
[31] readxl_1.3.1 rstudioapi_0.10 VGAM_1.1-1 generics_0.0.2 jsonlite_1.6
[36] BiocParallel_1.19.4 RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.1 Matrix_1.2-17
[41] Rcpp_1.0.2 munsell_0.5.0 S4Vectors_0.23.25 fansi_0.4.0 lifecycle_0.1.0
[46] stringi_1.4.3 SummarizedExperiment_1.15.9 zlibbioc_1.31.0 grid_3.6.0 crayon_1.3.4
[51] lattice_0.20-38 Biostrings_2.53.2 haven_2.1.1 splines_3.6.0 hms_0.5.1
[56] zeallot_0.1.0 knitr_1.25 beanplot_1.2 pillar_1.4.2 GenomicRanges_1.37.17
[61] codetools_0.2-16 stats4_3.6.0 glue_1.3.1 BiocManager_1.30.8 modelr_0.1.5
[66] vctrs_0.2.0 foreach_1.4.7 cellranger_1.1.0 gtable_0.3.0 openssl_1.4.1
[71] assertthat_0.2.1 xfun_0.10 broom_0.5.2 RcppEigen_0.3.3.5.0 iterators_1.0.12
[76] IRanges_2.19.17 ellipse_0.4.1
This all suggests to me that anno cannot actually be just a data.frame, and that genotype.Illumina() does in fact require an S4 annotation object created with a package for custom and/or currently unsupported arrays, rather than accepting a data.frame as the documentation suggests.
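The mismatch can be illustrated in base R with a small sketch. The class ToyAnno below is purely hypothetical, defined only to show what an object with a data slot looks like; it is not a class that crlmm is known to expect, and wrapping the manifest this way is not a confirmed fix.

```r
library(methods)

# Toy stand-in for the manifest; any plain data.frame behaves the same way.
toy_manifest <- data.frame(chromosome = "1", position = 159174749L)

# A data.frame is an S3 object, so it has no S4 slots to access with `@`.
is_s4_plain <- isS4(toy_manifest)

# Hypothetical S4 class with a `data` slot, for illustration only.
setClass("ToyAnno", representation(data = "data.frame"))
wrapped <- new("ToyAnno", data = toy_manifest)

# On an S4 object the slot access that failed in the error above works.
is_s4_wrapped <- isS4(wrapped)
wrapped_cols <- colnames(wrapped@data)
```

This only demonstrates why slot access fails on a plain data.frame; which S4 class the function actually expects would have to be read from the crlmm source.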
I'd love some help getting to the bottom of this, as I REALLY want to avoid using the super clunky GenomeStudio software so I can fully automate my genotyping process.
Thanks in advance, Dean
Hello! I'm trying to use crlmm in the same way and it throws the same error.
Did you find any solution?
Thanks, Valeria
Unfortunately, I haven't found a solution yet. At this point, I think the only option is to write your own annotation package, which I have not yet found time to do. I wish I had a better answer, but it seems like crlmm is not actively supported right now.
Best of luck, Dean