Question

read.idat and empty Symbols


h.mon

Brazil

I am reading Illumina Human HT-12 v4 Expression BeadChip data with read.idat from the limma package. Although reading apparently works without problems, the resulting object has many empty strings (3270, to be exact) in genes$Symbol.

What may be causing this?

> idatfiles <- list.files( path = "../array", pattern = "\\.idat$", full.names = TRUE )
> bgxfile <- list.files( path = "../array", pattern = "\\.bgx$", full.names = TRUE )
> x <- read.idat( idatfiles, bgxfile, dateinfo = TRUE )

> length( which( x$genes$Symbol == "" ) )
[1] 3270
> x$genes[8446,]
         Probe_Id Array_Address_Id Symbol
8446 ILMN_1906423          5310327
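For anyone repeating the count above, here is a minimal base-R sketch of the same check. It uses a small hypothetical data frame standing in for x$genes: only the ILMN_1906423 row comes from the output above, and the ILMN_EXAMPLE_* probes and GENE_A symbol are invented. It also treats NA as missing, which is a defensive assumption rather than something read.idat is known to return. Unlike length(which(Symbol == "")), which silently skips NA entries, this version counts them.

```r
# Hypothetical stand-in for x$genes as returned by limma::read.idat().
# Only the first row is taken from the output above; the others are invented.
genes <- data.frame(
  Probe_Id         = c("ILMN_1906423", "ILMN_EXAMPLE_A", "ILMN_EXAMPLE_B"),
  Array_Address_Id = c(5310327L, 1L, 2L),
  Symbol           = c("", "GENE_A", NA),
  stringsAsFactors = FALSE
)

# Treat both empty strings and NA as "no symbol".
missing_symbol <- is.na(genes$Symbol) | genes$Symbol == ""

# Summing a logical vector counts the TRUE values; which() is not needed.
n_missing <- sum(missing_symbol)
print(n_missing)

# Show the probes without a symbol.
print(genes[missing_symbol, ])
```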

And here is the corresponding annotation for that probe from the bgx file:

Homo sapiens    Unigene    Hs.390407    ILMN_89369    HS.390407    Hs.390407        Hs.390407        27828963    BX097705            ILMN_1906423    0005310327    S    640    GAGAGGCAGGGTGAAGAGGTCGAAGGAGCCTGAGTTAGCAGGGATGAGCA    2    -    87520225-87520274        BX097705 NCI_CGAP_Kid5 Homo sapiens cDNA clone IMAGp998E053890, mRNA sequence

> sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] GO.db_3.3.0                SummarizedExperiment_1.2.3 GenomicRanges_1.24.3
[4] GenomeInfoDb_1.8.7         RColorBrewer_1.1-2         pheatmap_1.0.8
[7] ggplot2_2.2.0              pathview_1.12.0            gage_2.22.0
[10] org.Hs.eg.db_3.3.0         AnnotationDbi_1.34.4       IRanges_2.6.1
[13] S4Vectors_0.10.3           Biobase_2.32.0             BiocGenerics_0.18.0
[16] illuminaio_0.14.0          limma_3.28.21

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8        plyr_1.8.4         XVector_0.12.1     tools_3.3.2
[5] zlibbioc_1.18.0    digest_0.6.10      base64_2.0         RSQLite_1.1
[9] memoise_1.0.0      tibble_1.2         gtable_0.2.0       png_0.1-7
[13] KEGGgraph_1.30.0   graph_1.50.0       DBI_0.5-1          Rgraphviz_2.16.0
[17] curl_2.3           httr_1.2.1         Biostrings_2.40.2 grid_3.3.2
[21] R6_2.2.0           XML_3.98-1.5       org.Bt.eg.db_3.3.0 scales_0.4.1
[25] KEGGREST_1.12.3    assertthat_0.1     colorspace_1.3-1   openssl_0.9.5
[29] lazyeval_0.2.0     munsell_0.4.3

score 1 · Answer 1 · 2016-12-05

James W. MacDonald

United States

Not everything has a HUGO (HGNC) symbol; those are reserved for transcripts that are considered to be an actual thing. The example you have put forth is an IMAGE clone, which at present is a hypothetical transcript that may or may not be transcribed in humans. This particular clone was uploaded to GenBank in 2003 and hasn't really been updated since. I would bet nobody has ever detected it in the wild, so it just persists as a hypothetical in NCBI's databases.
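If the blank symbols get in the way downstream (for example as row labels in a topTable or a heatmap), one option is to build a display label that uses Symbol when present and falls back to Probe_Id otherwise. This is a minimal sketch, assuming a probe ID is an acceptable label for your report. The data frame is a hypothetical stand-in for x$genes (ILMN_EXAMPLE_A and GENE_A are invented), and the approach only relabels probes; it does not recover any annotation.

```r
# Hypothetical stand-in for x$genes; the second row is invented.
genes <- data.frame(
  Probe_Id = c("ILMN_1906423", "ILMN_EXAMPLE_A"),
  Symbol   = c("", "GENE_A"),
  stringsAsFactors = FALSE
)

# A symbol counts as present only if it is neither NA nor an empty string.
has_symbol <- !is.na(genes$Symbol) & nzchar(genes$Symbol)

# Use the symbol where available, otherwise fall back to the Illumina probe ID.
genes$Label <- ifelse(has_symbol, genes$Symbol, genes$Probe_Id)
print(genes$Label)
```

Keeping the original Symbol column untouched and adding a separate Label column means the empty symbols stay visible for any later annotation work.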

Thanks. I will investigate some other probes, but now I am less anxious about the subject.

h.mon