Question

Reading Illumina HT12 V4.0 Data from GEO into Lumi

aaronrosenstein • 0

Hello,

I am relatively new to preprocessing microarray data and am trying to analyze the GEO dataset "GSE56045". I downloaded the supplementary RAW files to process with lumi, but the file format does not seem to be compatible with the lumiR function. In case it helps, the header of the RAW file is as follows:

? Illumina, Inc.
[Heading]
Date   15/4/2010
ContentVersion   4.0
FormatVersion   1.0.0
Number of Probes   47231
Number of Controls   887
[Probes]

When I call the lumiR function, the error message is:

"Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds"

This confuses me because the file appears to be a tab-separated file.

Is this data in a format readable by lumi, or should I use a different package instead?

lumi geoquery geo illumina human ht-12 v4 gene expression

link updated 7.3 years ago by Gordon Smyth 52k • written 7.3 years ago by aaronrosenstein • 0

score 1 · Answer 1 · 2017-10-24

The naming of the GEO series supplementary files is somewhat misleading. I guess you are trying to read the file GSE56045_RAW.tar, but that actually contains Illumina Bead Manifest files, which give probe annotation rather than expression data. The raw expression data is instead in the file GSE56045_non_normalized.txt.gz.
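
One way to tell the two kinds of file apart before passing one to a reader function is to look at the first few lines. The sketch below uses only base R. The helper name looks_like_manifest is hypothetical, and it assumes a manifest can be recognised by the [Heading] marker shown in the question, which may not hold for every Illumina file. The demo files are toy stand-ins, not the real GEO files, and the column names in the second one are invented for illustration.

```r
# Hypothetical helper: TRUE if one of the first n lines is the "[Heading]"
# marker seen in the manifest header quoted in the question.
looks_like_manifest <- function(path, n = 10) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(trimws(first_lines) == "[Heading]")
}

# Toy file imitating the manifest header from the question
manifest_demo <- tempfile(fileext = ".txt")
writeLines(c("Illumina, Inc.", "[Heading]", "Date\t15/4/2010", "[Probes]"),
           manifest_demo)

# Toy file imitating an expression table (column names are assumptions)
expression_demo <- tempfile(fileext = ".txt")
writeLines(c("ID_REF\t100001.intensity\t100001.detection",
             "ILMN_1762337\t26.4\t0.35"),
           expression_demo)

is_manifest   <- looks_like_manifest(manifest_demo)
is_expression <- !looks_like_manifest(expression_demo)
print(c(manifest = is_manifest, expression = is_expression))
```

A file that passes this check is probe annotation and would not be expected to load as expression data.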

I was able to read the data using the limma package:

> library(limma)
> x <- read.ilmn("GSE56045_non_normalized.txt.gz",probeid="ID_REF",expr="intensity",other.columns="detection")
Reading file GSE56045_non_normalized.txt.gz ... ...
> dim(x)
[1] 48164  1202
> x[1:5,1:5]
An object of class "EListRaw"
$source
[1] "illumina"

$E
               100001    100002   100003   100004   100005
ILMN_1762337 26.40536  28.34256 61.83844 32.21310 11.21891
ILMN_2055271 49.77552 104.60300 94.35043 58.13754 42.71157
ILMN_1736007 28.54197  36.64471 34.84822 26.64572 16.58674
ILMN_2383229 36.51273  16.37690 45.85955 30.05022  6.72389
ILMN_1806310 23.35780  21.99633 52.21932 31.46063 18.65642

$other
$detection
                  100001    100002      100003      100004      100005
ILMN_1762337 0.349350700 0.4285714 0.227272700 0.225974000 0.668831200
ILMN_2055271 0.006493506 0.0000000 0.009090909 0.003896104 0.005194805
ILMN_1736007 0.266233800 0.1870130 0.779220800 0.436363600 0.327272700
ILMN_2383229 0.075324680 0.8922078 0.546753200 0.307792200 0.916883100
ILMN_1806310 0.472727300 0.7000000 0.406493500 0.251948100 0.244155800
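
The probeid, expr and other.columns arguments above are keywords that read.ilmn() looks for in the column names, so it helps to inspect the header line of the file before choosing them. The following is a minimal base R sketch of that inspection. It runs on a small toy file whose column layout is an assumption made for illustration; for the real data one would point it at GSE56045_non_normalized.txt.gz instead.

```r
# Toy gzipped stand-in for the real file; the column layout is an assumption
demo_file <- tempfile(fileext = ".txt.gz")
con <- gzfile(demo_file, "w")
writeLines(c(
  "ID_REF\t100001.intensity\t100001.detection\t100002.intensity\t100002.detection",
  "ILMN_1762337\t26.40536\t0.3493507\t28.34256\t0.4285714"
), con)
close(con)

# Read only the header line and split it on tabs
header <- strsplit(readLines(gzfile(demo_file), n = 1), "\t", fixed = TRUE)[[1]]

# Candidate probe ID column, expression columns and detection columns
probe_col      <- header[1]
intensity_cols <- grep("intensity", header, value = TRUE)
detection_cols <- grep("detection", header, value = TRUE)

print(probe_col)
print(intensity_cols)
print(detection_cols)
```

Reading a single line this way avoids loading the whole table just to learn its column names.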

The data can then be background corrected and normalized by neqc() using the detection p-values:

> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.

Note that Reynolds et al (2014) processed the data in the same way, as you can read in the description of the data processing on GEO.