Question

Opening Illumina HT12 V3.0 Data from GEO

FL512 • 0

I am essentially trying to do the same thing as in these earlier posts:

1: https://support.bioconductor.org/p/70064/

2: https://support.bioconductor.org/p/102005/

Following the advice in post 1, I successfully downloaded the data I am interested in.

library(GEOquery)
data <- getGEO("GSE32894")[[1]]

Unfortunately, I got stuck when trying to read the non-normalized "GSE32894" file with limma.

idata <- read.ilmn("GSE32894_non-normalized_308UCsamples.txt",probeid = "PROBE_ID",expr="SKBR")

The error is as follows:

Error in readGenericHeader(fname, columns = expr, sep = sep) : 
  Specified column headings not found in file

I looked through the documentation (https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/read.ilmn) and tried the options described there, but none of them worked.

It would be great if you could give me any suggestion on how to fix this problem.

Thank you.

limma normalization GEO • 1.7k views

FL512 • 5.5 years ago

In the above the quotation marks " are not correct; what is the actual command that you used?

Martin Morgan • 5.5 years ago

Yes, you are right. I am sorry for any confusion caused. I will change from reading to opening Illumina HT12 V3.0 Data from GEO.

FL512 • 5.5 years ago

OP has copied Mark Dunning's code (which was for a specific dataset) from https://support.bioconductor.org/p/70064. I reformatted OP's question earlier, and now I've removed the extra quote mark as well.

Gordon Smyth • 5.5 years ago

score 0 · Answer 1 · 2019-10-03

In your first code chunk you are reading in the wrong GSE (GSE32849), which is a ChIP-chip experiment, rather than the one you want. In the second case, the text file does not contain the column headings you asked for. Its header row has an ID_REF column and UC_* sample columns, with no PROBE_ID or SKBR headings:

sed -n '5p' GSE32894_non-normalized_308UCsamples.txt | sed 's/\t/\n/g' | head
ID_REF
UC_0001_1
UC_0001_1.detection.p.value
UC_0002_1
UC_0002_1.detection.p.value
UC_0003_1
UC_0003_1.detection.p.value
UC_0006_2
UC_0006_2.detection.p.value
UC_0007_1
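
If the raw intensities are still wanted, a header laid out like the one above can be split with base R rather than read.ilmn. The sketch below is not from the original answer: `read_geo_nonnorm` is a hypothetical helper, and it assumes that the column names sit on line 5 (hence `skip = 4`), that the probe IDs are in `ID_REF`, and that every sample column is paired with a `<sample>.detection.p.value` column.

```r
# Hypothetical helper (not part of limma or GEOquery): split a GEO
# "non-normalized" table into an intensity matrix and a matrix of
# detection p-values. All defaults are assumptions based on the header
# printed above and may need adjusting for other files.
read_geo_nonnorm <- function(file, skip = 4, id_col = "ID_REF",
                             pval_suffix = ".detection.p.value") {
  # check.names = FALSE keeps the column names exactly as they are in the file
  x <- read.delim(file, skip = skip, check.names = FALSE,
                  stringsAsFactors = FALSE)

  is_id   <- colnames(x) == id_col
  is_pval <- endsWith(colnames(x), pval_suffix)

  # Intensities: everything that is neither the ID nor a p-value column
  E <- as.matrix(x[, !is_id & !is_pval, drop = FALSE])

  # Detection p-values, renamed so they line up with the sample names
  P <- as.matrix(x[, is_pval, drop = FALSE])
  colnames(P) <- sub(pval_suffix, "", colnames(P), fixed = TRUE)

  rownames(E) <- rownames(P) <- x[[id_col]]

  # Reorder the p-value columns to match the intensity columns
  list(E = E, detection = P[, colnames(E), drop = FALSE])
}
```

Calling `read_geo_nonnorm("GSE32894_non-normalized_308UCsamples.txt")` should then return the two matrices, provided the file really follows that layout throughout.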

You are probably better off just using the data you get from getGEO:

> z <- getGEO("GSE32894")[[1]]
> z
ExpressionSet (storageMode: lockedEnvironment)
assayData: 24402 features, 308 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM814052 GSM814053 ... GSM814359 (308 total)
  varLabels: title geo_accession ... tumor_stage:ch1 (55 total)
  varMetadata: labelDescription
featureData
  featureNames: ILMN_1343291 ILMN_1343295 ... ILMN_2415979 (24402
    total)
  fvarLabels: ID nuID ... GB_ACC (30 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 22553347 
Annotation: GPL6947 
> pData(z)[1:5,54:55]
          tumor_grade:ch1 tumor_stage:ch1
GSM814052              G3              T2
GSM814053              G2              T2
GSM814054              G2              T1
GSM814055              G2              T1
GSM814056              G3             T3b

> table(pData(z)[,54:55])
               tumor_stage:ch1
tumor_grade:ch1 T1 T2 T2a T2b T3 T3b T4a Ta Tx
             G1  0  0   0   0  0   0   0 48  0
             G2 35  9   1   0  1   0   0 56  1
             G3 61 73   0   2  0   5   1 11  1
             G4  1  0   0   0  0   0   0  0  0
             Gx  0  0   0   0  0   1   0  1  0

You can just use limma directly on that ExpressionSet, based on whatever phenotypic groups you care to compare.
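
As a rough illustration of that last step, here is a minimal sketch that compares two tumor grades. `compare_groups` is a hypothetical helper, not a limma function. The column name `tumor_grade:ch1` and the levels `G1` and `G3` are taken from the output above. The sketch also assumes that the values in the ExpressionSet are already normalized and on a log scale, which should be checked before trusting the results.

```r
library(Biobase)  # pData() and ExpressionSet subsetting
library(limma)    # lmFit(), makeContrasts(), contrasts.fit(), eBayes(), topTable()

# Hypothetical helper: test `case` versus `control` for one phenoData column.
# Assumes expression values are normalized and log-scale.
compare_groups <- function(eset, column = "tumor_grade:ch1",
                           case = "G3", control = "G1", n = 10) {
  group <- as.character(pData(eset)[[column]])

  # Keep only the samples that belong to one of the two groups
  keep  <- !is.na(group) & group %in% c(case, control)
  eset  <- eset[, keep]
  group <- factor(group[keep], levels = c(control, case))

  # One coefficient per group, then an explicit case-minus-control contrast
  design <- model.matrix(~ 0 + group)
  colnames(design) <- c("control", "case")
  fit <- lmFit(eset, design)
  contrast <- makeContrasts(case - control, levels = design)
  fit <- eBayes(contrasts.fit(fit, contrast))

  # Top-ranked probes; positive logFC means higher in `case`
  topTable(fit, number = n)
}
```

With the object from above this would be called as `compare_groups(z)`, which returns the 10 top-ranked probes for G3 versus G1. Other groupings, such as tumor stage, would need a different `column` and pair of levels.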