Question about neqc and GEO illumina files
@akridgerunner-7719
Last seen 8.5 years ago
United States

Hello, I'm trying to process raw data from GEO series GSE49454:

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49454

GSE49454_RAW.tar contains:
GPL10558_HumanHT-12_V4_0_R1_15002873_B.txt
GPL10558_HumanHT-12_V4_0_R2_15002873_B.txt

Both of these are probe description files without any signal data. R1 lists 47,231 probes and 887 controls, while R2 lists 47,323 probes and 887 controls.

GSE49454_non-normalized.txt.gz contains 47,323 rows of values, with repeating pairs of columns holding probe intensities and probe detection p-values. We assume this corresponds to the R2 probe description file, but the expression data for the 887 control probes is missing.

Don't we need this missing file of control probe intensities to use neqc? I also looked at GSE72535, but it has no data for control probes either. Are these usually not uploaded? And if so, is neqc not really usable for Illumina data from GEO?

Thanks,

Robert

neqc illumina geo limma • 2.4k views
@gordon-smyth
Last seen 15 hours ago
WEHI, Melbourne, Australia

The help page for 'neqc' says: "When expression values for negative controls are not available, the detection.p argument is used instead."

Unfortunately GEO does not encourage people to upload the raw data files exported from GenomeStudio, which means that one can't simply run read.ilmn() on the data files. Nevertheless one can usually manage. For GSE72535, one can proceed:

> library(limma)
> dat <- read.delim("GSE72535_non-normalized.txt",skip=4,sep="\t",row.names=1)
> j <- 2*(1:17)
> x <- new("EListRaw")
> x$E <- as.matrix(dat[,j-1])
> x$other$Detection <- as.matrix(dat[,j])
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.
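The session above hard-codes 17 samples and relies on the columns strictly alternating between intensities and detection p-values. A minimal sketch of a more defensive split is below. It runs on a toy data frame, and it assumes the detection columns can be recognized by the word "Detection" in their names; the probe IDs, sample names and values are invented for illustration and do not come from either GEO series.

```r
# Toy stand-in for a GEO "non-normalized" table: alternating intensity and
# detection p-value columns. Names and values are hypothetical.
dat <- data.frame(
  S1 = c(120.5, 98.2, 3050.1),
  S1.Detection.Pval = c(0.21, 0.63, 0.00),
  S2 = c(110.3, 101.7, 2890.4),
  S2.Detection.Pval = c(0.35, 0.58, 0.00),
  row.names = c("ILMN_A", "ILMN_B", "ILMN_C")
)

# Find detection columns by name rather than assuming a fixed sample count.
# Assumption: the file labels them with "Detection" somewhere in the header.
is_det <- grepl("Detection", colnames(dat), ignore.case = TRUE)

# Sanity checks: exactly half the columns are detection columns,
# and they alternate with the intensity columns.
stopifnot(sum(is_det) * 2 == ncol(dat))
stopifnot(all(is_det == rep(c(FALSE, TRUE), length.out = ncol(dat))))

# Split into the two matrices and give them matching sample names.
E <- as.matrix(dat[, !is_det, drop = FALSE])
Detection <- as.matrix(dat[, is_det, drop = FALSE])
colnames(Detection) <- colnames(E)
```

The resulting `E` and `Detection` matrices can then be placed in `x$E` and `x$other$Detection` exactly as in the session above. If the sanity checks fail, the file probably does not have the expected paired layout and should be inspected by hand.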

Unfortunately GSE49454 is impossible to process because the raw data file that is provided is missing about a third of the required data columns. It has 255 columns when there should be 177*2 = 354. It would be reasonable to write to the authors and complain.

> x <- read.delim("GSE49454_non-normalized.txt",sep="\t",row.names=1)
> dim(x)
[1] 47323   255
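The arithmetic behind that complaint can be spelled out as a short check. This is only a sketch using the numbers quoted above (177 samples, 255 observed columns); the variable names are made up for illustration.

```r
# Numbers taken from the discussion above.
n_samples     <- 177   # samples in GSE49454
observed_cols <- 255   # columns found in the non-normalized file

# Each sample should contribute an intensity column and a detection p-value column.
expected_cols <- 2 * n_samples
missing_cols  <- expected_cols - observed_cols

# Fraction of the required columns that are absent.
missing_frac <- missing_cols / expected_cols

cat("expected:", expected_cols, " missing:", missing_cols,
    " fraction missing:", round(missing_frac, 2), "\n")
```

The same comparison of `ncol()` against twice the number of samples is a quick first test for any GEO Illumina file before attempting to build an `EListRaw` from it.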

 

In general, you can use neqc() on GEO Illumina data if any of the following are true:

  1. The authors upload the control probe profiles exported from GenomeStudio. (This is rare.)
  2. The authors upload the binary IDAT files for each sample. This is now encouraged by GEO, but was not so in the past. See ?read.idat for how to use IDAT files.
  3. The expression data includes detection p-values, which is so for most GEO submissions.
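For the IDAT route in item 2, the usual pattern is to read the files, compute detection p-values from the negative controls, and then normalize. The following is a minimal sketch rather than a tested pipeline: the helper name `normalize_idat` and its arguments are hypothetical, and it assumes the IDAT files sit in one directory and that a matching BGX manifest file has been obtained separately. See ?read.idat for the authoritative usage.

```r
# Hypothetical helper: read Illumina IDAT files and normalize with neqc().
# Requires the limma package; read.idat() also needs illuminaio installed.
normalize_idat <- function(idat_dir, bgx_file) {
  # Collect the IDAT files from the given directory.
  idat_files <- dir(idat_dir, pattern = "\\.idat$", full.names = TRUE)
  if (length(idat_files) == 0) stop("no IDAT files found in ", idat_dir)

  # Read intensities plus control probes, using the BGX manifest for annotation.
  x <- limma::read.idat(idat_files, bgx_file)

  # Detection p-values are computed from the negative control probes.
  x$other$Detection <- limma::detectionPValues(x)

  # Background correct, quantile normalize and log2 transform.
  limma::neqc(x)
}
```

A call would look like `y <- normalize_idat("idat_files", "manifest.bgx")`, where both paths are placeholders for wherever the downloaded files actually live.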

 
