getGEO Error: Duplicate identifiers for rows
umahajan (@umahajan-15124)

Hi,

I am trying to download the GEO dataset GSE71989, but I am getting the following error:

gset <- getGEO("GSE71989")
Found 1 file(s)
GSE71989_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE71nnn/GSE71989/matrix/GSE71989_series_matrix.txt.gz'
Content type 'application/x-gzip' length 4240450 bytes (4.0 MB)
==================================================
downloaded 4.0 MB

Error: Duplicate identifiers for rows (75, 83), (76, 84), (77, 85), (78, 86), (79, 87), (80, 88), (81, 89), (82, 90)

Could you suggest a solution?


This is a bug due to a "feature" of this particular dataset in GEO (the same metadata key is used more than once per sample). I'll get a fix out in the next day or two. 
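
To illustrate why a repeated key breaks parsing: GEOquery reshapes per-sample metadata from a long form (one row per sample, key, and value) into a wide table with one column per key, and that pivot expects exactly one value per (sample, key) pair. The base R sketch below uses invented data (the sample IDs, keys, and values are hypothetical, not taken from GSE71989). It flags the offending pairs and shows one possible workaround, collapsing repeated values into a single string before pivoting. It is an illustration only, not GEOquery's actual fix.

```r
# Hypothetical long-format metadata: one row per (sample, key, value).
# GSM_A repeats the key "tissue", mimicking the quirk described above.
meta <- data.frame(
  sample = c("GSM_A", "GSM_A", "GSM_A", "GSM_B", "GSM_B"),
  key    = c("tissue", "tissue", "age", "tissue", "age"),
  value  = c("pancreas", "tumor", "61", "pancreas", "58"),
  stringsAsFactors = FALSE
)

# Flag every row whose (sample, key) pair occurs more than once; these rows
# make a one-cell-per-pair pivot ambiguous.
pair <- meta[, c("sample", "key")]
dup  <- duplicated(pair) | duplicated(pair, fromLast = TRUE)
print(meta[dup, ])

# One workaround: collapse repeated values into a single string per pair.
collapsed <- aggregate(meta["value"], by = meta[c("sample", "key")],
                       FUN = function(v) paste(v, collapse = "; "))

# Each (sample, key) pair is now unique, so one column per key is safe.
wide <- reshape(collapsed, idvar = "sample", timevar = "key",
                v.names = "value", direction = "wide")
print(wide)
```

Collapsing keeps every submitted value visible; the alternative of silently keeping only the first value would hide part of the annotation.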


Should be fixed.

@james-w-macdonald-5106 (United States)

I don't know why you get the error; there's a bunch of tidyverse blahblah in the GEOquery code, and I'm not cool enough to grok that stuff. Anyway, you don't have to use the GSE matrix data: you can download the CEL files and process them yourself.

> getGEOSuppFiles("GSE71989")
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE71nnn/GSE71989/suppl//GSE71989_RAW.tar?tool=geoquery'
Content type 'application/x-tar' length 122869760 bytes (117.2 MB)
downloaded 117.2 MB

> setwd("GSE71989/")
> untar("GSE71989_RAW.tar")
> library(oligo)

> fls <- dir(".", "CEL.gz")
> dat <- rma(read.celfiles(fls))

Loading required package: pd.hg.u133.plus.2
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM1849335_JJ-1.CEL.gz
Reading in : GSM1849336_JJ-2.CEL.gz
Reading in : GSM1849337_JJ-3.CEL.gz
Reading in : GSM1849338_JJ-4.CEL.gz
Reading in : GSM1849339_JJ-5.CEL.gz
Reading in : GSM1849340_JJ-6.CEL.gz
Reading in : GSM1849341_JJ-7.CEL.gz
Reading in : GSM1849342_JJ-8.CEL.gz
Reading in : GSM1849343_JJ-26.CEL.gz
Reading in : GSM1849344_JJ-27.CEL.gz
Reading in : GSM1849345_JJ-29.CEL.gz
Reading in : GSM1849346_JJ-31.CEL.gz
Reading in : GSM1849347_JJ-32.CEL.gz
Reading in : GSM1849348_JJ-34.CEL.gz
Reading in : GSM1849349_JJ-39.CEL.gz
Reading in : GSM1849350_JJ-43.CEL.gz
Reading in : GSM1849351_JJ-44.CEL.gz
Reading in : GSM1849352_JJ-45.CEL.gz
Reading in : GSM1849353_JJ-46.CEL.gz
Reading in : GSM1849354_JJ-47.CEL.gz
Reading in : GSM1849355_JJ-49.CEL.gz
Reading in : GSM1849356_JJ-50.CEL.gz
Background correcting
Normalizing
Calculating Expression

> dat
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 22 samples
  element names: exprs
protocolData
  rowNames: GSM1849335_JJ-1.CEL.gz GSM1849336_JJ-2.CEL.gz ...
    GSM1849356_JJ-50.CEL.gz (22 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM1849335_JJ-1.CEL.gz GSM1849336_JJ-2.CEL.gz ...
    GSM1849356_JJ-50.CEL.gz (22 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2
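
Working from the CEL files means the ExpressionSet is keyed by file name rather than by GSM accession, and its phenoData holds only an `index` column. A minimal base R sketch, assuming every file name follows the `<GSM accession>_<label>.CEL.gz` pattern printed above, recovers the accession and the submitter's label so the samples can later be matched against GEO sample metadata. The names `samples`, `gsm`, and `label` are illustrative.

```r
# A few of the CEL file names reported by read.celfiles() above
fls <- c("GSM1849335_JJ-1.CEL.gz", "GSM1849336_JJ-2.CEL.gz",
         "GSM1849356_JJ-50.CEL.gz")

# Assumes names follow "<GSM accession>_<label>.CEL.gz", as they do here
gsm   <- sub("_.*$", "", fls)
label <- sub("^GSM[0-9]+_(.*)\\.CEL\\.gz$", "\\1", fls, ignore.case = TRUE)

samples <- data.frame(file = fls, gsm = gsm, label = label,
                      stringsAsFactors = FALSE)
print(samples)

# With the ExpressionSet `dat` from above and Biobase attached, something like
#   sampleNames(dat) <- sub("_.*$", "", sampleNames(dat))
# would relabel the samples by GSM accession.
```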
@sean-davis-490 (United States)

This was a bug. It should be fixed in the next versions: Bioc-Devel (2.47.18) and Bioc-Release (2.46.15). If you need the fix sooner, you can install GEOquery from its GitHub repository.
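
To check whether an installed copy already includes the fix, a small base R helper can compare versions against the two numbers given above. The function name `has_geoquery_fix` is hypothetical. It assumes Bioconductor's usual convention that an even middle version number marks a release build and an odd one marks devel, and that every later version keeps the fix.

```r
# Hypothetical helper: does a GEOquery version include the fix?
# Thresholds are the versions named above; later versions are assumed fixed.
has_geoquery_fix <- function(v) {
  v <- as.package_version(v)
  minor <- unclass(v)[[1]][2]
  # Bioconductor convention: even middle number = release, odd = devel
  threshold <- if (minor %% 2 == 0) "2.46.15" else "2.47.18"
  v >= threshold
}

# On a machine with GEOquery installed:
#   has_geoquery_fix(packageVersion("GEOquery"))
has_geoquery_fix("2.46.14")  # release build just before the fix
has_geoquery_fix("2.47.18")  # devel build that includes the fix
```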
