Error when preparing the TCGA data
@michaelwxf2012-14463

I am using the TCGAbiolinks package to access DNA methylation data from TCGA. I downloaded the data successfully, but the problem occurs at the prepare step (GDCprepare).

#Load Package
  library("TCGAbiolinks")
#TCGA query
  LUAD.met<-GDCquery(project = "TCGA-LUAD",
                     legacy = TRUE,
                     data.category = "DNA methylation",
                     platform = "Illumina Human Methylation 450")
#Download
  GDCdownload(LUAD.met, method = "api", files.per.chunk = 5)

#Prepare
  LUAD.methy<-GDCprepare(LUAD.met)

The error is:

Error: cannot allocate vector of size 3.7 Mb.

The output of traceback():

8: copy(ans[[target]])
7: `[.data.table`(y, x, nomatch = if (all.x) NA else 0, on = by, 
       allow.cartesian = allow.cartesian)
6: y[x, nomatch = if (all.x) NA else 0, on = by, allow.cartesian = allow.cartesian]
5: merge.data.table(df, data, by = "Composite.Element.REF")
4: merge(df, data, by = "Composite.Element.REF")
3: merge(df, data, by = "Composite.Element.REF")
2: readDNAmethylation(files, query$results[[1]]$cases, summarizedExperiment, 
       unique(query$platform))
1: GDCprepare(LUAD.met)

The output of sessionInfo():

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936  LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] nycflights13_0.2.2 dplyr_0.7.4        bindrcpp_0.2       TCGAbiolinks_2.6.1

I don't understand what the error means. Could some of my downloaded files be corrupted, given that there are a large number of them? I hope someone with experience can help me deal with it. Thanks.

Michael Wang

@marcel-ramos-7325

Hi Michael Wang,

This seems to be a memory issue.

Methylation data is quite large and may exhaust your memory resources when trying to merge the various files that you've downloaded.

The total size of the files is ~10.8 GB:

Downloading data for project TCGA-LUAD
GDCdownload will download 507 files. A total of 10.777088989 GB
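As a rough illustration of why memory runs out, here is a back-of-the-envelope estimate for a single in-memory copy of the final beta-value matrix. The numbers are assumptions, not values reported by TCGAbiolinks: about 485,577 probes on the 450K array, one sample per downloaded file (507), and 8-byte doubles. merge() and data.table also create intermediate copies, and the legacy files carry extra annotation columns, so peak usage is likely several times this figure.

```r
# Rough lower bound on memory for ONE copy of the final beta-value matrix.
# Assumptions (not from TCGAbiolinks): ~485,577 probes on the 450K array,
# one sample per downloaded file, values stored as 8-byte doubles.
n_probes  <- 485577
n_samples <- 507
bytes <- n_probes * n_samples * 8
gib <- bytes / 1024^3
cat(sprintf("%.0f bytes (~%.2f GiB) for one copy of the matrix\n", bytes, gib))
```

On a machine with only a few GB of RAM, one copy alone already takes a large share of memory, before any merge copies are made.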

Regards, Marcel


Hi Marcel,

Thank you for your answer. How can I fix this problem? Should I delete the data and download it again?

Yours,

Michael


How much memory (RAM) do you have? As Marcel pointed out, the problem is that R was not able to read all the data into memory.

You would have to read the data in chunks and then bind the objects together.

Something like:

#Load packages
library("TCGAbiolinks")
library("SummarizedExperiment")
#TCGA query
LUAD.met <- GDCquery(project = "TCGA-LUAD",
                     legacy = TRUE,
                     data.category = "DNA methylation",
                     platform = "Illumina Human Methylation 450")
#Download (files already on disk are not downloaded again)
GDCdownload(LUAD.met, method = "api", files.per.chunk = 5)

samples <- getResults(LUAD.met)$cases
nsamples <- length(samples)
step <- 50
#Prepare each chunk of samples separately and save it to disk
for (start in seq(1, nsamples, step)) {
  end <- min(start + step - 1, nsamples)  # "- 1" so chunks do not overlap
  query.chunk <- GDCquery(project = "TCGA-LUAD",
                          legacy = TRUE,
                          barcode = samples[start:end],
                          data.category = "DNA methylation",
                          platform = "Illumina Human Methylation 450")
  GDCdownload(query.chunk, method = "api", files.per.chunk = 5)
  GDCprepare(query.chunk, save = TRUE, save.filename = paste0(start, ".rda"))
}
#Load the saved chunks and bind them column-wise (one column per sample)
LUAD.methy <- NULL
for (i in seq(1, nsamples, step)) {
  print(i)
  aux <- get(load(paste0(i, ".rda")))
  if (is.null(LUAD.methy)) {
    LUAD.methy <- aux
  } else {
    aux <- aux[rownames(LUAD.methy), ]  # align probes (rows) before binding
    LUAD.methy <- cbind(LUAD.methy, aux)
  }
}
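The chunk-and-bind idea can be tried without any TCGA data. This is a toy sketch with hypothetical probe names and random values in plain matrices (not SummarizedExperiment objects): it builds non-overlapping chunk boundaries with end = start + step - 1, then aligns each chunk's rows to the first chunk by row name before calling cbind().

```r
# Toy version of the chunk-and-bind pattern, using plain matrices instead of
# SummarizedExperiment objects (hypothetical data, for illustration only).
chunk_indices <- function(nsamples, step) {
  starts <- seq(1, nsamples, by = step)
  ends <- pmin(starts + step - 1, nsamples)  # "- 1" avoids overlapping chunks
  Map(seq, starts, ends)
}

set.seed(1)
probes <- paste0("cg", 1:5)
full <- matrix(runif(5 * 12), nrow = 5,
               dimnames = list(probes, paste0("S", 1:12)))

merged <- NULL
for (idx in chunk_indices(ncol(full), 5)) {
  # Shuffle the rows to mimic chunks whose probe order differs
  chunk <- full[sample(probes), idx, drop = FALSE]
  if (is.null(merged)) {
    merged <- chunk
  } else {
    # Align rows to the first chunk before binding the new columns
    merged <- cbind(merged, chunk[rownames(merged), , drop = FALSE])
  }
}
# After reordering rows, the bound result matches the original matrix
stopifnot(identical(merged[probes, ], full))
```

With step + 1 instead of step - 1 as the offset, neighbouring chunks would share a sample and the bound result would contain duplicated columns.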


Hi dear friend,

My laptop is not really suited to bioinformatics: it has 4 GB of RAM and a Core i5-7200U. I understand your point and really appreciate your reply. As I understand it, this script downloads and prepares the cases 50 at a time so that my laptop can handle them.

I will try it as soon as possible and let you know the outcome right away. This reply is just to express my sincere appreciation for your kind help. Thanks very much.

Michael


I'm not sure you will be able to handle all the DNA methylation data with only 4 GB of RAM. The step in the code that merges the objects will probably run into the same problem.
