GAGE pathway analysis (R/Bioconductor): summarizeOverlaps error
1
0
Entering edit mode
user31888 ▴ 30
@user31888-9209
Last seen 5.8 years ago
United States

I am trying to use the GAGE pathway analysis tutorial with 3 samples of paired-end RNA-Seq reads (placed in a "tophat_all" directory) that I mapped with TopHat (.bam files have been indexed).

An error occurred at the read count step:

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)

> exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")

> library(GenomicAlignments)

> fls <- list.files("tophat_all/", pattern="bam$", full.names =T)

> bamfls <- BamFileList(fls)

> flag <- scanBamFlag(isSecondaryAlignment=FALSE, isProperPair=TRUE)

> param <- ScanBamParam(flag=flag)

> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)

Error in validObject(.Object) : 
  invalid class "SummarizedExperiment0" object: 'assays' nrow differs from 'mcols' nrow

 

When I tried the same with only one sample I get a different error:

> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)

Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

Could someone explain me the error?

Does the error come from my reads (potential unpaired reads, different length...)?

Thanks !

Note:

(1) I am using the last version of R, Biconductor and its packages. Could it come from the newer versions of the packages that would not fit the tutorial anymore?

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicAlignments_1.6.1                
 [2] Rsamtools_1.22.0                       
 [3] Biostrings_2.38.0                      
 [4] XVector_0.10.0                         
 [5] SummarizedExperiment_1.0.1             
 [6] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [7] GenomicFeatures_1.22.4                 
 [8] AnnotationDbi_1.32.0                   
 [9] Biobase_2.30.0                         
[10] GenomicRanges_1.22.1                   
[11] GenomeInfoDb_1.6.1                     
[12] IRanges_2.4.1                          
[13] S4Vectors_0.8.2                        
[14] BiocGenerics_0.16.1                    

loaded via a namespace (and not attached):
 [1] zlibbioc_1.16.0      BiocParallel_1.4.0   tools_3.2.2         
 [4] DBI_0.3.1            lambda.r_1.1.7       futile.logger_1.4.1 
 [7] rtracklayer_1.30.1   futile.options_1.0.0 bitops_1.0-6        
[10] RCurl_1.95-4.7       biomaRt_2.26.0       RSQLite_1.0.0       
[13] XML_3.98-1.3

(2) And the traceback after the error occurred:

> traceback()
26: stop(msg, ": ", errors, domain = NA)
25: validObject(.Object)
24: initialize(value, ...)
23: initialize(value, ...)
22: new("SummarizedExperiment0", NAMES = names, elementMetadata = rowData, 
        colData = colData, assays = assays, metadata = as.list(metadata))
21: new_SummarizedExperiment0(from@assays, names(from@rowRanges), 
        mcols(from@rowRanges), from@colData, from@metadata)
20: asMethod(object)
19: as(object, superClass)
18: slot(x, "assays")
17: is(slot(x, "assays"), "Assays")
16: .valid.SummarizedExperiment0.assays_current(x)
15: valid.func(object)
14: validityMethod(as(object, superClass))
13: anyStrings(validityMethod(as(object, superClass)))
12: validObject(.Object)
11: initialize(value, ...)
10: initialize(value, ...)
9: new("RangedSummarizedExperiment", rowRanges = rowRanges, colData = colData, 
       assays = assays, elementMetadata = elementMetadata, metadata = as.list(metadata))
8: .new_RangedSummarizedExperiment(assays, rowRanges, colData, metadata)
7: .local(assays, ...)
6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
       colData = colData)
5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, 
       colData = colData)
4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm), 
       ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd, 
       fragments = fragments, param = param, preprocess.reads = preprocess.reads, 
       ...)
3: .local(features, reads, mode, algorithm, ignore.strand, ...)
2: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
       single.end = FALSE, param = param)
1: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, 
       single.end = FALSE, param = param)
summarizeoverlaps GAGE • 3.1k views
ADD COMMENT
2
Entering edit mode
@valerie-obenchain-4275
Last seen 2.8 years ago
United States

Hi,

The argument should be 'singleEnd', not 'single.end'. The failure was occurring in bplappy() but the error message was not propagated due to how BiocParallel was handling errors, specifically the behavior of 'stop.on.error'.

Error handling has been changed in devel (BiocParallel 1.5.1) to throw an error when 'stop.on.error = TRUE' (default). See ?SnowParm or ?MulticoreParam for details.

With BiocParallel >= 1.5.1 you'll see this error message:

> summarizeOverlaps(exByGn, bamfls, mode="Union", 
+     ignore.strand=TRUE, single.end=FALSE, 
+     param=param)
ERROR [2015-11-18 08:41:29] unused argument (single.end = FALSE) 
error in task  1
Error in bplapply(X, FUN, ..., BPREDO = BPREDO, BPPARAM = BPPARAM) : 
  error in bplapply(): list(message = "unused argument (single.end = FALSE)", call = FUN(...))

Valerie

ADD COMMENT
0
Entering edit mode

Thanks Valerie for your reply! I have already tried 'singleEnd' yesterday and it returned the same error message as in my original post. Today, I have tried again and the script seems to work (started 2 hours ago and still running; which never happened before). Except checking the version of my Biocparallel and the man, I didn't do anything else compared to yesterday...strange. Anyway, fingers crossed ! Thanks !

ADD REPLY
0
Entering edit mode

If your bam files are large you can use BamFileList(..., yieldSize = 100000) to iterate through in chunks. Also try passing 

BPPARAM = SerialParam()

to the summarizeOverlaps() call. This eliminates parallel evaluation and often makes it easier to identify where things are going wrong. Let me know if you still have problems and I can take a look at one of your files off-line.

Valerie

ADD REPLY
0
Entering edit mode

Thanks !

All the steps after works until calling GAGE package. Actually I cannot even install GAGE (and its dependencies 'png' and 'KEGGREST').

See Cannot install GAGE (and its dependencies): libpng16.so.16 not found

ADD REPLY
0
Entering edit mode

If your original question was about summarizeOverlaps, and this has been solved, and now you have a new question about libpng, use the 'ASK QUESTION' link to ask a new question. Be sure to be more specific about the flavor of linux you are using.

ADD REPLY

Login before adding your answer.

Traffic: 833 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6