I am trying to use the GAGE pathway analysis tutorial with 3 samples of paired-end RNA-Seq reads (placed in a "tophat_all" directory) that I mapped with TopHat (.bam files have been indexed).
An error occurred at the read count step:
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> exByGn <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
> library(GenomicAlignments)
> fls <- list.files("tophat_all/", pattern="bam$", full.names =T)
> bamfls <- BamFileList(fls)
> flag <- scanBamFlag(isSecondaryAlignment=FALSE, isProperPair=TRUE)
> param <- ScanBamParam(flag=flag)
> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param)
Error in validObject(.Object) :
invalid class "SummarizedExperiment0" object: 'assays' nrow differs from 'mcols' nrow
When I tried the same with only one sample I get a different error:
> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union", ignore.strand=TRUE, single.end=FALSE, param=param) Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of a vector type, was 'NULL'
Could someone explain me the error?
Does the error come from my reads (potential unpaired reads, different length...)?
Thanks !
Note:
(1) I am using the last version of R, Biconductor and its packages. Could it come from the newer versions of the packages that would not fit the tutorial anymore?
> sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] GenomicAlignments_1.6.1 [2] Rsamtools_1.22.0 [3] Biostrings_2.38.0 [4] XVector_0.10.0 [5] SummarizedExperiment_1.0.1 [6] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [7] GenomicFeatures_1.22.4 [8] AnnotationDbi_1.32.0 [9] Biobase_2.30.0 [10] GenomicRanges_1.22.1 [11] GenomeInfoDb_1.6.1 [12] IRanges_2.4.1 [13] S4Vectors_0.8.2 [14] BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] zlibbioc_1.16.0 BiocParallel_1.4.0 tools_3.2.2 [4] DBI_0.3.1 lambda.r_1.1.7 futile.logger_1.4.1 [7] rtracklayer_1.30.1 futile.options_1.0.0 bitops_1.0-6 [10] RCurl_1.95-4.7 biomaRt_2.26.0 RSQLite_1.0.0 [13] XML_3.98-1.3
(2) And the traceback after the error occurred:
> traceback() 26: stop(msg, ": ", errors, domain = NA) 25: validObject(.Object) 24: initialize(value, ...) 23: initialize(value, ...) 22: new("SummarizedExperiment0", NAMES = names, elementMetadata = rowData, colData = colData, assays = assays, metadata = as.list(metadata)) 21: new_SummarizedExperiment0(from@assays, names(from@rowRanges), mcols(from@rowRanges), from@colData, from@metadata) 20: asMethod(object) 19: as(object, superClass) 18: slot(x, "assays") 17: is(slot(x, "assays"), "Assays") 16: .valid.SummarizedExperiment0.assays_current(x) 15: valid.func(object) 14: validityMethod(as(object, superClass)) 13: anyStrings(validityMethod(as(object, superClass))) 12: validObject(.Object) 11: initialize(value, ...) 10: initialize(value, ...) 9: new("RangedSummarizedExperiment", rowRanges = rowRanges, colData = colData, assays = assays, elementMetadata = elementMetadata, metadata = as.list(metadata)) 8: .new_RangedSummarizedExperiment(assays, rowRanges, colData, metadata) 7: .local(assays, ...) 6: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, colData = colData) 5: SummarizedExperiment(assays = SimpleList(counts = counts), rowRanges = features, colData = colData) 4: .dispatchBamFiles(features, reads, mode, match.arg(algorithm), ignore.strand, inter.feature = inter.feature, singleEnd = singleEnd, fragments = fragments, param = param, preprocess.reads = preprocess.reads, ...) 3: .local(features, reads, mode, algorithm, ignore.strand, ...) 2: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, single.end = FALSE, param = param) 1: summarizeOverlaps(exByGn, bamfls, mode = "Union", ignore.strand = TRUE, single.end = FALSE, param = param)
Thanks Valerie for your reply! I have already tried 'singleEnd' yesterday and it returned the same error message as in my original post. Today, I have tried again and the script seems to work (started 2 hours ago and still running; which never happened before). Except checking the version of my Biocparallel and the man, I didn't do anything else compared to yesterday...strange. Anyway, fingers crossed ! Thanks !
If your bam files are large you can use BamFileList(..., yieldSize = 100000) to iterate through in chunks. Also try passing
to the summarizeOverlaps() call. This eliminates parallel evaluation and often makes it easier to identify where things are going wrong. Let me know if you still have problems and I can take a look at one of your files off-line.
Valerie
Thanks !
All the steps after works until calling GAGE package. Actually I cannot even install GAGE (and its dependencies 'png' and 'KEGGREST').
See Cannot install GAGE (and its dependencies): libpng16.so.16 not found
If your original question was about summarizeOverlaps, and this has been solved, and now you have a new question about libpng, use the 'ASK QUESTION' link to ask a new question. Be sure to be more specific about the flavor of linux you are using.