Hi,
I am trying to run SGSeq over my data and I have issues with the analyzeFeatures method. I prepared both my data with getBamInfo() and built my annotations with importTranscripts() method using the GTF file that I used for alignment with tophat 2. At the analyzeFeatures() step, I get this:
> sgfc_ncbi<-analyzeFeatures(data.r,features=sgfdb,cores=8)
Obtain counts...
Erreur :
Error in getSGFeatureCountsPerSample for MONOCYTES_169377:
Error in value[[3L]](cond) : 'scanBam' failed:
record: 0
error: 0
file: /path/toward/myBamFiles/accepted_hits_properly_paired.bam
index: /path/toward/myBamFiles/accepted_hits_properly_paired.bam
And so on for all my samples... These are paired-end BAM files, as checked via getBamInfo() and they do have the XS tag so I am at a lost right now.
Just in case:
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)
locale:
[1] LC_CTYPE=fr_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_CA.UTF-8 LC_COLLATE=fr_CA.UTF-8
[5] LC_MONETARY=fr_CA.UTF-8 LC_MESSAGES=fr_CA.UTF-8
[7] LC_PAPER=fr_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] SGSeq_1.4.3 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3
[4] IRanges_2.4.8 S4Vectors_0.8.11 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] igraph_1.0.1 AnnotationDbi_1.32.3
[3] XVector_0.10.0 magrittr_1.5
[5] zlibbioc_1.16.0 GenomicAlignments_1.6.3
[7] BiocParallel_1.4.3 tools_3.2.0
[9] SummarizedExperiment_1.0.2 Biobase_2.30.0
[11] DBI_0.4-1 lambda.r_1.1.9
[13] futile.logger_1.4.3 rtracklayer_1.30.4
[15] futile.options_1.0.0 bitops_1.0-6
[17] RUnit_0.4.31 biomaRt_2.26.1
[19] RCurl_1.95-4.8 RSQLite_1.0.0
[21] GenomicFeatures_1.22.13 Biostrings_2.38.4
[23] Rsamtools_1.22.0 XML_3.98-1.4
Any idea?
Best regards
Sylvain Foisy, Ph. D.
Montreal Heart Institute
Montreal, Qc
Hi Leonard,
Yes take 1, indeed my alignements were made with annotations from Illumina's iGenome data from NCBI; I used "seqlevelsStyle(txdb) <- "NCBI"" as specified elsewhere to go over this problem. Yes, take 2, you are right it ain't the latest but as of now, I am still stuck using this version for a while... I am trying the same code on a newer version on my Mac.
Is SGSeq using the BAI file created via samtools? I ask this because of the line "index: " which is pointing toward the BAM file, not its index... What is the scanBam method actually doing that might fail?
Best regards
S
Hi Leonard,
Ok, I looked into running elsewhere and well, I have errors of a different kind. I build a smaller version of my analysis (2 BAM per condition with only 2 conditions) and using the latest R/Bioconductor on my MacBook. Using the same pipeline, I now get these:
Obtain counts...
Erreur :
Error in getSGFeatureCountsPerSample for MONOCYTES_169377:
Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), :
seqlevels(param) not in BAM header:
seqlevels: ‘chr1_GL383518v1_alt’, ‘chr1_GL383519v1_alt’, ‘chr1_GL383520v2_alt’, ‘chr1_KI270759v1_alt’, ‘chr1_KI270761v1_alt’, ‘chr1_KI270762v1_alt’, ‘chr1_KI270763v1_alt’, ‘chr1_KI270765v1_alt’, ‘chr1_KI270766v1_alt’, ‘chr1_KI270892v1_alt’, ‘chr2_GL383521v1_alt’, ‘chr2_GL383522v1_alt’, ‘chr2_GL582966v2_alt’, ‘chr2_KI270767v1_alt’, ‘chr2_KI270768v1_alt’, ‘chr2_KI270769v1_alt’, ‘chr2_KI270770v1_alt’, ‘chr2_KI270772v1_alt’, ‘chr2_KI270773v1_alt’, ‘chr2_KI270774v1_alt’, ‘chr2_KI270776v1_alt’, ‘chr2_KI270893v1_alt’, ‘chr2_KI270894v1_alt’, ‘chr3_GL383526v1_alt’, ‘chr3_JH636055v2_alt’, ‘chr3_KI270777v1_alt’, ‘chr3_KI270779v1_alt’, ‘chr3_KI270780v1_alt’, ‘chr3_KI270782v1_alt’, ‘chr3_KI270783'
This tells you something interesting? FYI, I did the original alignments using tophat 2 with the Illumina's iGenomes human GRCh38-based indexes and GTF files.
Best regards
S
Hi Sylvain,
Yes the more recent Bioconductor installation gives a more informative error message. It tells you that the errors are due to chromosome names in your annotation that cannot be found in the BAM file header. If you remove the problematic chromosomes from your annotation, this should fix the problem, e.g. for your data you can use something like
> seqlevels_in_bam <- seqlevels(BamFile(data.r$file_bam[1]))
> sgfdb2 <- keepSeqlevels(sgfdb, seqlevels_in_bam)
and then run analyzeFeatures() using sgfdb2.
Regarding your earlier question, the scanBam() function has an argument 'index' which can be used to specify the name of the index file. By default this is simply the name of the BAM file with extension '.bai' appended.
I hope this answers your questions, let me know if you run into any other problems.
Best wishes
Leonard
Hi Leonard,
Success!! Your last inputs made the thing work ;-) I guess that I will have to make some preprocessing first on this criteria.
Thanks for all your help. Can't promise I won't need additional help in a few hours/days ;-)
S