Hi,
I encountered segfault when calling featurecounts with Ensembl GRCh37 gtf file. In my dataset, only one bam file has the segfault, not the others.
And if I use featurecounts in built NCBI RefSeq annotation hg19, it works, although the chromosomes are in chr* format, instead of Ensembl format.
The command I used:
gtf <- "/srv/GT/reference/Homo_sapiens/Ensembl/GRCh37.p13/Annotation/Genes/genes.gtf"
bam <- "129278-pericyte_Mnb_sRNA_B_duprm.bam"
stranded=1
mh=TRUE
dup=FALSE
paired=FALSE
threads=8
foo1 <- Rsubread::featureCounts(files=bam,annot.ext=gtf,
isGTFAnnotationFile=TRUE,GTF.featureType="exon",
GTF.attrType="gene_id",nthreads=threads, isPairedEnd=paired,
strandSpecific=stranded,ignoreDup=dup,countMultiMappingReads=mh)
foo2 <- Rsubread::featureCounts(files=bam,annot.inbuilt="hg19",
nthreads=threads, isPairedEnd=paired,
strandSpecific=stranded,ignoreDup=dup,countMultiMappingReads=mh)
The bam file is available at https://1drv.ms/u/s!Ar4vJvgFbg4Ki9s2oq8Ava9ebXYyRQ if anyone wants to reproduce the error.
The seesion info:
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ezRun_1.2.1 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
[4] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0
[7] S4Vectors_0.16.0 BiocGenerics_0.24.0 data.table_1.10.4-3
[10] colorout_1.1-3 BiocInstaller_1.28.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 zlibbioc_1.24.0 RCurl_1.95-4.8
[4] compiler_3.4.2 GenomeInfoDbData_0.99.1
and the output of run:
> foo1 <- Rsubread::featureCounts(files=bam,annot.ext=gtf,
+ isGTFAnnotationFile=TRUE,GTF.featureType="exon",
+ GTF.attrType="gene_id",nthreads=threads, isPairedEnd=paired,
+ strandSpecific=stranded,ignoreDup=dup,countMultiMappingReads=mh)
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.28.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S 129278-pericyte_Mnb_sRNA_B_duprm.bam ||
|| ||
|| Dir for temp files : . ||
|| Threads : 8 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Strand specific : stranded ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file /srv/GT/reference/Homo_sapiens/Ensembl/GRCh37.p13 ... ||
|| Features : 1195764 ||
|| Meta-features : 57905 ||
|| Chromosomes/contigs : 61 ||
|| ||
|| Process BAM file 129278-pericyte_Mnb_sRNA_B_duprm.bam... ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
*** caught segfault ***
address 0x7fbd3c521734, cause 'memory not mapped'
Traceback:
1: Rsubread::featureCounts(files = bam, annot.ext = gtf, isGTFAnnotationFile = TRUE, GTF.featureType = "exon", GTF.attrType = "gene_id", nthreads = threads, isPairedEnd = paired, strandSpecific = stranded, ignoreDup = dup, countMultiMappingReads = mh)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Thanks, Ge
Please follow the posting guide (http://www.bioconductor.org/help/support/posting-guide/) when you post a question. Provide output of sessionInfo() so we can know what versions of software packages you use. You also need to show the errors you got from running featureCounts. It will be best to provide the complete screen output from your featureCounts run.
I changed as you suggested.
Thanks. We can reproduce the problem you encountered from our end. We found this was caused by a bug in the program that was related to the processing of bam files processed by Picard. We will release a fix soon.
This is now fixed in both release and devel versions.
Many thanks for fixing this.