Hi Jonathan,
I have not encountered this problem previously and do not have an
immediate
solution.
You state that 16 out of 20 bam files can be processed without a
problem by
applying
mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam", BSgenome =
"Hsapiens", sample_name = "0139202").
This surprises me a little bit, because you have to state the whole
BSgenome name, for example BSgenome ="BSgenome.Hsapiens.UCSC.hg19".
[It was
pointed out few days ago that I can make the package much more
flexible by
allowing other data types- Hervé, thank you for the hint, we will
definitely consider this]. However, this does not seem to be your
problem
as you encounter problems for only 4 out of 20 bam files.
Therefore, I assume that there is something strange with these
specific bam
files. Do they contain non-mapped reads? MEDIPS 1.10.0 reports an
error, if
there are none mapped reads in the bam files, but the latest version
of
MEDIPS available in the development branch can deal with this.
However,
based on the example read you've sent this might not cause the error.
The
last regular status report you get is 'Creating GRange Object...'.
This
happens in the getGRange() function called by MEDIPS.createSet() after
the
bam files has been imported. The command that probably causes the
error is
regions_GRange = GRanges(seqnames=regions$chr,
ranges=IRanges(start=regions$start, end=regions$stop),
strand=regions$strand)
but I do not understand why you have missing values. In order to
understand
this, you might want to try to import your bam file by yourself using
library(Rsamtools)
scanFlag = scanBamFlag(isUnmappedQuery = F) #this will ignore unmapped
reads in the bam file and is implemented in the latest dev version
(1.11.16).
scanParam=ScanBamParam(flag=scanFlag, what = c("rname", "pos",
"strand",
"qwidth"))
regions = scanBam(file="0139202.fq.sam.noDUP.bam.qf.bam",
param=scanParam)
regions = do.call(rbind,lapply(regions, as.data.frame,
stringsAsFactors=F))
regions = data.frame(chr=as.character(as.vector(regions$rname)),
start=as.numeric(as.vector(regions$pos)),
stop=as.numeric(as.vector(regions$pos)+as.vector(regions$qwidth)-1),
strand=as.character(as.vector(regions$strand)), stringsAsFactors=F)
Afterwards, it will be necessary to investigate where your regions
object
has NA values.
Another thing: do you have bam index files in the same directory where
you
store the bam file? MEDIPS will make use of these, if they are
available
(using slightly different code as stated above). In case there is a
discrepancy between the index and bam files (having the same prefix
file
name), this might cause a problem as well.
Finally, you can convert your bam files into bed files (by yourself)
and
try uploading simple txt based bed files into MEDIPS.
I hope we will identify the root of your error.
Lukas
On Thu, Aug 15, 2013 at 9:28 PM, Jonathan Ellis
<jonathan.ellis@qimr.edu.au>wrote:
> Dear Lukas and list,
>
> I'm trying to process a set of BAM files using the latest version of
> MEDIPS (1.10.0), but have run into problems creating MEDIPSset
objects
> for some BAM file. The following is an example, but I'm getting the
> same error for 4 out of 20 BAM files.
>
> > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam",
> + BSgenome = "Hsapiens", sample_name = "0139202")
> Reading bam alignment 0139202.fq.sam.noDUP.bam.qf.bam
> Total number of imported short reads: 13813686
> Creating GRange Object...
> Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE =
"IRanges")
> :
> solving row 11345799: range cannot be determined from the supplied
> arguments
> (too many NAs)
>
> The data are single-end reads that were aligned with bwa version
> 0.5.9-r16 (the alignments were done some time ago hence the older
> version of bwa), and the corresponding line from the BAM
(11,345,799)
> is:
>
> 19441177 16 chr17 38161416 18 49M *
0
> 0 TAGAGTCCGGCGTTCAGGGGCAGGAAGCATCCAGCACGGGAGAAAGATG
> BBBBBBBBBBBBBBBBBd_IIX[ffgb[[YJb^d^[[ggee^^cc^^^_ X0:i:1
X1:i:3
>
XA:Z:chr1,+56337006,49M,3;chr12,-55896993,49M,3;chr4,-22778322,49M,3;
> MD:Z:8A7A32 XG:i:0 NM:i:2 XM:i:2 XO:i:0XT:A:U
>
> I'm unsure whether this is a problem with the MEDIPS package or
> something from GRanges/IRanges. As far as I understand the .Call2
> function is part of IRanges, but I assume it's failing due to
something
> passed by MEDIPS. Any advice or pointers would be greatly
appreciated.
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 MEDIPS_1.10.0
> [3] DNAcopy_1.34.0 BSgenome_1.28.0
> [5] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.4
> [7] Biostrings_2.28.0 GenomicFeatures_1.12.3
> [9] AnnotationDbi_1.22.6 Biobase_2.20.1
> [11] GenomicRanges_1.12.4 IRanges_1.18.2
> [13] BiocGenerics_0.6.0 BiocInstaller_1.10.3
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.16.0 bitops_1.0-5 DBI_0.2-7 gtools_3.0.0
> [5] RCurl_1.95-4.1 Rsamtools_1.12.3 RSQLite_0.11.4 stats4_3.0.1
> [9] tools_3.0.1 XML_3.98-1.1 zlibbioc_1.6.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]