Hi. I want to use QDNAseq to look for copy number changes in a Plasmodium falciparum sequence. I have made a BSgenome package from the sequence, which looks OK. It gave 2 Notes and 2 Warnings when I ran R CMD check, but it loads successfully:
> library("BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3")
> pfg <- BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3
> seqinfo(pfg)
Seqinfo object with 16 sequences (1 circular) from Pf3D7v3 genome:
  seqnames seqlengths isCircular  genome
  chrom04     1200490      FALSE Pf3D7v3
  chrom05     1343557      FALSE Pf3D7v3
  ...
> head(pfg[['chrom01']])
  6-letter "DNAString" instance
seq: TGAACC
But the QDNAseq createBins() function returns an empty data frame:
> pfBins10k <- createBins(pfg, 10)
Creating bins of 10 kbp for genome pfg
> pfBins10k
[1] chromosome start end bases gc
<0 rows> (or 0-length row.names)
createBins() worked when I tested it with BSgenome.Celegans.UCSC.ce2, so I think the problem must be in the way I forged the package, but I can't find it. Any advice?
Thank you, Jocelyn
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] BSgenome.Pfalciparum3D7.PlasmoDB.3D7v3_0.1-0 QDNAseq_1.4.1
 [3] BSgenome_1.36.3                              rtracklayer_1.28.6
 [5] Biostrings_2.36.2                            XVector_0.8.0
 [7] GenomicRanges_1.20.5                         GenomeInfoDb_1.4.1
 [9] IRanges_2.2.5                                S4Vectors_0.6.3
[11] BiocGenerics_0.14.0
Problem solved, sort of: createBins() worked when I called it with ignoreMitochondria=FALSE.
Follow-up question: how does createBins() decide which sequence is mitochondrial?
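One possible explanation, offered as a guess and not checked against the QDNAseq source: if the mitochondrial sequence is found by matching the sequence names against a pattern and then dropped with a negative index, a genome in which no name matches would lose every sequence, because indexing with -integer(0) returns an empty vector in R. The sketch below uses base R only; the sequence names and the pattern are made up for illustration and are not taken from QDNAseq or from my package.

```r
# Illustrative sequence names (hypothetical, not the real ones in the package)
chrs <- c("chrom01", "chrom02", "chrom03", "chromMT", "chromAPI")

# Hypothetical pattern for a mitochondrial sequence name (an assumption)
mito_pattern <- "^(chr)?M(T)?$"

# None of the names match, so grep() returns integer(0)
hits <- grep(mito_pattern, chrs)

# Negative indexing with integer(0) selects nothing instead of everything
dropped <- chrs[-hits]
length(dropped)

# A logical filter keeps every name when nothing matches
kept <- chrs[!grepl(mito_pattern, chrs)]
length(kept)
```

If something like this is happening, it would fit both observations: the C. elegans package presumably has a sequence name that matches, and switching the mitochondrial filter off avoids the indexing step altogether.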
I can also see that I chose poor names for the chromosomes - I will change them to something more standard. I used 'chrom' because 'chr' could be confused with 'character'. The original FASTA file calls them "Pf3D7_04_v3", "PFC10_API_IRAB", etc., and I thought the underscores might be a problem.
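A minimal sketch of how the renaming could be done from the FASTA headers, in base R. The "chr" plus number scheme is only an assumption about what "more standard" would mean here, and headers that do not follow the Pf3D7_NN_v3 pattern are left unchanged because I have not decided what to call them.

```r
# Two headers as they appear in the original FASTA file
fasta_names <- c("Pf3D7_04_v3", "PFC10_API_IRAB")

# Headers of the form Pf3D7_<number>_v3 are treated as nuclear chromosomes
nuclear_pattern <- "^Pf3D7_([0-9]+)_v3$"
is_nuclear <- grepl(nuclear_pattern, fasta_names)

# Hypothetical target scheme: "chr" followed by the number without a leading zero
new_names <- fasta_names
chrom_number <- as.integer(sub(nuclear_pattern, "\\1", fasta_names[is_nuclear]))
new_names[is_nuclear] <- paste0("chr", chrom_number)

new_names
```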