Dear all,
I have BED files with ChIP-seq data, with sequence (chromosome) names like this:
chr1, chr10, chr11, chr12, chr13, chr13_random, chr14, . . ., chrX
I import the BED file data into a DiffBind DBA object using the command
> dbaObj <- dba(sampleSheet = "samples.csv")
The problem is that the chromosome names in the DBA object apparently lose their "chr" prefix, so "chr1" becomes "1", and the chromosomes "X" and "Y" appear to be missing altogether. More specifically, in
> dbaObj$chrmap
[1] "chr1" "chr10" "chr11" "chr12" "chr13"
[6] "chr14" "chr15" "chr16" "chr17" "chr18"
[11] "chr19" "chr2" "chr3" "chr4" "chr5"
[16] "chr6" "chr7" "chr8" "chr9" "chrUn_random"
[21] "chrX" "chrY"
the chromosome names are intact, while in
> head(dbaObj$allvectors)
CHR START END sample1 sample2 sample3 sample4
1 1 3435940 3436366 -1.00000000 0.01958029 0.04800027 0.07549102
2 1 3441807 3442009 -1.00000000 -1.00000000 0.01884463 -1.00000000
3 1 4408039 4408343 -1.00000000 -1.00000000 0.02658240 -1.00000000
4 1 4486341 4486691 -1.00000000 0.03415241 -1.00000000 -1.00000000
5 1 4561545 4562084 0.02217287 0.07207616 -1.00000000 0.02224536
6 1 4592472 4592914 0.03316482 0.06758506 -1.00000000 0.03507404
the chromosome names have changed, and chromosome X is missing.
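One possibility I have not been able to rule out is that CHR is not a stripped name at all but a 1-based index into chrmap. The sketch below shows how that could be checked. It is only an assumption about how the object is laid out, and the toy chrmap and allvectors values are hypothetical stand-ins for dbaObj$chrmap and dbaObj$allvectors.

```r
# Hypothetical toy data standing in for dbaObj$chrmap and dbaObj$allvectors
chrmap <- c("chr1", "chr10", "chr11", "chr2", "chrX", "chrY")
allvectors <- data.frame(
  CHR   = c(1, 1, 4, 5, 6),
  START = c(3435940, 3441807, 100, 200, 300),
  END   = c(3436366, 3442009, 150, 260, 390)
)

# Assumption: CHR is a 1-based index into chrmap, not a name without "chr"
allvectors$CHRNAME <- chrmap[allvectors$CHR]

# Tabulate the recovered names to see whether chrX and chrY show up
print(table(allvectors$CHRNAME))

# Any CHR value larger than length(chrmap) would contradict the assumption
stopifnot(all(allvectors$CHR <= length(chrmap)))
```

If the assumption holds for the real object, a CHR value of 12 would correspond to "chr2" and 21 to "chrX" in the chrmap printed above, and head() would show only "1" simply because the first rows all lie on chr1.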
Am I missing something here? I would be grateful for a hint on how to fix this.
Best wishes,
Georg
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] DiffBind_1.12.0 limma_3.22.1 ChIPpeakAnno_2.16.2
[4] AnnotationDbi_1.28.1 Biobase_2.26.0 RSQLite_1.0.0
[7] DBI_0.3.1 biomaRt_2.22.0 VennDiagram_1.6.9
[10] chipseq_1.16.0 ShortRead_1.24.0 GenomicAlignments_1.2.1
[13] Rsamtools_1.18.2 BiocParallel_1.0.0 BSgenome_1.34.0
[16] rtracklayer_1.26.2 Biostrings_2.34.0 XVector_0.6.0
[19] GenomicRanges_1.18.3 GenomeInfoDb_1.2.3 IRanges_2.0.0
[22] S4Vectors_0.4.0 BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] amap_0.8-12 base64enc_0.1-2 BatchJobs_1.5
[4] BBmisc_1.8 bitops_1.0-6 brew_1.0-6
[7] caTools_1.17.1 checkmate_1.5.0 codetools_0.2-9
[10] compiler_3.1.1 digest_0.6.4 edgeR_3.8.3
[13] fail_1.2 foreach_1.4.2 gdata_2.13.3
[16] GenomicFeatures_1.18.2 GO.db_3.0.0 gplots_2.14.2
[19] gtools_3.4.1 hwriter_1.3.2 iterators_1.0.7
[22] KernSmooth_2.23-13 lattice_0.20-29 latticeExtra_0.6-26
[25] MASS_7.3-35 multtest_2.22.0 RColorBrewer_1.0-5
[28] RCurl_1.95-4.3 sendmailR_1.2-1 splines_3.1.1
[31] stringr_0.6.2 survival_2.37-7 tools_3.1.1
[34] XML_3.98-1.1 zlibbioc_1.12.0