Hi,
I am trying to locate GGCGTGGTGGTGTGTGCCTGTAG on the minus strand of chr16 of hg19 using the matchPattern function. However, when I view one of the hits, chr16:24167733-24167755, in the UCSC Genome Browser (hg19), the reverse complement of the genomic sequence at chr16:24167733-24167755 is not GGCGTGGTGGTGTGTGCCTGTAG. A BLAT search also fails to return chr16:24167733-24167755.
Any ideas on why I am seeing this discrepancy? FYI, I tried both the unmasked and masked hg19 BSgenome packages.
Here is the code snippet.
library(BSgenome.Hsapiens.UCSC.hg19)
m1 <- matchPattern("GGCGTGGTGGTGTGTGCCTGTAG", reverseComplement(Hsapiens$chr16), max.mismatch=0)
length(Hsapiens$chr16) - end(m1)[7] + 1
#[1] 24167733
length(Hsapiens$chr16) - start(m1)[7] + 1
#[1] 24167755
Many thanks!
Best regards,
Julie
sessionInfo()
R version 3.2.2 Patched (2015-09-13 r69389)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
Running under: OS X 10.8.5 (Mountain Lion)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19.masked_1.3.99
[2] BSgenome.Hsapiens.UCSC.hg19_1.4.0
[3] BSgenome_1.36.3
[4] rtracklayer_1.28.10
[5] Biostrings_2.36.4
[6] XVector_0.8.0
[7] GenomicRanges_1.20.8
[8] GenomeInfoDb_1.4.3
[9] IRanges_2.2.9
[10] S4Vectors_0.6.6
[11] BiocGenerics_0.14.0
loaded via a namespace (and not attached):
[1] XML_3.98-1.3 Rsamtools_1.20.5
[3] bitops_1.0-6 GenomicAlignments_1.4.2
[5] futile.options_1.0.0 zlibbioc_1.14.0
[7] futile.logger_1.4.1 lambda.r_1.1.7
[9] BiocParallel_1.2.22 tools_3.2.2
[11] RCurl_1.95-4.7
Hi Julie,
I'm not sure why BLAT doesn't return it. My best guess is that BLAT only returns the first 135 hits (I get 135 for both the forward and reverse sequences) and, because it starts on chr1, it stops before it reaches this one. I don't see anything documented about a limit on how many hits it returns, so I could be wrong.
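One way to check the coordinate arithmetic independently of BLAT is to compare the two equivalent searches on a small sequence where the answer is known. The sketch below is only an illustration: `chrom` and `pattern` are made-up toy sequences, not hg19 data, and the variable names are mine. It searches the reverse-complemented sequence and converts the coordinates back (as in your snippet), then searches the plus strand for the reverse complement of the pattern, and confirms both give the same plus-strand range.

```r
library(Biostrings)

# Toy "chromosome" and pattern (hypothetical, NOT real hg19 sequence).
chrom   <- DNAString("AAACTACAGGCACTTT")
pattern <- DNAString("GCCTGTAG")   # expected to occur on the minus strand

# Approach 1: search the reverse-complemented chromosome, then convert
# the hit back to plus-strand coordinates.
m_rc <- matchPattern(pattern, reverseComplement(chrom), max.mismatch = 0)
plus_start <- length(chrom) - end(m_rc) + 1
plus_end   <- length(chrom) - start(m_rc) + 1

# Approach 2: search the plus strand for the reverse complement of the
# pattern; the hits are already in plus-strand coordinates.
m_fw <- matchPattern(reverseComplement(pattern), chrom, max.mismatch = 0)

# Both approaches should report the same plus-strand range.
stopifnot(all(plus_start == start(m_fw)), all(plus_end == end(m_fw)))

# Extract the plus-strand sequence at the hit and reverse complement it;
# this should give back the original pattern.
hit <- subseq(chrom, start = plus_start[1], end = plus_end[1])
recovered <- as.character(reverseComplement(hit))
stopifnot(recovered == as.character(pattern))
```

On the real data, the same check could be done by pulling the minus-strand sequence straight from the BSgenome object, for example with getSeq(Hsapiens, "chr16", start=24167733, end=24167755, strand="-"), and comparing it to the pattern by eye.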