I am trying to implement the same analysis of offTargetAnalysis() but without a BSgenome object. This is being done by using offTargetAnalysisWithoutBSgenome(). But I get an error at the end of the analysis. Am I missing any parameter that's critical for the offTargetAnalysisWithoutBSgenome() run ?
Note : The error comes only when I use the offTargetAnalysisWithoutBSgenome(). It does not come when I am using offTargetAnalysis with BSgenome.Hsapiens.UCSC.hg19 and TxDb.Hsapiens.UCSC.hg19.knownGene. I tried using TxDb.Hsapiens.UCSC.hg19.knownGene instead of using a txdb object generated from hg19.refGene.gtf in offTargetAnalysisWithoutBSgenome() , but the error comes even in this case.
Here's the offTargetAnalysis call that I want to implement in offTargetAnalysisWithoutBSgenome :-
################################################################
#loading necessary pacakges
################################################################
library(Biostrings)
library(CRISPRseek)
library(GenomicFeatures)
################################################################
#loading the 1 guide rna sequence from a set which are present as 23 base pair sequences
################################################################
temp_gRNA = readDNAStringSet("/home/user/Documents/case2/gRNAs.fa", nrec = 1)
################################################################
#removing the PAM sequence from the end and keeping th rest of the 20 bp
################################################################
temp_gRNA = subseq(temp_gRNA, start = 1, width = 20)
################################################################
#offTargetAnalysis call
################################################################
res = offTargetAnalysis(inputFilePath = temp_gRNA, findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE,findPairedgRNAOnly = FALSE,chromToSearch = "all", outputDir = "/home/user/Documents/case4/", overwrite = TRUE, gRNAoutputName = "temp_guide_rna_scores" , enable.multicore = TRUE, n.cores.max = 5, BSgenomeName = BSgenome.Hsapiens.UCSC.hg19, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene)
Here's the offTargetAnalysisWithoutBSgenome implementation:-
################################################################
#loading necessary pacakges
################################################################
library(Biostrings)
library(CRISPRseek)
library(GenomicFeatures)
################################################################
#loading the 1 guide rna sequence from a set which are present as 23 base pair sequences
################################################################
temp_gRNA = readDNAStringSet("/home/user/Documents/case2/gRNAs.fa", nrec = 1)
################################################################
#removing the PAM sequence from the end and keeping th rest of the 20 bp
################################################################
temp_gRNA = subseq(temp_gRNA, start = 1, width = 20)
################################################################
#making txdb object
#gtf source : http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz
################################################################
human_txdb = makeTxDbFromGFF("/home/user/Documents/genome/hg19/hg19.refGene.gtf")
################################################################
#running off target analysis with hg19 genome fasta file
#genome source : http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
################################################################
res = offTargetAnalysisWithoutBSgenome(inputFilePath = temp_gRNA, findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE,findPairedgRNAOnly = FALSE,chromToSearch = "all", outputDir = "/home/user/Documents/crisprseek_sample_output_from_forcrisprseek_venv", overwrite = TRUE, gRNAoutputName = "temp_guide_rna_scores" , enable.multicore = TRUE, n.cores.max = 12,useBSgenome = FALSE, genomeSeqFile = "/home/user/Documents/genome/hg19/hg19.fa" , txdb = human_txdb)
I am not able to interpret the error which comes at the end : "Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : In range 12: 'end' must be >= 'start' - 1"
Here's the run output :
# Validating input ...
# >>> Finding all hits in sequence chr1 ...
.
.
.
# >>> Finding all hits in sequence chr18_gl000207_random ...
# >>> DONE searching
# Building feature vectors for scoring ...
# Calculating scores ...
# Annotating, filtering and generating reports ...
# Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
# In range 12: 'end' must be >= 'start' - 1.
# In addition: There were 50 or more warnings (use warnings() to see the first 50)
Session info:
sessionInfo( )
─ Session info ─────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os CentOS Linux 8 (Core)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_IN.UTF-8
ctype en_IN.UTF-8
tz America/New_York
date 2020-11-22
─ Packages ─────────────────────────────────────────────────────────────────────────
package * version date lib source
ade4 1.7-16 2020-10-28 [1] CRAN (R 3.6.1)
AnnotationDbi * 1.48.0 2019-10-29 [1] Bioconductor
askpass 1.0 2019-01-02 [1] CRAN (R 3.6.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase * 2.46.0 2019-10-29 [1] Bioconductor
BiocFileCache 1.10.2 2019-11-08 [1] Bioconductor
BiocGenerics * 0.32.0 2019-10-29 [1] Bioconductor
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 3.6.1)
BiocParallel 1.20.1 2019-12-21 [1] Bioconductor
biomaRt 2.42.1 2020-03-26 [1] Bioconductor
Biostrings * 2.54.0 2019-10-29 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 3.6.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 3.6.1)
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.1)
blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.1)
BSgenome * 1.54.0 2019-10-29 [1] Bioconductor
BSgenome.Hsapiens.UCSC.hg19 * 1.4.0 2020-11-08 [1] Bioconductor
callr 3.5.1 2020-10-13 [1] CRAN (R 3.6.1)
cli 2.1.0 2020-10-12 [1] CRAN (R 3.6.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
CRISPRseek * 1.30.0 2020-10-27 [1] Bioconductor
curl 3.3 2019-01-10 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.1)
dbplyr 2.0.0 2020-11-03 [1] CRAN (R 3.6.1)
DelayedArray 0.12.3 2020-04-09 [1] Bioconductor
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
devtools 2.3.2 2020-09-18 [1] CRAN (R 3.6.1)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.6.0)
dplyr 1.0.2 2020-08-18 [1] CRAN (R 3.6.1)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.1)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 3.6.1)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb * 1.22.1 2020-03-27 [1] Bioconductor
GenomeInfoDbData 1.2.2 2020-11-07 [1] Bioconductor
GenomicAlignments 1.22.1 2019-11-12 [1] Bioconductor
GenomicFeatures * 1.38.2 2020-02-15 [1] Bioconductor
GenomicRanges * 1.38.0 2019-10-29 [1] Bioconductor
glue 1.4.2 2020-08-27 [1] CRAN (R 3.6.1)
hash 2.2.6.1 2019-03-04 [1] CRAN (R 3.6.1)
hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.1)
httr 1.4.2 2020-07-20 [1] CRAN (R 3.6.1)
IRanges * 2.20.2 2020-01-13 [1] Bioconductor
jsonlite 1.7.1 2020-09-07 [1] CRAN (R 3.6.1)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.3 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
matrixStats 0.57.0 2020-09-25 [1] CRAN (R 3.6.1)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
openssl 1.3 2019-03-22 [1] CRAN (R 3.6.0)
pillar 1.4.6 2020-07-10 [1] CRAN (R 3.6.1)
pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 3.6.1)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.1)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.4 2020-09-03 [1] CRAN (R 3.6.1)
progress 1.2.0 2018-06-14 [1] CRAN (R 3.6.0)
ps 1.4.0 2020-10-07 [1] CRAN (R 3.6.1)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.1)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
rappdirs 0.3.1 2016-03-28 [1] CRAN (R 3.6.1)
Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.6.0)
RCurl 1.98-1.2 2020-04-18 [1] CRAN (R 3.6.1)
remotes 2.2.0 2020-07-21 [1] CRAN (R 3.6.1)
reticulate 1.18 2020-10-25 [1] CRAN (R 3.6.1)
rhdf5 2.30.1 2019-11-26 [1] Bioconductor
Rhdf5lib 1.8.0 2019-10-29 [1] Bioconductor
rlang 0.4.8 2020-10-08 [1] CRAN (R 3.6.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
Rsamtools 2.2.3 2020-02-23 [1] Bioconductor
RSQLite 2.2.1 2020-09-30 [1] CRAN (R 3.6.1)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.1)
rtracklayer * 1.46.0 2019-10-29 [1] Bioconductor
S4Vectors * 0.24.4 2020-04-09 [1] Bioconductor
seqinr 4.2-4 2020-10-10 [1] CRAN (R 3.6.1)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
SummarizedExperiment 1.16.1 2019-12-19 [1] Bioconductor
testthat 3.0.0 2020-10-31 [1] CRAN (R 3.6.1)
tibble 3.0.4 2020-10-12 [1] CRAN (R 3.6.1)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.1)
TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2 2020-11-08 [1] Bioconductor
usethis 1.6.3 2020-09-17 [1] CRAN (R 3.6.1)
vctrs 0.3.4 2020-08-29 [1] CRAN (R 3.6.1)
withr 2.3.0 2020-09-22 [1] CRAN (R 3.6.1)
XML 3.99-0.3 2020-01-20 [1] CRAN (R 3.6.1)
XVector * 0.26.0 2019-10-29 [1] Bioconductor
zlibbioc 1.32.0 2019-10-29 [1] Bioconductor
[1] /home/user/anaconda3/envs/for_crisprseek/lib/R/library
Hello Julie Zhu, Any pointers will be very helpful.
Hi Xenon,
Can you try to set annotateExon = FALSE to see if you still get the same error? Thanks! If you still encounter the same error, please post the input gRNA file "temp_gRNA" and hg19.fa. Thanks!
Best regards, Julie
I continue to see the same error with annotateExon = FALSE. I have used the genome, from the following link http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
And in my temp_gRNA , there's only 1 string of length 20 TTAAAGGTTTATACCTTCCC
I even ran the analysis without txdb = human_txdb. Still got the same error. I thought as we are not annotating the exons , txdb parameter might not be required.
Hi Durbar,
Thanks for performing more tests!
I ran the following command without encountering any error. I am wondering whether you have downloaded the most recent version of CRISPRseek at https://bioconductor.org/packages/release/bioc/html/CRISPRseek.html.
res = offTargetAnalysisWithoutBSgenome(inputFilePath = temp_gRNA, findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE,chromToSearch = "chr15", outputDir = “testDurbar”, overwrite = TRUE, gRNAoutputName = “DurbargRNA” , enable.multicore = FALSE, n.cores.max = 1,useBSgenome = FALSE, genomeSeqFile = "hg19.fa" , annotateExon = FALSE, txdb = human_txdb, max.mismatch = 1)
I also tried to run the analysis with BSgenome and found out that there is no perfect match in hg19 for the sequence you provided (TAAAGGTTTATACCTTCCC) which is consistent with the results using blat at https://genome.ucsc.edu/cgi-bin/hgBlat. Is this intended? Thanks!
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
res2 = offTargetAnalysis(inputFilePath = temp_gRNA, findgRNAs = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE,chromToSearch = "chr15", outputDir = "testDurbar-BSgenome", overwrite = TRUE, gRNAoutputName = "DurbargRNA" , enable.multicore = FALSE, n.cores.max = 1, BSgenomeName = BSgenome.Hsapiens.UCSC.hg19, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 1)
$offtarget name gRNAPlusPAM OffTargetSequence inExon inIntron entrez_id score n.mismatch mismatch.distance2PAM 1 fromDurbar TTAAAGGTTTATACCTTCCCNGG TTAAAGGTTTATACCTTCACCAG TRUE 83660 31.5 1 2 alignment isCanonicalPAM forViewInUCSC strand chrom chromStart chromEnd extendedSequence gRNAefficacy 1 ..................A. 0 chr15:62941533-62941555 - chr15 62941533 62941555 ATCTTTAAAGGTTTATACCTTCACCAGCAC 0.1825474
BTW, to make your test run faster, you can specify max.mismatch = 1, and chromToSearch = "chr15" since there is only one match found in chr15 with at most 1 mismatch.
Best regards,
Julie
Hi Durbar,
I did more testing and discovered that there is indeed a bug when using non-BSgenome sequence files for fetching sequence. The error only occurs if more than one chromosome is searched. I fixed the bug and you can obtain the updated package with git clone https://github.com/LihuaJulieZhu/CRISPRseek.git.
I need to update the Bioconductor release and dev version yet.
Thanks for finding and reporting the bug!
Best regards,
Julie
This is working ! :)