Running off-target analysis


Yap-JM • 0

@yap-jm-12400

Last seen 8.2 years ago

I don't understand the following instructions in CRISPRseek attachment 8. I tried to follow them but hit rock bottom.

outputDir <- getwd()

Does that mean I type in (i) outputDir, (ii) getwd(), or (iii) outputDir <- getwd()? For (i), when I type in outputDir, it gives me [1] "C:/Users/Jit Ming/Documents/doc 1

Does that mean I have set the output directory?

inputFilePath <- "X.fa"

Does that mean I type in "inputFilePath"?

> "inputFilePath"
[1] "inputFilePath"

or

> "MS1.fasta"
[1] "MS1.fasta"
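In R, `<-` stores a value under a name, typing the bare name prints the stored value, and putting a name in quotes only creates a piece of text. So the line to type is the whole assignment, option (iii). A minimal base-R sketch; the printed working directory will be whatever your session uses, and "MS1.fasta" is the file name from this thread:

```r
# Assignment: store the current working directory under the name outputDir.
outputDir <- getwd()

# Typing the bare name prints the stored value, e.g. [1] "C:/Users/..."
print(outputDir)

# Store the FASTA file name (a character string) under the name inputFilePath.
inputFilePath <- "MS1.fasta"

# Without quotes you get the variable's value ...
print(inputFilePath)      # [1] "MS1.fasta"

# ... with quotes you only get the literal text, not the variable.
print("inputFilePath")    # [1] "inputFilePath"
```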

REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")

I don't understand how REpatternFile is set either. Can someone please help me with REpatternFile and offTargetAnalysis? I do not know how you bring up the inputFilePath and findgRNAsWithREcutOnly arguments...

offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,

REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE, BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 3, chromToSearch = "", outputDir = outputDir, overwrite = TRUE)

Sorry, I am big-time computer-illiterate.
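The `system.file()` line does not read anything; it only builds the full path to a file shipped inside an installed package and returns an empty string if the file is not there. `REpatternFile = REpatternFile` in the call then just passes that stored path to the argument of the same name. A base-R sketch that uses the `stats` package's DESCRIPTION file as a stand-in for CRISPRseek's `extdata/NEBenzymes.fa`:

```r
# Full path to a file inside an installed package (stats ships with R).
descPath <- system.file("DESCRIPTION", package = "stats")
print(descPath)            # a full path ending in .../stats/DESCRIPTION
file.exists(descPath)      # TRUE

# A file that is not in the package gives "" rather than an error.
missingPath <- system.file("extdata", "no_such_file.fa", package = "stats")
nchar(missingPath) == 0    # TRUE

# With CRISPRseek installed, the thread's line works the same way:
# REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")
```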


ADD COMMENT • link 8.2 years ago Yap-JM • 0


> library(CRISPRseek)
> library("BSgenome.Hsapiens.UCSC.hg19")
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> outputDir <- getwd()
> inputFilePath <- "MS1.fasta "
> REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")
> offTargetAnalysis (inputFilePath, format=“fasta”, findgRNAs=TRUE, exportAllgRNAs=c(“all”, “fasta”, “genbank”, “no”), findgRNAsWithREcutOnly=FALSE, REpatternFile, minREpatternSize=6, overlap.gRNA.positions=c(17, 18), findPairedgRNAOnly=FALSE, min.gap=0, max.gap=20, gRNA.name.prefix=“gRNA”, PAM.size=3, gRNA.size=20, PAM=“NGG”, BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch=“all”, max.mismatch=4, PAM.pattern=“N[A|G]G$”, gRNA.pattern=“”, min.score=0.5, topN=100, topN.OfftargetTotalScore=10, annotateExon=TRUE, txdb, outputDir, fetchSequence=TRUE, upstream=200, downstream=200, weights=c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite=FALSE)
Error: unexpected input in "offTargetAnalysis (inputFilePath, format=“"

I think the instructions mean that I have to copy and paste those lines into R, but I get an error at format. Why is that? I used Cluster2X to convert MS1.txt to MS1.fasta. What is wrong with this format?
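The call in that post contains typographic ("curly") quotes such as “fasta” and “all”, which often appear when code is copied out of a PDF or word processor. R only accepts straight ASCII quotes, so the parser stops before the function ever runs. A base-R sketch that reproduces the parse failure and repairs the text by substituting straight quotes; `parses()` is a hypothetical helper written for illustration:

```r
# Hypothetical helper: TRUE if the text is valid R syntax (nothing is evaluated).
parses <- function(txt) {
  !inherits(try(parse(text = txt), silent = TRUE), "try-error")
}

# \u201c and \u201d are the left and right curly double quotes.
bad <- "offTargetAnalysis(inputFilePath, format = \u201cfasta\u201d)"
parses(bad)    # FALSE: a parse error, as in the post

# Replace each curly double quote with a straight one.
good <- gsub("\u201c", "\"", bad, fixed = TRUE)
good <- gsub("\u201d", "\"", good, fixed = TRUE)
writeLines(good)  # offTargetAnalysis(inputFilePath, format = "fasta")
parses(good)   # TRUE
```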

ADD REPLY • link 8.2 years ago Yap-JM • 0


I see this is your first post. Welcome to Bioconductor!

A few tips. Please post your question as a "Question" rather than as a "Tutorial" -- you're not offering a tutorial! Please tag your question with the name of the package you want help with -- I've done that for you now. Please don't post answers to your own question -- use the "Add Comment" or "Add Reply" buttons instead. I've moved your answer to a comment, leaving the answer space free for real answers.

ADD REPLY • link 8.2 years ago Gordon Smyth 52k


Hi Gordon,

Thank you for those tips.

I saw some previous posts and noticed people were using .fa rather than .fasta files, so I renamed my MS1.fasta to MS1.fa. The error seems to have gone away, but I have new problems.

> library(CRISPRseek)
> library("BSgenome.Hsapiens.UCSC.hg19")
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> outputDir <- getwd()
> inputFilePath <- "MS1.fa"
> REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")
> offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, exportAllgRNAs = c("all", "fasta", "genbank", "no"),findgRNAsWithREcutOnly = FALSE, REpatternFile, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch = "all", max.mismatch = 4, PAM.pattern = "N[A|G]G$", allowed.mismatch.PAM = 1, gRNA.pattern = "", min.score = 0, topN = 1000, topN.OfftargetTotalScore = 10, annotateExon = TRUE, txdb, outputDir, fetchSequence=TRUE, upstream=200, downstream=200, weights=c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite=TRUE)
Validating input ...
Error in substr(outputDir, nchar(outputDir), nchar(outputDir)) :
argument "outputDir" is missing, with no default

Question: I used File > Change directory on the top toolbar to set my outputDir, but I don't understand why my outputDir is missing.
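The working directory set from the menu is not what is missing here; the error is about the `outputDir` argument of the call. Most likely the bare `txdb, outputDir` near the end of that call is the cause: unnamed arguments are matched by position to whichever formal arguments are still unfilled, left to right, so those values can land in the wrong parameters and leave `outputDir` empty. A minimal sketch with a hypothetical toy function (its argument names mimic offTargetAnalysis, but the order is invented for illustration):

```r
# Hypothetical toy function; the argument order is made up for illustration.
toyAnalysis <- function(inputFilePath, gRNAoutputName, annotateExon = TRUE,
                        txdb, orgAnn, outputDir) {
  # nchar() touches outputDir, much like the real function's input check does.
  nchar(outputDir)
  outputDir
}

# Unnamed arguments fill the unfilled parameters left to right:
# "db" lands in gRNAoutputName, "out" in txdb, and outputDir stays missing.
msg <- tryCatch(
  toyAnalysis("MS1.fa", annotateExon = TRUE, "db", "out"),
  error = function(e) conditionMessage(e)
)
writeLines(msg)  # argument "outputDir" is missing, with no default

# Naming the arguments removes the ambiguity.
toyAnalysis("MS1.fa", annotateExon = TRUE, txdb = "db", outputDir = "out")  # "out"
```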

ADD REPLY • link 8.2 years ago Yap-JM • 0


BTW, I just renamed MS1.fasta to MS1.fa. Will that be OK? Is MS1.fa the same as MS1.fasta (converted from MS1.txt via Cluster2X)?

ADD REPLY • link 8.2 years ago Yap-JM • 0


YJM, The file name should not matter. Most importantly, the file should be in FASTA format as specified at https://en.m.wikipedia.org/wiki/FASTA_format. Here is an example with 2 sequences:

>seq1
ACGTAAAACGTGGTTTTTAA
>seq2
TTTTCCGAACGTAAAACGTGGTACGTAAAACGTGGT

Best regards, Julie
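A FASTA file is plain text: a header line starting with `>`, followed by sequence lines, so the `.fa` or `.fasta` extension itself makes no difference. A base-R sketch that writes Julie's two example records to a temporary file and checks that the layout looks right (the temporary path stands in for your own file):

```r
# Write the two example records as plain-text FASTA.
fastaPath <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTAAAACGTGGTTTTTAA",
             ">seq2", "TTTTCCGAACGTAAAACGTGGTACGTAAAACGTGGT"),
           fastaPath)

# Read it back: headers start with ">", other lines should hold only A/C/G/T/N.
lines   <- readLines(fastaPath)
headers <- grepl("^>", lines)
print(lines[headers])                       # ">seq1" ">seq2"
all(grepl("^[ACGTN]+$", lines[!headers]))   # TRUE
```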

ADD REPLY • link 8.2 years ago Julie Zhu ★ 4.3k


YJM, Please try replacing your offTargetAnalysis function call with the following code. Please note that I have set the outputDir and txdb parameters. For additional help on parameter settings for offTargetAnalysis, please type help(offTargetAnalysis) in an R session.

offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, exportAllgRNAs = c("all", "fasta", "genbank", "no"), findgRNAsWithREcutOnly = FALSE, REpatternFile, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch = "all", max.mismatch = 4, PAM.pattern = "N[A|G]G$", allowed.mismatch.PAM = 1, gRNA.pattern = "", min.score = 0, topN = 1000, topN.OfftargetTotalScore = 10, annotateExon = TRUE, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, outputDir = outputDir, fetchSequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = TRUE)

Best regards, Julie

ADD REPLY • link 8.2 years ago Julie Zhu ★ 4.3k

score 0 · Answer 1 · 2017-02-19


Hi Julie,

I entered the function call you gave me but encountered the error below.

> outputDir <- getwd()
> inputFilePath <- "MS1.fa"
> REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek")

> offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, exportAllgRNAs = c("all", "fasta", "genbank", "no"),findgRNAsWithREcutOnly = FALSE, REpatternFile, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch = "all", max.mismatch = 4, PAM.pattern = "N[A|G]G$", allowed.mismatch.PAM = 1, gRNA.pattern = "", min.score = 0, topN = 1000, topN.OfftargetTotalScore = 10, annotateExon = TRUE, txdb =TxDb.Hsapiens.UCSC.hg19.knownGene, outputDir = outputDir, fetchSequence=TRUE, upstream=200, downstream=200, weights=c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite=TRUE)
Error in offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, :
formal argument "txdb" matched by multiple actual arguments

Please advise.

ADD COMMENT • link 8.2 years ago Yap-JM • 0


Hi Julie,

I looked at the help page, help(offTargetAnalysis), that you suggested and came up with the settings below. I intend to run 3 different settings.

My setting 1

>offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, exportAllgRNAs = c("all", "fasta", "genbank", "no"),findgRNAsWithREcutOnly = FALSE, REpatternFile, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, annotatePaired = FALSE, enable.multicore = FALSE, n.cores.max = 6, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch = "all", max.mismatch = 4, PAM.pattern = "NGG", allowed.mismatch.PAM = 1, gRNA.pattern = "", min.score = 0, topN = 1000, topN.OfftargetTotalScore = 10, annotateExon = TRUE, txdb =TxDb.Hsapiens.UCSC.hg19.knownGene, outputDir = outputDir, fetchSequence=FALSE, upstream=200, downstream=200, weights=c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), featureWeightMatrixFile = system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek"), useScore = TRUE, useEfficacyFromInputSeq = FALSE, outputUniqueREs = FALSE, foldgRNAs = TRUE, gRNA.backbone="GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU", temperature = 37, overwrite = FALSE, scoring.method = c("Hsu-Zhang", "CFDscore"), subPAM.activity = hash( AA =0,AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0, TG = 0.038961039, TT = 0), subPAM.position = c(22, 23), PAM.location = "3prime", mismatch.activity.file = system.file("extdata", "NatureBiot2016SuppTable19DoenchRoot.csv", package = "CRISPRseek"))

My setting 2: same as setting 1 except gRNA.size = 17, PAM.pattern = "N[A|G]G$", subPAM.position = c(19, 20)

My setting 3: same as setting 1 except PAM.pattern = "N[A|G]G$"

My questions are below:

Q1. There is no mention of G/C/T/A%. What are the defaults: 1% < G/C/T/A < 100% or 1% < G/C/T/A < 80%?

Q2. Can I set 40% < G < 80%, 1% < T < 40%, 1% < A < 40% and 1% < C < 80%? Most journals suggest 40% < G < 80% for effective sgRNA design.

I plan to use setting 1 (1% < G/C/T/A < 80%) and settings 2 and 3 (40% < G < 80%, 1% < T < 40%, 1% < A < 40% and 1% < C < 80%).

Q3. How do I install the GRFold package? Do I just enter biocLite(GRFold)?

Q4. Is it correct to use subPAM.position = c(19, 20) in setting 2?

Sorry for asking so many questions; I am not familiar with this platform. Your input is highly valued.

I tried to run setting 1 but got the error shown below:
Error in offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, :
formal argument "txdb" matched by multiple actual arguments

Many thanks

ADD REPLY • link 8.2 years ago Yap-JM • 0


BTW, my MS1.fasta file has 241 nucleotides that I need to test.

>MS1

CAGGTGCAGCAGCTCATCAGCAACCTGGAGGCACAGCTGCTCCAGGTGCGCGCGGACGCAGAGCGCCAGAACGTGGACCACCAGCGGCTGCTGAATGTCAAGGCCCGCCTGGAGCTGGAGGTTGAGACCTACCGCCGCCTGCTGGACGGGGAGGCCCAAGGTGATGGTTTGGAGGAAAGTTTATTTGTGACAGACTCCAAATCACAAGCACAGTCAACTGATTCCTCTAAAGACCCAACCA

ADD REPLY • link 8.2 years ago Yap-JM • 0


YJM, Here is the user guide, which might be helpful to you: http://www.bioconductor.org/packages/release/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf

######### Setting 1 #########
library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("org.Hs.eg.db")
outputDir <- "MS1crisprseekOutput-guideSize20"
inputSeqs <- DNAStringSet("CAGGTGCAGCAGCTCATCAGCAACCTGGAGGCACAGCTGCTCCAGGTGCGCGCGGACGCAGAGCGCCAGAACGTGGACCACCAGCGGCTGCTGAATGTCAAGGCCCGCCTGGAGCTGGAGGTTGAGACCTACCGCCGCCTGCTGGACGGGGAGGCCCAAGGTGATGGTTTGGAGGAAAGTTTATTTGTGACAGACTCCAAATCACAAGCACAGTCAACTGATTCCTCTAAAGACCCAACCA")
names(inputSeqs) <- "MS1"
inputSeqs
results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "MS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)

######### Setting 2 #########
results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNA.size = 17, weights = c(0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), gRNAoutputName = "MS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)

######### Setting 3 #########
Same as setting 1, since by default PAM.pattern for the off-target search is set to "N[A|G]G$". If you would like to search for gRNAs with PAM NGG or NAG, please set PAM = "N[A|G]G".

Q1 & Q2: Regarding GCTA%, there is no restriction when searching for gRNAs. However, GC% and GCTA composition have been incorporated into the gRNA efficacy calculation. You can find the gRNA efficacy values in the Summary.xls file in the outputDir.

Q3: Unfortunately, the GeneRfold package has been deprecated. However, you do not need it for efficacy calculation or off-target search. If you would like to have the gRNA secondary structure predicted, you can get the old version at http://bioconductor.case.edu/bioconductor/2.8/bioc/html/GeneRfold.html

Q4: Yes, you can.

Best, Julie
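The values in Q4 and in setting 2 follow from simple arithmetic, assuming a 3-nt NGG PAM placed directly 3' of the guide: the PAM occupies positions gRNA.size + 1 to gRNA.size + 3, so its "GG" sits at gRNA.size + 2 and gRNA.size + 3. Likewise, the 17-value weights vector in setting 2 is the 20-value vector with its first three entries dropped, assuming (as the values suggest) that position 1 is the 5' end of the guide. A base-R sketch; `subPAMpos()` is a hypothetical helper, not a CRISPRseek function:

```r
# Mismatch-position weights for a 20-nt guide, as used in this thread.
w20 <- c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
         0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583)

# For a 17-nt guide, keep the 17 weights closest to the PAM (the 3' end).
w17 <- tail(w20, 17)

# Hypothetical helper: positions of the last two PAM bases for a 3' PAM.
subPAMpos <- function(gRNA.size, PAM.size = 3) {
  gRNA.size + c(PAM.size - 1, PAM.size)
}

subPAMpos(20)  # 22 23, the value used in setting 1
subPAMpos(17)  # 19 20, the value asked about in Q4
```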

ADD REPLY • link 8.2 years ago Julie Zhu ★ 4.3k


BTW, to prevent the previous results from being overwritten, you need to set outputDir to a different directory for settings 2 and 3.

Setting 2: outputDir <- "MS1crisprseekOutput-guideSize17"
Setting 3: outputDir <- "MS1crisprseekOutput-guideSize20-NAGorNGG"

Best, Julie
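One way to keep the three runs apart is to define all output directory names up front and create them once; `showWarnings = FALSE` keeps `dir.create()` quiet if a directory already exists. A base-R sketch using Julie's directory names, placed under `tempdir()` only so the example leaves no clutter (in practice you would drop `tempdir()`):

```r
# One output directory per setting (names from Julie's reply).
outDirs <- c(setting1 = "MS1crisprseekOutput-guideSize20",
             setting2 = "MS1crisprseekOutput-guideSize17",
             setting3 = "MS1crisprseekOutput-guideSize20-NAGorNGG")

# Build full paths under a temporary folder for this sketch.
outPaths <- file.path(tempdir(), outDirs)
names(outPaths) <- names(outDirs)

# Create each directory; no warning if it already exists.
for (p in outPaths) dir.create(p, showWarnings = FALSE)

all(dir.exists(outPaths))  # TRUE
# e.g. offTargetAnalysis(..., outputDir = outPaths[["setting2"]], overwrite = TRUE)
```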

ADD REPLY • link 8.2 years ago Julie Zhu ★ 4.3k


YJM, Please remove the extra txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, which appears twice in the code. Best, Julie
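R refuses a call in which the same named argument is supplied twice, and it reports this before the function body runs, which is why removing one copy of `txdb` fixes the error. A base-R sketch with a hypothetical toy function `g()` standing in for offTargetAnalysis:

```r
# Hypothetical toy function with a txdb argument.
g <- function(inputFilePath, txdb) txdb

# Supplying txdb twice reproduces the error from this thread.
msg <- tryCatch(g("MS1.fa", txdb = "a", txdb = "b"),
                error = function(e) conditionMessage(e))
writeLines(msg)  # formal argument "txdb" matched by multiple actual arguments

# Supplying it once works.
g("MS1.fa", txdb = "a")  # "a"
```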

ADD REPLY • link 8.2 years ago Julie Zhu ★ 4.3k


Hi Julie,

I removed the txdb line and encountered another error.

> offTargetAnalysis(inputFilePath, format = "fasta", findgRNAs = TRUE, exportAllgRNAs = c("all", "fasta", "genbank", "no"),findgRNAsWithREcutOnly = FALSE, REpatternFile, minREpatternSize = 6, overlap.gRNA.positions = c(17, 18), findPairedgRNAOnly = FALSE, annotatePaired = FALSE, enable.multicore = FALSE, n.cores.max = 6, min.gap = 0, max.gap = 20, gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, chromToSearch = "all", max.mismatch = 4, PAM.pattern = "NGG", allowed.mismatch.PAM = 1, gRNA.pattern = "", min.score = 0, topN = 1000, topN.OfftargetTotalScore = 10, annotateExon = TRUE, outputDir = outputDir, fetchSequence=FALSE, upstream=200, downstream=200, weights=c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), featureWeightMatrixFile = system.file("extdata", "DoenchNBT2014.csv", package = "CRISPRseek"), useScore = TRUE, useEfficacyFromInputSeq = FALSE, outputUniqueREs = FALSE, foldgRNAs = TRUE, gRNA.backbone="GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU", temperature = 37, overwrite = FALSE, scoring.method = c("Hsu-Zhang", "CFDscore"), subPAM.activity = hash( AA =0,AC = 0, AG = 0.259259259, AT = 0, CA = 0, CC = 0, CG = 0.107142857, CT = 0, GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032, TA = 0, TC = 0, TG = 0.038961039, TT = 0), subPAM.position = c(22, 23), PAM.location = "3prime", mismatch.activity.file = system.file("extdata", "NatureBiot2016SuppTable19DoenchRoot.csv", package = "CRISPRseek"))
Validating input ...
Searching for gRNAs ...
Error in findgRNAs(inputFilePath, findPairedgRNAOnly = findPairedgRNAOnly, :
inputfile specified as MS1.fa does not exists!
In addition: Warning message:
In dir.create(outputDir) : 'C:\Users\Jit Ming\Documents' already exists
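The error means R looked for "MS1.fa" in the current working directory (C:\Users\Jit Ming\Documents, judging from the warning, since outputDir was set to getwd()) and did not find it there; the dir.create warning only says that directory already exists. A base-R sketch of checks worth running before the call; the temporary folder and short sequence stand in for your real MS1.fa, and a file name that differs from what you expect (for example, a hidden extra extension on Windows) is only one possible cause:

```r
# A relative file name such as "MS1.fa" is looked up in the working directory.
print(getwd())

# Stand-in for MS1.fa so the sketch runs anywhere.
demoDir <- tempfile("ms1demo")
dir.create(demoDir)
writeLines(c(">MS1", "CAGGTGCAGCAGCTCATCAG"), file.path(demoDir, "MS1.fa"))

file.exists("MS1.fa")                 # FALSE unless MS1.fa is in getwd()
inputFilePath <- file.path(demoDir, "MS1.fa")
file.exists(inputFilePath)            # TRUE with the full path

# List the folder to see the exact file names R sees.
list.files(demoDir)                   # "MS1.fa"
```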

ADD REPLY • link 8.2 years ago Yap-JM • 0


YJM, I tested the following code and it should work.

library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("org.Hs.eg.db")
outputDir <- "MS1crisprseekOutput-guideSize20"
inputSeqs <- DNAStringSet("CAGGTGCAGCAGCTCATCAGCAACCTGGAGGCACAGCTGCTCCAGGTGCGCGCGGACGCAGAGCGCCAGAACGTGGACCACCAGCGGCTGCTGAATGTCAAGGCCCGCCTGGAGCTGGAGGTTGAGACCTACCGCCGCCTGCTGGACGGGGAGGCCCAAGGTGATGGTTTGGAGGAAAGTTTATTTGTGACAGACTCCAAATCACAAGCACAGTCAACTGATTCCTCTAAAGACCCAACCA")
names(inputSeqs) <- "MS1"
inputSeqs
results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "MS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)

Best,
Julie
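
Passing a DNAStringSet, as in the code above, avoids reading a file at all. If you would rather give offTargetAnalysis() a FASTA file path, a quick base-R check can catch a misnamed file before the long genome search starts. Typical slips are a stray trailing space in the name, or MS1.fa versus MS1.fasta. This is only a sketch: check_input_path() is a hypothetical helper, not part of CRISPRseek.

```r
# Hypothetical helper (not a CRISPRseek function): validate a FASTA path
# before starting a long off-target search.
check_input_path <- function(path) {
  cleaned <- trimws(path)  # drop accidental leading/trailing spaces
  if (!identical(cleaned, path)) {
    message("Removed surrounding whitespace from the file name: '", path, "'")
  }
  if (!file.exists(cleaned)) {
    stop("Input file not found: '", cleaned, "' (working directory: ",
         getwd(), ")")
  }
  normalizePath(cleaned)  # absolute path, so the working directory no longer matters
}

# Usage sketch:
# inputFilePath <- check_input_path("MS1.fasta ")
```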

(Reply by Julie Zhu, 8.2 years ago)

Hi Julie,

When I tried to run your recommendation, it said:

> library(org.Hs.eg.db)
Error in library(org.Hs.eg.db) :
there is no package called ‘org.Hs.eg.db’

(Reply by Yap-JM, 8.2 years ago)

Also, if you look at your recommendation, you load

library("org.Hs.eg.db")

but set orgAnn = org.Hs.egSYMBOL.

They are different. Will that be OK?

(Reply by Yap-JM, 8.2 years ago)

I skipped library("org.Hs.eg.db") and it ran, but it stopped with errors at chr2. Is that because my computer does not have enough memory?

> results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "MS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
Validating input ...
Searching for gRNAs ...
>>> Finding all hits in sequence chr1 ...
>>> DONE searching
>>> Finding all hits in sequence chr2 ...
Error in .local(con, format, text, ...) : UCSC library operation failed
In addition: Warning message:
In .local(con, format, text, ...) :
needLargeMem: Out of memory - request size 243199374 bytes, errno: 12

(Reply by Yap-JM, 8.2 years ago)

Setting 1: I noticed you did not add PAM.size = 3, gRNA.size = 20, PAM = "NGG", max.mismatch = 4, PAM.pattern = "NGG". I wonder if those are default settings. Since you did not add them, I added them, but when I ran the code I got an error.

> outputDir <- "NS1crisprseekOutput-guideSize20-NGG"
> inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
> names(inputSeqs) <- "NS1"
> results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", PAM.size = 3, gRNA.size = 20, PAM = "NGG", max.mismatch = 4, PAM.pattern = "NGG", gRNAoutputName = "NS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
Validating input ...
Searching for gRNAs ...
>>> Finding all hits in sequence chr1 ...
Error: cannot allocate vector of size 8.0 Mb
Timing stopped at: 5.35 0.17 5.62

Is this error also due to low computer memory?

(Reply by Yap-JM, 8.2 years ago)

Yes, those are default settings. Yes, the error is due to your R memory limit. Please take a look at https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html. If that does not address the issue, please search older posts on the support site or search online for R memory management. BTW, please check your output directory: you should already have the gRNA output files.

Best,
Julie
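
To put the earlier chr2 failure in perspective, the byte count in the needLargeMem message can be converted to mebibytes with plain arithmetic. The request (243,199,374 bytes, roughly 232 MiB) appears to match the length of hg19 chr2 (243,199,373 bp) plus one byte. That would suggest the whole chromosome sequence is loaded at once, but this is an inference and the thread does not state it.

```r
# Convert a byte count, such as the "request size" in the needLargeMem
# message, into mebibytes (1 MiB = 1024^2 bytes).
bytes_to_mib <- function(bytes) bytes / 1024^2

request <- 243199374          # bytes, copied from the error message
round(bytes_to_mib(request), 1)
# [1] 231.9
```

Running gc() in the same session prints how much memory R is currently using. You can compare that figure with the memory available on your machine.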

(Reply by Julie Zhu, 8.2 years ago)

Yes, it is correct. org.Hs.egSYMBOL is an object in the org.Hs.eg.db package that maps Entrez IDs to gene symbols.

Best,
Julie
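
The distinction here is between a package and an object inside it. org.Hs.eg.db is the package, which you load with library(). org.Hs.egSYMBOL is an object the package provides, which you pass as orgAnn. Until the package is installed and loaded, R cannot find the object. The small base-R check below shows this with the always-available stats package, so it runs anywhere. object_in_package() is a hypothetical helper name.

```r
# Hypothetical helper: does package `pkg` provide an object called `obj`?
# Returns FALSE (instead of an error) when the package is not installed.
object_in_package <- function(obj, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  exists(obj, envir = asNamespace(pkg), inherits = FALSE)
}

object_in_package("median", "stats")  # TRUE
# object_in_package("org.Hs.egSYMBOL", "org.Hs.eg.db")  # expected TRUE once installed
```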

(Reply by Julie Zhu, 8.2 years ago)

You need to install the package first, then load it with its name in quotes:

source("http://bioconductor.org/biocLite.R")
biocLite("org.Hs.eg.db")
library("org.Hs.eg.db")

Best,
Julie
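
Newer Bioconductor releases install packages with BiocManager::install() instead of the biocLite() script shown above. Below is a minimal sketch that checks whether a package is present and installs it only when asked. ensure_installed() is a hypothetical helper name, and the install step needs network access.

```r
# Sketch: make sure a package is available, optionally installing it
# through BiocManager (the current Bioconductor installer).
ensure_installed <- function(pkg, install = FALSE) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)                           # already installed
  }
  if (install) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")      # BiocManager itself comes from CRAN
    }
    BiocManager::install(pkg)
    return(requireNamespace(pkg, quietly = TRUE))
  }
  FALSE                                    # missing and not installed
}

# Usage sketch:
# if (ensure_installed("org.Hs.eg.db", install = TRUE)) library("org.Hs.eg.db")
```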

(Reply by Julie Zhu, 8.2 years ago)

Hi Julie,

I used a friend's newer computer and ran your recommendation. I skipped library("org.Hs.eg.db"), yet it ran well. After it had checked all chromosomes, I encountered two errors. Error 1 is possibly due to org.Hs.eg.db, as I did not install it. Error 2 is that 'NS1crisprseekOutput-guideSize20-NGG' already exists (see below). What should I do to bypass the 'NS1crisprseekOutput-guideSize20-NGG' error in your recommendation?

>>> Finding all hits in sequence chrUn_gl000248 ...
>>> DONE searching
>>> Finding all hits in sequence chrUn_gl000249 ...
>>> DONE searching
Building feature vectors for scoring ...
Calculating scores ...
Annotating, filtering and generating reports ...
Error in annotateOffTargets(Offtargets, txdb, orgAnn) :
object 'org.Hs.egSYMBOL' not found
In addition: Warning message:
In dir.create(outputDir) :
'NS1crisprseekOutput-guideSize20-NGG' already exists

(Reply by Yap-JM, 8.2 years ago)

YJM, you can safely ignore the warning message.

Best,
Julie
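
If the "already exists" warning is distracting on repeated runs, you can silence just that warning and let every other warning through. This is a generic base-R pattern (withCallingHandlers() plus the "muffleWarning" restart), not a CRISPRseek option. muffle_dir_exists() is a hypothetical name, and matching on the English message text assumes R runs in an English locale.

```r
# Run `expr`, silencing only warnings whose message contains "already exists"
# (what dir.create() reports for an existing folder, in an English locale).
# Every other warning is passed on unchanged.
muffle_dir_exists <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("already exists", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage sketch (remaining offTargetAnalysis arguments as in your own call):
# results <- muffle_dir_exists(offTargetAnalysis(inputSeqs, outputDir = outputDir, overwrite = TRUE))
```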

(Reply by Julie Zhu, 8.2 years ago)

Hi Julie,

I loaded library(org.Hs.eg.db) into R.

Now I am re-running the program.

When I looked at the sample off-target output for scenario 8 in the CRISPRseek user's guide, I noticed they are all rsXXXXXXX. These are all gene off-targets. If I want to check for off-targets in introns, how do I do that? CRISPRseek does scan for both exon and intron off-targets, right?

(Reply by Yap-JM, 8.2 years ago)

YJM, the code you ran should output the gene and inExon information. Yes. Selected gRNAs from scenario 8 can then be examined for off-target sequences as described in scenario 5, using code similar to what I sent you previously. With the default setting (annotateExon = TRUE), the off-target output file should have a column called inExon. If an off-target is inside a gene and inExon is FALSE, then the off-target is in an intron. Please look at the documentation at https://www.rdocumentation.org/packages/CRISPRseek/versions/1.12.0/topics/offTargetAnalysis

Best regards,
Julie

(Reply by Julie Zhu, 8.2 years ago)

Hi Julie,

Question 1.

When I ran the NS5 file, I got the error below. How do I re-run and bypass this error?

NS5 error

> library("CRISPRseek")
> library("BSgenome.Hsapiens.UCSC.hg19")
> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
> library(org.Hs.eg.db)
> outputDir <- "NS5crisprseekOutput-guideSize20-NGG"
> inputSeqs <- DNAStringSet("ACCTGGAGGCACAGCTGCTCCAGGTGCGCGCGGACGCAGAGCGCCAGAACGTGGACCACCAGCGGCTGCTGAATGTCAAGGCCCGCCTGGAGCTGGAGATTGAGACCTACCGCCGCCTGCTGGACGGGGAGGCCCAAGGTGATGGTTTGGAGGAAAGTTTATTTGTGACAGACTCCAAATCACAAGCACAGTCAACTGATTCCTCTAAAGACCCAACCAAAACCCGAAAAATCAAGACAGT")
> names(inputSeqs) <- "NS5"
> results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", PAM.size = 3, gRNA.size = 20, PAM = "NGG", max.mismatch = 4, PAM.pattern = "NGG", gRNAoutputName = "NS5", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
Validating input ...
Searching for gRNAs ...
Done annotating
Add paired information...
Add RE information...
write gRNAs to bed file...
Scan for REsites in flanking region...
Error in .normargStrand(strand, max0123) :
strand values must be in '+' '-' '*'
In addition: Warning message:
In dir.create(outputDir) :
'NS5crisprseekOutput-guideSize20-NGG' already exists

Question 2. When I ran NS1 in setting 1, I encountered another error, shown below. How do I re-run and bypass this error?

NS1 error

> library("CRISPRseek")
> library("BSgenome.Hsapiens.UCSC.hg19")
> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
> library(org.Hs.eg.db)
> outputDir <- "NS1crisprseekOutput-guideSize17-NAGorNGG"
> inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
> names(inputSeqs) <- "NS1"
> results.guideSize17 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNA.size = 17, weights = c( 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508,0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), gRNAoutputName = "NS1", PAM.size = 3, gRNA.size = 17, PAM = "NGG", max.mismatch = 2, PAM.pattern = “N[A|G]G$”, BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
Error: unexpected input in "ortAllgRNAs = "fasta", gRNA.size = 17, weights = c( 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508,0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), gRNAoutputName = "NS1", PAM.size "

Thanks.

(Reply by Yap-JM, 8.2 years ago)

Since my second setting on NS1 gave me problems, I moved on to the next setting. My next setting also gave me an error, shown below. How can I re-run it and bypass this error?

> library("CRISPRseek")
> library("BSgenome.Hsapiens.UCSC.hg19")
> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
> library(org.Hs.eg.db)
> outputDir <- "NS1crisprseekOutput-guideSize20-NAGorNGG"
> inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
> names(inputSeqs) <- "NS1"
> results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "NS1", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 3, PAM.pattern = “N[A|G]G$”, outputDir = outputDir, overwrite = TRUE)
Error: unexpected input in "tAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "NS1", PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.kno"
>

(Reply by Yap-JM, 8.2 years ago)

The quotation marks around PAM.pattern are not right. They are curly (typographic) quotes, which R does not accept as string delimiters.

Wrong quotation: PAM.pattern = “N[A|G]G$”
Right quotation: PAM.pattern = "N[A|G]G$"

Since you are using the default setting, you do not have to include this in your code.

Best regards,
Julie
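
Curly quotes usually sneak in when code is pasted from a word processor, a PDF or an e-mail. The minimal base-R sketch below swaps the four common typographic quote characters for their straight ASCII equivalents before the code is run. fix_smart_quotes() is a hypothetical helper, and the \u escapes keep the script itself plain ASCII.

```r
# Replace typographic (curly) quotes with the straight quotes R expects.
fix_smart_quotes <- function(code) {
  curly    <- c("\u201c", "\u201d", "\u2018", "\u2019")
  straight <- c("\"",     "\"",     "'",      "'")
  for (i in seq_along(curly)) {
    code <- gsub(curly[i], straight[i], code, fixed = TRUE)
  }
  code
}

bad  <- "PAM.pattern = \u201cN[A|G]G$\u201d"
good <- fix_smart_quotes(bad)
cat(good, sep = "\n")   # PAM.pattern = "N[A|G]G$"
parse(text = good)      # now parses without "unexpected input"
```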

(Reply by Julie Zhu, 8.2 years ago)

YJM,

Question 1. You should already have all the output in the output directory. If you are curious, you could run one chromosome at a time to see which one is causing the error, using the following code:

library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("org.Hs.eg.db")
inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
names(inputSeqs) <- "NS1"
for (chrom in seqnames(Hsapiens)[1:25]) {
    outputDir <- paste("NS1crisprseekOutput-guideSize20", chrom, sep = "")
    results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "NS1", chromToSearch = chrom, PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 3, PAM.pattern = "N[A|G]G$", outputDir = outputDir, overwrite = TRUE)
}

Question 2. The quotation marks around PAM.pattern are not right. Wrong quotation: PAM.pattern = “N[A|G]G$”. Right quotation: PAM.pattern = "N[A|G]G$". Since you are using the default setting, you do not have to include this in your code.

Best,
Julie
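
The per-chromosome loop above stops at the first chromosome that raises an error. A variant wraps each iteration in tryCatch(), so the remaining chromosomes still run and the failing ones are collected with their messages. The sketch uses a stand-in function, run_one_chrom() (hypothetical), so it can be tried without the genome packages. In practice its body would be the offTargetAnalysis() call from the loop, with chromToSearch = chrom.

```r
# Stand-in for one offTargetAnalysis() call restricted to one chromosome.
# (Hypothetical: it simply fails for "chr2" to exercise the error path.)
run_one_chrom <- function(chrom) {
  if (chrom == "chr2") stop("simulated failure on ", chrom)
  paste0("NS1crisprseekOutput-guideSize20", chrom)  # e.g. the output folder
}

# Run every chromosome, recording failures instead of stopping at the first.
run_all_chroms <- function(chroms) {
  results <- list()
  failures <- list()
  for (chrom in chroms) {
    res <- tryCatch(run_one_chrom(chrom), error = function(e) e)
    if (inherits(res, "error")) {
      failures[[chrom]] <- conditionMessage(res)  # record and keep going
    } else {
      results[[chrom]] <- res
    }
  }
  list(results = results, failures = failures)
}

out <- run_all_chroms(c("chr1", "chr2", "chr3"))
names(out$failures)  # "chr2"
```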

(Reply by Julie Zhu, 8.2 years ago)

YJM, here is how to get rid of the error message for your Question 1:

library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("org.Hs.eg.db")
inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
names(inputSeqs) <- "NS1"
# search only the first 25 sequences of the genome (the main chromosomes)
chrom <- seqnames(Hsapiens)[1:25]
outputDir <- "NS1crisprseekOutput-guideSize20-mainChrom"
results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", gRNAoutputName = "NS1", chromToSearch = chrom, PAM.size = 3, gRNA.size = 20, PAM = "NGG", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 3, PAM.pattern = "N[A|G]G$", outputDir = outputDir, overwrite = TRUE)

If possible, I think you would benefit from collaborating with local bioinformaticians and/or attending R/Bioconductor workshops.

Best regards,
Julie
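
Besides the curly quotes, the guide-size-17 call in Question 2 passes gRNA.size = 17 twice. Once the quotes are fixed, R would most likely stop at that point instead, because a named argument may be supplied only once per call. Here is a minimal base-R reproduction with a toy function; toy_call() is hypothetical.

```r
# A toy function standing in for any R function with a gRNA.size argument.
toy_call <- function(gRNA.size = 20, ...) gRNA.size

msg <- tryCatch(
  toy_call(gRNA.size = 17, gRNA.size = 17),  # same argument given twice
  error = function(e) conditionMessage(e)
)
cat(msg, sep = "\n")  # formal argument "gRNA.size" matched by multiple actual arguments

toy_call(gRNA.size = 17)  # supplying it once works
```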

(Reply by Julie Zhu, 8.2 years ago)

Hi Julie,

I successfully ran setting 2 after fixing the quotation marks, but my setting 3 has problems. Can you please advise?

> library("CRISPRseek")
> library("BSgenome.Hsapiens.UCSC.hg19")
> library("TxDb.Hsapiens.UCSC.hg19.knownGene")
> outputDir <- "NS1crisprseekOutput-guideSize20-NAGorNGG"
> inputSeqs <- DNAStringSet("GTGGAGGTAGCTTTGGAGGGCTGGGGATGGGATTTGGGGGCAGCCCAGGAGGTGGCTCTCTAGGTATTCTCTCGGGCAATGATGGAGGCCTTCTTTCTGGATCAGAAAAAGAAACTATGCAAAATCTTAATGATAGATTAGCTTCCTACCTGGATAAGGTGCGAGCTCTAGAAGAGGCTAATACTGAGCTAGAAAATAAAATTCGAGAATGGTATGAAACACGAGGAACTGGGACTGCAGA")
> names(inputSeqs) <- "NS1"
> results.guideSize20 <- offTargetAnalysis(inputSeqs, findgRNAs = TRUE, exportAllgRNAs = "fasta", PAM.size = 3, gRNA.size = 20, PAM = "NGG", max.mismatch = 3, PAM.pattern = " N[A|G]G$", gRNAoutputName = "NS1", BSgenomeName = Hsapiens, txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, orgAnn = org.Hs.egSYMBOL, outputDir = outputDir, overwrite = TRUE)
Validating input ...
Searching for gRNAs ...
>>> Finding all hits in sequence chr1 ...
>>> DONE searching
>>> Finding all hits in sequence chr2 ...

Building feature vectors for scoring ...
Error in buildFeatureVectorForScoring(hits = hits, canonical.PAM = PAM, :
Empty hits!
In addition: Warning message:
In searchHits2(gRNAs = gRNAs, PAM = PAM, PAM.pattern = PAM.pattern, :
No matching found, please check your input sequence, and make
sure you are using the right genome. You can also alter your
search criteria such as increasing max.mismatch!

(Reply by Yap-JM, 8.2 years ago)

It means that no off-targets were found with mismatch <= 3. It seems that you entered an extra space before the pattern N[A|G]G$: PAM.pattern = " N[A|G]G$". You need to remove the extra space and run it again. Please do look at your code carefully.

Best,
Julie

Julie Zhu • 8.2 years ago
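A minimal base-R illustration of why the stray space matters. It does not call CRISPRseek; it only assumes that PAM.pattern is treated as a regular expression (the trailing `$` anchor suggests so). To avoid guessing how CRISPRseek handles the IUPAC code N, only the "[A|G]G$" part of the pattern is tested here, against made-up PAM strings.

```r
# Made-up PAM sequences for illustration.
pams <- c("TGG", "CAG", "ACG")

# "[A|G]" is a bracket expression matching A, G, or a literal "|"
# (which never occurs in DNA); "G$" requires a G at the very end.
good_pattern <- "[A|G]G$"
bad_pattern  <- " [A|G]G$"   # same pattern with a stray leading space

grepl(good_pattern, pams)    # TRUE TRUE FALSE
grepl(bad_pattern, pams)     # FALSE FALSE FALSE: no PAM contains a space

# Stripping leading/trailing whitespace restores the intended pattern.
identical(trimws(bad_pattern), good_pattern)   # TRUE
```

A pattern with a leading space can only match text that contains a space, so every candidate is rejected, which is consistent with the "Empty hits!" error.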


Hi Julie,

I continued running my analysis this morning, but this time it would not run; the program gave me the error below.

> source ("http://Bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
> biocLite()
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
installation path not writeable, unable to update packages: Matrix, mgcv, nlme,
survival

Would you be able to advise please?

Yap-JM • 8.2 years ago
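The message "installation path not writeable" says that R cannot write to the library directory holding Matrix, mgcv, nlme and survival. A minimal base-R sketch for checking which library directories the current session can write to is below; it only reads permissions and changes nothing. `file.access()` has documented caveats on some file systems, so treat its result as a hint rather than a guarantee.

```r
# Library directories R searches, in order of priority.
libs <- .libPaths()

# file.access(mode = 2) tests write permission: 0 means writable, -1 means not.
writable <- file.access(libs, mode = 2) == 0

# Show each library together with its write status.
print(data.frame(library = libs, writable = writable, row.names = NULL))

# If none is writable, one option (an assumption about your setup) is a
# personal library; create it, then restart R so .libPaths() picks it up:
# dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE)
```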


Hi Julie,

I got it running again, which is very strange. Yesterday I was able to run it, so I retrieved the parameters I used yesterday, and it is running now. Yesterday's parameters look the same as today's; maybe there is a space or something, but I cannot see the difference.

Yap-JM • 8.2 years ago
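Invisible differences like this can be tracked down in base R. The sketch below uses `check_params`, a hypothetical helper that is not part of CRISPRseek. It flags string parameters that have leading or trailing whitespace, or any non-ASCII character (for example a non-breaking space pasted from a word processor). The parameter values are illustrative examples.

```r
# Hypothetical helper: for each named string parameter, report stray
# leading/trailing whitespace and any non-ASCII character.
check_params <- function(params) {
  stray_space <- vapply(params, function(x) x != trimws(x), logical(1))
  non_ascii <- vapply(params,
                      function(x) any(utf8ToInt(enc2utf8(x)) > 127L),
                      logical(1))
  data.frame(parameter = names(params), stray_space = stray_space,
             non_ascii = non_ascii, row.names = NULL)
}

# These print almost identically, but only the first is clean.
params <- list(
  PAM.pattern_ok    = "N[A|G]G$",
  PAM.pattern_space = " N[A|G]G$",   # stray leading space
  PAM_nbsp          = "NGG\u00a0"    # trailing non-breaking space
)
res <- check_params(params)
print(res)
```

Both checks are needed: `trimws()` only strips spaces, tabs and newlines, so the non-breaking space is caught by the non-ASCII test instead.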


Dear Julie,

I looked at the off-target analysis Excel sheet and have a question for you. Some sequences fall within exons or introns, but others are labelled neither inExon nor inIntron. What does that mean? If they are not in introns or exons, why are they displayed?

Yap-JM • 8.1 years ago


YJM, those off-targets are located between genes (intergenic regions). Some intergenic DNA controls nearby genes or acts as enhancers.

Best,
Julie

Julie Zhu • 8.1 years ago
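Here is a minimal sketch for labelling rows that are neither in an exon nor in an intron as intergenic. It assumes the off-target table has been read into a data frame with `inExon` and `inIntron` columns that hold TRUE or are empty (read in as NA). The toy rows and the `name` and `location` columns are made up for illustration; they are not actual CRISPRseek output.

```r
# Toy off-target table (made-up rows); empty flags are read in as NA.
offtargets <- data.frame(
  name     = c("ot1", "ot2", "ot3"),
  inExon   = c(TRUE, NA, NA),
  inIntron = c(NA, TRUE, NA)
)

# Treat a missing flag as FALSE, then label each off-target's location.
in_exon   <- offtargets$inExon %in% TRUE
in_intron <- offtargets$inIntron %in% TRUE
offtargets$location <- ifelse(in_exon, "exon",
                              ifelse(in_intron, "intron", "intergenic"))
print(offtargets)
```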


Hi Julie,

I looked at your journal article but could not find how large an input FASTA file CRISPRseek can accommodate. How big can the input FASTA file be? In my experiment I used 241 nt. Many thanks.

Yap-JM • 8.1 years ago


Yap-JM, an input of 200 kb (200,000 nt) should run fine. Please refer to scenario 9 in the user guide (https://www.bioconductor.org/packages/release/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf).

Best,
Julie

Julie Zhu • 8.1 years ago
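Before running a large input, you can check its total length against the roughly 200 kb mentioned in the reply. Below is a minimal base-R sketch. `fasta_length` is a hypothetical helper that counts sequence characters on non-header lines. The example writes a small temporary file rather than using MS1.fasta.

```r
# Hypothetical helper: total number of sequence characters in a FASTA file,
# ignoring header lines (starting with ">") and surrounding whitespace.
fasta_length <- function(path) {
  lines <- readLines(path, warn = FALSE)
  seq_lines <- lines[!grepl("^>", lines)]
  sum(nchar(trimws(seq_lines)))
}

# Example: a small toy FASTA written to a temporary file.
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">toy", "ACGTACGTAC", "GGGCCCAAAT"), tmp)
n <- fasta_length(tmp)
n              # 20
n <= 200000    # TRUE: well under 200 kb
```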