I am working with the CRISPRseek R package and I am unsure about the meaning of the off-target scores.
Could you please clarify how the off-target score is calculated for the summary output (top 5 off-target total score), and whether a higher or a lower score is better (in terms of a lower chance of off-target effects)?
FYI, I have updated the CRISPRseek package (version 1.17.5) to handle the situation in which the input gRNAs do not have any perfect match in the genome being searched. I tested the package with your example code and gRNAs. Please let me know how it works out for you. Thanks for the feedback!
Dawid,
Please try to use the following command to download the most recent version.
git clone git@git.bioconductor.org:packages/CRISPRseek
Best regards,
Julie
I noticed that OffTarget file doesn't have columns: inExon, inIntron, entrez_id, gene. Is there any particular reason to skip them? Information coming from these columns is still interesting when you design non-targeting guide (negative/scramble control) to see if potential OffTargets are in exon, intron etc. What do you think?
Dawid,
Glad that it works for you.
If you set annotateExon = TRUE and supply txdb and orgAnn, then you should get the annotation information. I removed them to speed up the test.
BTW, I added your example as one of the integration tests in CRISPRseek. Hope it is all right with you.
FYI, I changed the value for top1Hit.onTarget.MMdistance2PAM to “perfect match not found” in the summary output when there is no on-target found. I am running the integration tests. I will commit the changes later with version 1.17.6.
Best,
Julie
REpatternFile <- system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek") # restriction enzyme site patterns shipped with CRISPRseek
scoring_method <- "CFDscore" # off-target scoring method
core <- 20 # number of cores to run the job
Elspeth,
CRISPRseek sums the top N off-target scores, where N is chosen by the user (parameter topN.OfftargetTotalScore, default 10), and writes the result to the Summary.xls file as topN.OfftargetTotalScore, e.g., top10.OfftargetTotalScore or top50.OfftargetTotalScore. The rationale is that the top off-targets are the most critical ones. The lower the topN.OfftargetTotalScore, the better. For detailed information on individual off-target scores, please look at the OfftargetAnalysis.xls file.
BTW, if you are working with spCas9 (gRNA length 20 and PAM = "NGG"), then you can set scoring.method = "CFDscore" to use an improved off-target scoring algorithm whose scores range from 0 to 1, instead of 0-100 for the Hsu-Zhang method.
For detailed parameter settings and references, please type help(offTargetAnalysis) in an R session.
Best,
Julie
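As a minimal sketch of the summation described in this answer (not CRISPRseek's actual implementation), the base R snippet below sums the N highest off-target scores for each gRNA. The data frame `offtargets`, its column names, and the helper `top_n_total()` are made up for illustration, the scores are invented, and every row is assumed to be an off-target.

```r
# Hypothetical off-target scores for two gRNAs (invented values, off-targets only).
offtargets <- data.frame(
  gRNA  = c("g1", "g1", "g1", "g2", "g2"),
  score = c(12.5, 3.0, 0.8, 40.0, 25.0)
)

# Sum the n highest scores; if fewer than n hits exist, sum what is there.
top_n_total <- function(scores, n) {
  sum(head(sort(scores, decreasing = TRUE), n))
}

# One total per gRNA, here with N = 2.
totals <- tapply(offtargets$score, offtargets$gRNA, top_n_total, n = 2)
print(totals)
```

With these invented numbers g1 totals 15.5 and g2 totals 65, so g1 would be the preferred guide under the "lower is better" reading.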
Elspeth,
This post might be helpful to you: "CRISPRseek question on scoring".
Best,
Julie
I have a question about "CFDscore". My understanding is that top50.OfftargetTotalScore is calculated by adding the topN scores together.
I can see that my top50.OfftargetTotalScore/top100.OfftargetTotalScore is NA (which I assume means no off-targets?). When I look at the off-targets for a guide whose top10 score is NA, I can see a "CFD score" calculated for only one site, with a value of e.g. 0.2. I would have assumed that the top10 score would still be calculated and would simply equal 0.2, even if there are no other values?
### My code is below; I am testing a set of my own guides.
# inputFilePath, REpatternFile, orgAnn, BSgenomeName, txdb and outputDir
# must be defined beforehand (not shown here).
library(CRISPRseek)
offTargetAnalysis(inputFilePath,
    REpatternFile = REpatternFile,
    scoring.method = "CFDscore",
    format = "fasta",
    findgRNAs = FALSE,               # FALSE: the input already contains the gRNAs to test
    findgRNAsWithREcutOnly = FALSE,  # FALSE: do not require a restriction enzyme cut site
    findPairedgRNAOnly = FALSE,
    gRNA.name.prefix = "sg.",
    orgAnn = orgAnn,
    BSgenomeName = BSgenomeName,
    txdb = txdb,
    chromToSearch = "all",           # "all" searches every chromosome
    min.gap = 0, max.gap = 20,
    max.mismatch = 3,
    min.score = 0.1,
    topN = 100,
    topN.OfftargetTotalScore = 10,   # total score is computed over the top 10 off-targets
    annotateExon = TRUE,
    fetchSequence = TRUE, upstream = 250, downstream = 250,
    overlap.gRNA.positions = c(17, 18),
    PAM.size = 3,
    PAM = "NGG",
    gRNA.size = 20,
    outputDir = outputDir,
    overwrite = TRUE)
Hi Dawid,
Is this problem specific to scoring.method = "CFDscore"? Thanks!
Best regards,
Julie
I just tested both scoring methods and I see NA in both cases. I have made some comments below; I started to notice these issues when running on-target and off-target analysis for my own specified gRNAs. I am using CRISPRseek 1.16.0.
1) When I assign x <- offTargetAnalysis(), I can see x$summary, but I cannot see anything in the Summary.xls file created by the package.
2) I also noticed a discrepancy between the topN score in Summary and the scores in OfftargetAnalysis: if I take the top 5 hits from OfftargetAnalysis and sum them, I get a different number from the one in Summary.
Thanks,
Dawid
# Below is an example of a guide that gave me NA scores even though scores were still calculated:
>sg.test2
GACCGGAACGATCTCGCGTANGG
Dawid,
Thanks for testing both scoring methods! It might be an issue with data type. Could you please send me the testing input file, the code and the output? My email is Julie.zhu@umassmed.edu. Thanks!
Best regards,
Julie
Dawid,
Thanks for the test script and input!
When no on-target is found for any of the input gRNAs, the summary.xls file will be empty. Also, the topN score is calculated on the assumption that the on-target is present for the gRNA, i.e. it is the sum of the 2nd through Nth ranked hits (the top-ranked hit is taken to be the on-target).
I am updating the dev code to handle this exception and will post an update on the support site.
Thanks again for reporting the issue!
Best,
Julie
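The discrepancy reported above can be illustrated with a small base R sketch, assuming (as described in this reply) that the summary treats the top-ranked hit as the on-target and sums ranks 2 to N. The scores and the helper `top_n_skipping_first()` are hypothetical, returning NA when nothing is left to sum is an assumption made here, and none of this is the package's own code.

```r
# Invented hit scores for one gRNA with no perfect match in the genome,
# so every hit here is really an off-target.
hits <- c(0.9, 0.4, 0.2, 0.1)
n <- 3

# Sum ranks 2..n, assuming rank 1 is the on-target.
# Returning NA when no rank 2 exists is an assumption for illustration.
top_n_skipping_first <- function(scores, n) {
  ranked <- sort(scores, decreasing = TRUE)
  if (n < 2 || length(ranked) < 2) return(NA_real_)
  sum(ranked[2:min(n, length(ranked))])
}

manual_sum  <- sum(head(sort(hits, decreasing = TRUE), n))  # ranks 1..n
summary_sum <- top_n_skipping_first(hits, n)                # ranks 2..n

print(c(manual = manual_sum, summary = summary_sum))
print(top_n_skipping_first(0.2, 10))  # a single hit leaves nothing to sum
```

Under these assumptions the manual sum is 1.5 while the rank-2-onwards sum is 0.6, and a guide with a single hit of 0.2 yields NA, which would be consistent with the behaviour reported in this thread.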
I noticed recently in a couple of tests that I am getting "NA" for top5OfftargetTotalScore or top10OfftargetTotalScore. I tested the guides that have NA for top5OfftargetTotalScore with different online tools, and they showed very low off-target risk. When no off-target is found for an input gRNA, will the value in the summary.xls file be NA because the sum is taken over missing values (sum(2nd = NA, ..., Nth = NA))?
Julie,
Thank you for your help. Is 1.17.5 version available as a devel version on Bioconductor? I can only see 1.17.3 (from 2017-09-05).
Thanks,
Dawid
Hi Julie,
Thanks it works!
I noticed that OffTarget file doesn't have columns: inExon, inIntron, entrez_id, gene. Is there any particular reason to skip them? Information coming from these columns is still interesting when you design non-targeting guide (negative/scramble control) to see if potential OffTargets are in exon, intron etc. What do you think?
Best regards,
Dawid
Thanks, I am glad my comments can help with the package!
Dawid
Hi,
I am still a little confused about the scoring. You say that the lower the topN.OfftargetTotalScore, the better.
I have seen people use scores from the Zhang lab website (https://zlab.bio/guide-design-resources), which shut down recently. They exclude sgRNAs with a score < 0.2. In that practice, a score of 1 means no off-targets, whereas you suggest the opposite.
So does this mean that the score in the OfftargetAnalysis.xls file runs in the opposite direction to the score from the Zhang website?
I'm currently trying to find a cutoff to filter my libraries. Is there any reasonable advice on a cutoff for the topN.OfftargetTotalScore?
Thanks for your help. Nan
Nan,
A great question! Please see my response at https://support.bioconductor.org/p/61007 and upvote it if it is helpful.
In short, the score from MIT is calculated as
100 / (100 + [CRISPRseek top100OfftargetTotalScore])
If the CRISPRseek top100OfftargetTotalScore is 10, then the MIT score would be 100/(100 + 10) = 0.909, i.e. 90.9 when expressed on a 0-100 scale.
Best regards,
Julie
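The conversion quoted in this reply can be written as a one-line R function. This is only a sketch: the name `mit_style_score()` is hypothetical, and the extra factor of 100 is an assumption made here so that the result lands on the 0-100 scale of the 90.9 figure given above.

```r
# Convert a summed off-target score into an MIT-style guide score.
# 100 / (100 + total) gives a fraction between 0 and 1; multiplying by 100
# (an assumption for illustration) expresses it on a 0-100 scale.
mit_style_score <- function(offtarget_total) {
  100 * 100 / (100 + offtarget_total)
}

print(mit_style_score(c(0, 10, 100)))
```

A total of 0 maps to 100, a total of 10 maps to about 90.9, and a total of 100 maps to 50. The MIT-style score therefore falls as the CRISPRseek total rises, which is why "higher is better" on one scale corresponds to "lower is better" on the other.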
Thank you, Julie. I actually read that before. How about the CFD score? Is it the same formula?
Best, Nan
You are welcome, Nan!
Yes, CFD score is the same!
Best regards,
Julie
Hi Julie,
I am a little confused about the score where you mentioned "NA" means no off target found, but I still get a 0 score in some cases. What is the difference between score 0 and "NA"?
Thanks for your help, Nan
Nan,
A great question!
Could you please look at one of the output files offTargets.xls to compare the two gRNAs and their offTargets to see if there are any differences in terms of their offTargets? If it is still hard to distinguish these two cases, could you please post the two gRNA sequences and the code snippets to run offTargets analysis including loading the required libraries and sessionInfo()?
Thanks!
Best regards, Julie
Hi Julie,
Thanks for such a quick response. In the offTargets.xls file, I can see that both sgRNAs have an OffTargetSequence with the same score of 1.
My R code is here:
Here is R sessionInfo:
Thanks for your help, Nan
Nan,
Thanks for posting the code and gRNAs!
FYI, a CFD score of 1 means a perfect match. Is it correct that the following two are the on-targets rather than off-targets? Did you find any off-targets for these two gRNAs? Thanks!
I just ran your test code and no off-target was found for either of the two gRNAs when allowing at most 1 mismatch; the topNOfftargetTotalScore is NA for both gRNAs in summary.xls, as detailed below.
names                              g2TTCCTGGCCGGCTAAGGAGC            g1GTTCTCTTTTGCCTGATTCC
forViewInUCSC                      chr3:65659356-65659378            chr9:106178822-106178844
extendedSequence                   AAACTTCCTGGCCGGCTAAGGAGCAGGGCA    CTCCGTTCTCTTTTGCCTGATTCCAGGCTG
gRNAefficacy                       0.023752557                       0.092043552
gRNAsPlusPAM                       TTCCTGGCCGGCTAAGGAGCNGG           GTTCTCTTTTGCCTGATTCCNGG
top5OfftargetTotalScore            NA                                NA
top100OfftargetTotalScore          NA                                NA
top1Hit.onTarget.MMdistance2PAM    NMM                               NMM
topOfftarget1MMdistance2PAM to topOfftarget10MMdistance2PAM: blank for both gRNAs
REname                             (blank)                           HinfI TfiI
uniqREin200                        (blank)                           HinfI TfiI
uniqREin100                        (blank)                           HinfI TfiI
Best, Julie
Hi Julie,
Thanks for your time and help.
Best, Nan