Hi Kathleen,
I suggest setting exportAllgRNAs = "fasta" and annotatePaired = FALSE, in addition to the parameters you set, such as findPairedgRNAOnly = FALSE, findgRNAsWithREcutOnly = FALSE, enable.multicore = TRUE, n.cores.max = 60, and annotateExon = FALSE.
If you are not interested in identifying offTargets for each gRNA, you can set chromToSearch = "" to make it run much faster.
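As a rough sketch of such a gRNA-only run, the call below combines the settings mentioned above. It assumes the example input file shipped with CRISPRseek and a temporary output directory, both of which are stand-ins for your own paths. The multicore options are left out for brevity, and because chromToSearch = "" skips the genome search, this sketch passes no BSgenomeName.

```r
library(CRISPRseek)

# Assumption: the example FASTA bundled with CRISPRseek stands in for your input
inputFilePath <- system.file("extdata", "inputseq.fa", package = "CRISPRseek")
# Assumption: a throwaway output directory; use your own in practice
outputDir <- file.path(tempdir(), "gRNA_only")

# Find gRNAs only: no restriction-site filter, no pairing, no exon annotation,
# and no off-target search (chromToSearch = "")
results <- offTargetAnalysis(
    inputFilePath,
    findgRNAsWithREcutOnly = FALSE,
    findPairedgRNAOnly = FALSE,
    annotatePaired = FALSE,
    annotateExon = FALSE,
    exportAllgRNAs = "fasta",
    chromToSearch = "",
    outputDir = outputDir,
    overwrite = TRUE
)
```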
If you need to search for offTargets, I suggest you first run the analysis without searching for offTargets using the above settings, then select gRNAs with reasonable efficiency and run the offTarget analysis for the selected gRNAs only (sections 2.5, 2.9 and 2.10). If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches on multiple nodes.
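The selection step could look something like the sketch below, which keeps the 10 highest-scoring gRNAs per gene before the offTarget run. The data frame and its column names (gene, gRNA, efficacy) and the helper top_n_per_gene are hypothetical stand-ins; map them onto whatever your own results table contains.

```r
# Hypothetical table of candidate gRNAs: one row per gRNA, with the gene it
# targets and a predicted efficacy score (all values are made up).
set.seed(1)
candidates <- data.frame(
    gene = rep(c("geneA", "geneB"), times = c(15, 8)),
    gRNA = sprintf("gRNA%02d", 1:23),
    efficacy = round(runif(23), 3),
    stringsAsFactors = FALSE
)

top_n_per_gene <- function(df, n = 10) {
    # Sort by gene, then by decreasing efficacy within each gene
    df <- df[order(df$gene, -df$efficacy), ]
    # Rank rows within each gene (1 = most efficient) and keep ranks 1..n
    rank_in_gene <- ave(seq_len(nrow(df)), df$gene, FUN = seq_along)
    df[rank_in_gene <= n, ]
}

selected <- top_n_per_gene(candidates, n = 10)
table(selected$gene)
```

Here geneA is trimmed from 15 candidates to 10, while geneB keeps all 8 because it has fewer than 10 to begin with.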
FYI, the most recent version of CRISPRseek implements three different algorithms for calculating gRNA efficiency. Please read section 2.7 for details. Thanks!
http://bioconductor.org/packages/devel/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf#page8
Best regards,
Julie
On Nov 28, 2019, at 4:22 AM, Lucka387 [bioc] <noreply@bioconductor.org> wrote:
User Lucka387 wrote Question: CRISPRseek help to generate gRNAs for CRISPRi library (https://support.bioconductor.org/p/126760/):
I am currently in the process of designing a gRNA library for a CRISPRi screen. I would like to use the CRISPRseek script to identify the top 10 most efficient gRNAs per gene for my library. I ran a test on a small file (128 KB) using the script for Scenario #7: Quick gRNA finding with gRNA efficacy prediction:
Scenario 7. Quick gRNA finding with gRNA efficacy prediction
results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE, enable.multicore=TRUE, n.cores.max=60, annotateExon = FALSE, findPairedgRNAOnly = TRUE, chromToSearch = "all", max.mismatch = 0, BSgenomeName = Hsapiens, outputDir = outputDir, overwrite = TRUE)
This took ~36 hrs to run on one core and used a lot of memory. I’ll have a minimum of 24.9 MB of input to run, with more data on the way.
My question is whether there is a way to edit the script to minimize the time and memory needed to generate the information I need. I do not need any information on restriction cut sites because CRISPRi does not cut the DNA, and I do not necessarily need paired gRNAs unless that is the fastest way for the script to run. From the output of my test file I saw that many gRNAs are generated, but if I only need the 10 most efficient per gene, is there a way to select for these so the script runs faster and uses less memory?
Any help or advice would be much appreciated!
Thank you, Kathleen
Thank you very much for the prompt reply. I will rerun with the changed parameters as suggested.
The other issue, which I mentioned, is that it is only using one core, which is odd considering that enable.multicore is set to TRUE. Our system (just one server, not a cluster) has 144 available threads and 1 TB of RAM. I did find a post elsewhere saying there might be an issue with multicore processing if there are more than 128 connections (support.bioconductor.org/p/72994, 9th answer down), so I am wondering if this is what is preventing it from running on more than one core. Thoughts? Thank you for your time.
You are welcome, Kathleen!
I suggest setting n.cores.max = 6 to test whether it works as expected.
If you have a file with lots of gRNAs to search for offTargets, it will be more efficient to run several searches, each with a subset of the gRNAs.
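One way to prepare such subsets is sketched below: it writes a set of gRNAs to several small FASTA files of a fixed size. The sequences, the chunk size of 2 and the helper name write_gRNA_subsets are all made up for illustration; in practice the gRNAs would more likely be read from the exported FASTA file (for example with Biostrings::readDNAStringSet()).

```r
# Hypothetical gRNA sequences, named by gRNA ID (values are made up)
gRNAs <- c(
    gRNA1 = "GACCCCCTCCACCCCGCCTC",
    gRNA2 = "TGGGCACCAGTCACTGAACG",
    gRNA3 = "ACGTACGTACGTACGTACGT",
    gRNA4 = "TTGACCTGAAGGCTAGCATC",
    gRNA5 = "CCATGGTACCGATTCGAGCA"
)

write_gRNA_subsets <- function(gRNAs, chunkSize, outDir) {
    dir.create(outDir, showWarnings = FALSE, recursive = TRUE)
    # Assign consecutive gRNAs to chunks 1, 2, 3, ... of at most chunkSize each
    chunkId <- ceiling(seq_along(gRNAs) / chunkSize)
    chunks <- split(gRNAs, chunkId)
    files <- character(length(chunks))
    for (i in seq_along(chunks)) {
        # Interleave ">name" header lines with the sequence lines
        fasta <- as.vector(rbind(paste0(">", names(chunks[[i]])), chunks[[i]]))
        files[i] <- file.path(outDir, sprintf("gRNAs_subset%03d.fa", i))
        writeLines(fasta, files[i])
    }
    files
}

subsetFiles <- write_gRNA_subsets(gRNAs, chunkSize = 2,
                                  outDir = file.path(tempdir(), "gRNA_subsets"))
basename(subsetFiles)
```

Each resulting file could then be passed as inputFilePath to its own offTargetAnalysis run with findgRNAs = FALSE, so that the searches are independent of one another.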
Best regards,
Julie
Hi Julie,
I adjusted my script based on your suggestions, and it worked well for my output but I am still unable to run on multiple cores. This is what I ran:
Scenario 5: Target and off-target analysis for user specified gRNAs
results <- offTargetAnalysis(inputFilePath = gRNAFilePath, enable.multicore = TRUE, n.cores.max = 6, annotateExon = FALSE, findgRNAsWithREcutOnly = FALSE, findPairedgRNAOnly = FALSE, findgRNAs = FALSE, BSgenomeName = Hsapiens, chromToSearch = "all", txdb = TxDb.Hsapiens.UCSC.hg38.knownGene, orgAnn = org.Hs.egSYMBOL, max.mismatch = 0, outputDir = outputDir, overwrite = TRUE)
You previously mentioned "If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches in multiple nodes." I do have access to HPCC, is there an additional script I should run for multiple core use other than "enable.multicore = TRUE, n.cores.max = 6"?
Thank you! Kathleen
Hi Kathleen,
Here are the scripts offTargetSearchBatch.R and offTargetSearchBatch.bsub, which I used for batch analysis in a high-performance computing environment.
After modifying the parameters to fit your own needs, you can run the script on the cluster by typing the following command:
./offTargetSearchBatch.bsub
Hope it helps.
Best,
Julie
#offTargetSearchBatch.bsub
# This example submits 51 jobs to the cluster; please change 51 to a larger number if you have more than 51,000 gRNAs to search for offTargets
# Please change R_LIBS, workingDir, R path and offTargetSearchBatch.R path accordingly
for FILE in {1..51}; do
    #BASENAME=`basename $FILE`
    BASENAME=$FILE
    SHF=$BASENAME.bsub
    DIR=$BASENAME.output
    mkdir -p $DIR
    echo "Processing $FILE ..."
    echo "#!/bin/bash" > $SHF
    #echo "module load R/3.1.0" >>$SHF
    echo "export R_LIBS=/project/umwmccb/R/R-3.4.0/lib64/R/library:/share/pkg/R/3.4.0/lib64/R/library:/home/jz57w/R/x86_64-pc-linux-gnu-library/3.4" >> $SHF
    echo "#BSUB -J $BASENAME" >>$SHF
    echo "workingDir=~/mccb/Zhu/CRISPR" >>$SHF
    workingDir=~/mccb/Zhu/CRISPR
    echo "cd $workingDir" >>$SHF
    echo "#BSUB -q long" >> $SHF
    echo "#BSUB -R rusage[mem=20000]" >> $SHF
    echo "#BSUB -W 48:00" >>$SHF
    echo "#BSUB -o out.$BASENAME.log" >>$SHF
    echo "#BSUB -e err.$BASENAME.log" >>$SHF
    echo "~/mccb/bin/R CMD BATCH --no-save --no-restore '--args $BASENAME' ~/mccb/Zhu/CRISPR/offTargetSearchBatch.R $SHF.log" >> $SHF
    bsub <$SHF
    sleep 20
done
## offTargetSearchBatch.R
# Search for offTargets for 1000 gRNAs at a time
# Allow maximum 3 mismatches, please change it accordingly
# Please change the BSgenome, TxDb, org, PAM sequence accordingly
# Rule set 2 and CRISPRscan have been implemented since this script was written;
# please type help(offTargetAnalysis) to set rule.set and other parameters accordingly
library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
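# The batch number is passed on the command line via '--args <batch number>'
args <- commandArgs(trailingOnly = TRUE)
gRNAs <- readDNAStringSet("~/mccb/Zhu/CRISPR/inputSeqallgRNAs.fa")
# Convert the 1-based batch number to a 0-based batch index
batch.ind <- as.numeric(args[1]) - 1
batch.ind
# First and last gRNA (1-based positions) handled by this batch of 1000
batch.start <- max(batch.ind * 1000 + 1, 1)
batch.end <- min((batch.ind + 1) * 1000, length(gRNAs))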
if (batch.end >= batch.start)
{
}
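To see how the job number maps onto gRNAs, the sketch below wraps the same start/end arithmetic in a small helper and applies it to a hypothetical set of 50,500 gRNAs. The name batch_range and the gRNA count are introduced here for illustration only. Presumably the if block in the script runs the offTarget search on gRNAs[batch.start:batch.end], but that part is an assumption.

```r
# Same arithmetic as in the script, wrapped in a function for illustration.
# 'job' is the 1-based number passed via --args; 'n' is the number of gRNAs.
batch_range <- function(job, n, size = 1000) {
    batch.ind <- job - 1
    batch.start <- max(batch.ind * size + 1, 1)
    batch.end <- min((batch.ind + 1) * size, n)
    # Jobs beyond the last gRNA have nothing to do
    if (batch.end >= batch.start) {
        c(start = batch.start, end = batch.end)
    } else {
        NULL
    }
}

n <- 50500                        # hypothetical number of gRNAs
jobs.needed <- ceiling(n / 1000)  # 51 jobs cover 50,500 gRNAs

batch_range(1, n)                 # start = 1, end = 1000
batch_range(jobs.needed, n)       # start = 50001, end = 50500 (last, partial batch)
batch_range(jobs.needed + 1, n)   # NULL: no gRNAs left for a 52nd job
```

The value of ceiling(n / 1000) is the upper bound to use in the {1..51} loop of the submission script.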