CRISPRseek help to generate gRNAs for CRISPRi library
Lucka387 • 0

I am currently in the process of designing a gRNA library for a CRISPRi screen. I would like to use the CRISPRseek script to identify the top 10 most efficient gRNAs per gene for my library. I ran a test on a small file (128 KB) using the script for Scenario #7: Quick gRNA finding with gRNA efficacy prediction:

# Scenario 7. Quick gRNA finding with gRNA efficacy prediction
results <- offTargetAnalysis(inputFilePath,
    findgRNAsWithREcutOnly = FALSE,
    enable.multicore = TRUE, n.cores.max = 60,
    annotateExon = FALSE, findPairedgRNAOnly = TRUE,
    chromToSearch = "all", max.mismatch = 0,
    BSgenomeName = Hsapiens,
    outputDir = outputDir, overwrite = TRUE)

This took ~36 hours to run on one core and used a lot of memory. I’ll have a minimum of 24.9 MB of input to run, with more data on the way.

Is there a way to edit the script to reduce the time and memory needed to generate the information I need? I do not need any information on restriction cut sites, because CRISPRi does not cut the DNA, and I do not necessarily need paired gRNAs unless that is the fastest way for the script to run. My test output contained many gRNAs, but I only need the 10 most efficient per gene. Is there a way to select for these so that the script runs faster and uses less memory?

Any help or advice would be much appreciated!

Thank you, Kathleen

Julie Zhu ★ 4.3k

Hi Kathleen,

I suggest setting exportAllgRNAs = "fasta" and annotatePaired = FALSE, together with the following parameters (note that findPairedgRNAOnly is FALSE here):

findPairedgRNAOnly = FALSE, findgRNAsWithREcutOnly = FALSE, enable.multicore = TRUE, n.cores.max = 60, annotateExon = FALSE

If you are not interested in identifying offTargets for each gRNA, you can set chromToSearch = "" to make it run much faster.
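For illustration, the settings above could be gathered into a small wrapper. This is only a sketch under assumptions: `run_quick` and its `genome` argument are hypothetical names, the default of 6 for n.cores.max is a placeholder, and the call has not been checked against any particular CRISPRseek version. The offTargetAnalysis parameters themselves are the ones quoted in this thread.

```r
# Hypothetical wrapper around the fast, efficacy-only settings from this thread.
# Nothing runs until run_quick() is called, and CRISPRseek must be installed.
run_quick <- function(inputFilePath, genome, outputDir, n.cores.max = 6) {
  CRISPRseek::offTargetAnalysis(
    inputFilePath,
    findgRNAsWithREcutOnly = FALSE,  # CRISPRi needs no restriction cut sites
    findPairedgRNAOnly     = FALSE,  # paired gRNAs are not required
    annotatePaired         = FALSE,
    annotateExon           = FALSE,
    exportAllgRNAs         = "fasta",
    chromToSearch          = "",     # skip the off-target search
    enable.multicore       = TRUE,
    n.cores.max            = n.cores.max,
    BSgenomeName           = genome,
    outputDir              = outputDir,
    overwrite              = TRUE
  )
}
```

It would be called with the same objects as in the question, for example run_quick(inputFilePath, Hsapiens, outputDir) after loading the BSgenome package.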

If you need to search for offTargets, I suggest a two-step approach: first run the analysis without the offTarget search using the settings above, then select the gRNAs with reasonable efficiency and run the offTarget analysis on the selected gRNAs only (sections 2.5, 2.9 and 2.10). If you have access to a high performance computing cluster (HPCC), I can share my scripts for running the searches on multiple nodes.
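The selection step between the two runs (keeping, say, the 10 most efficient gRNAs per gene) can be sketched in base R. The table below is toy data, and the column names `gene`, `gRNA` and `score` are hypothetical, so they would need to be mapped onto whatever columns the efficacy output actually contains.

```r
# Toy efficacy table; column names are hypothetical, not CRISPRseek output names.
efficacy <- data.frame(
  gene  = c("A", "A", "A", "B", "B"),
  gRNA  = c("A_g1", "A_g2", "A_g3", "B_g1", "B_g2"),
  score = c(0.20, 0.90, 0.50, 0.70, 0.10),
  stringsAsFactors = FALSE
)

# Keep the n highest-scoring gRNAs within each gene.
top_n_per_gene <- function(df, n = 10) {
  # Sort by gene, then by descending score within each gene.
  ordered <- df[order(df$gene, -df$score), ]
  # Rank rows within each gene (1 = best score).
  rank_in_gene <- ave(ordered$score, ordered$gene, FUN = seq_along)
  ordered[rank_in_gene <= n, ]
}

top2 <- top_n_per_gene(efficacy, n = 2)
print(top2)
```

The retained gRNAs can then be written to a FASTA file and used as input for the second, offTarget-only run.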

FYI, the most recent version of CRISPRseek implements three different algorithms for calculating gRNA efficiency. Please read section 2.7 for details. Thanks!

http://bioconductor.org/packages/devel/bioc/vignettes/CRISPRseek/inst/doc/CRISPRseek.pdf#page8

Best regards,

Julie

Thank you very much for the prompt reply. I will rerun with the changed parameters as suggested.

The other issue, which I mentioned, is that the run only uses one core, which is odd given that enable.multicore is set to TRUE. Our system (just one server, not a cluster) has 144 available threads and 1 TB of RAM. I found a post elsewhere saying there might be a problem with multicore processing if there are more than 128 connections (support.bioconductor.org/p/72994, ninth answer down), so I am wondering whether this is what prevents it from running on more than one core. Thoughts? Thank you for your time.


You are welcome, Kathleen!

I suggest setting n.cores.max = 6 to test whether it works as expected.

If you have a file with lots of gRNAs to search for offTargets, it will be more effective to run several searches each with a subset of the gRNAs.
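As a rough sketch of what splitting into subsets can look like, the base R snippet below cuts a vector of gRNAs into chunks of 1000. The helper name `split_into_batches` is hypothetical, and the input is a toy character vector of placeholder strings rather than real sequences.

```r
# Hypothetical helper: split any vector into consecutive chunks of batch_size.
split_into_batches <- function(x, batch_size = 1000) {
  split(x, ceiling(seq_along(x) / batch_size))
}

# Toy input: 2500 placeholder entries standing in for gRNA sequences.
gRNAs <- setNames(sprintf("SEQ%04d", 1:2500), paste0("gRNA", 1:2500))

batches <- split_into_batches(gRNAs)
print(lengths(batches))  # number of gRNAs in each batch
```

Each element of `batches` could then be passed to a separate search.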

Best regards,

Julie


Hi Julie,

I adjusted my script based on your suggestions, and it worked well for my output but I am still unable to run on multiple cores. This is what I ran:

# Scenario 5: Target and off-target analysis for user specified gRNAs
results <- offTargetAnalysis(inputFilePath = gRNAFilePath,
    enable.multicore = TRUE, n.cores.max = 6,
    annotateExon = FALSE,
    findgRNAsWithREcutOnly = FALSE,
    findPairedgRNAOnly = FALSE,
    findgRNAs = FALSE,
    BSgenomeName = Hsapiens,
    chromToSearch = "all",
    txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
    orgAnn = org.Hs.egSYMBOL,
    max.mismatch = 0,
    outputDir = outputDir, overwrite = TRUE)

You previously mentioned "If you have access to high performance computing clusters (HPCC), I can share my scripts for you to run the searches in multiple nodes." I do have access to an HPCC. Is there an additional script I should run to use multiple cores, other than setting "enable.multicore = TRUE, n.cores.max = 6"?

Thank you! Kathleen


Hi Kathleen,

Here are the scripts offTargetSearchBatch.R and offTargetSearchBatch.bsub, which I used for batch analysis in a high-performance computing environment.

After modifying the parameters to fit your own needs, you can run the script on the cluster by typing the following command:

./offTargetSearchBatch.bsub

Hope it helps.

Best,

Julie

# offTargetSearchBatch.bsub
# This example submits 51 jobs to the cluster. Change 51 to a larger number
# if you have more than 51000 gRNAs to search for offTargets.
# Change R_LIBS, workingDir, the R path and the offTargetSearchBatch.R path accordingly.

for FILE in {1..51}; do
    #BASENAME=basename $FILE
    BASENAME=$FILE
    SHF=$BASENAME.bsub
    DIR=$BASENAME.output
    mkdir -p $DIR
    echo "Processing $FILE ..."
    echo "#!/bin/bash" > $SHF
    #echo "module load R/3.1.0" >>$SHF
    echo "export R_LIBS=/project/umwmccb/R/R-3.4.0/lib64/R/library:/share/pkg/R/3.4.0/lib64/R/library:/home/jz57w/R/x86_64-pc-linux-gnu-library/3.4" >> $SHF
    echo "#BSUB -J $BASENAME" >>$SHF
    echo "workingDir=~/mccb/Zhu/CRISPR" >>$SHF
    workingDir=~/mccb/Zhu/CRISPR
    echo "cd $workingDir" >>$SHF
    echo "#BSUB -q long" >> $SHF
    echo "#BSUB -R rusage[mem=20000]" >> $SHF
    echo "#BSUB -W 48:00" >>$SHF
    echo "#BSUB -o out.$BASENAME.log" >>$SHF
    echo "#BSUB -e err.$BASENAME.log" >>$SHF
    echo "~/mccb/bin/R CMD BATCH --no-save --no-restore '--args $BASENAME' ~/mccb/Zhu/CRISPR/offTargetSearchBatch.R $SHF.log" >> $SHF
    bsub <$SHF
    sleep 20
done

## offTargetSearchBatch.R
# Search for offTargets for 1000 gRNAs at a time.
# Allows a maximum of 3 mismatches; change max.mismatch as needed.
# Change the BSgenome, TxDb, org and PAM sequence accordingly.
# Rule set 2 and CRISPRscan have been implemented since this script was written;
# type help(offTargetAnalysis) to set rule.set and other parameters accordingly.

library("CRISPRseek")
library("BSgenome.Hsapiens.UCSC.hg19")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)

# The job number (1, 2, ...) is passed in by the .bsub script via --args.
args <- commandArgs(trailingOnly = TRUE)

gRNAs <- readDNAStringSet("~/mccb/Zhu/CRISPR/inputSeqallgRNAs.fa")

# Zero-based batch index and the range of gRNAs handled by this job.
batch.ind <- as.numeric(args[1]) - 1
batch.ind
batch.start <- max(batch.ind * 1000 + 1, 1)
batch.end <- min((batch.ind + 1) * 1000, length(gRNAs))

# Skip jobs whose range lies beyond the last gRNA.
if (batch.end >= batch.start) {
    inputFilePath <- gRNAs[batch.start:batch.end]

    setwd("~/mccb/Zhu/CRISPR/")
    outputDir <- paste("~/mccb/Zhu/CRISPR/output", batch.ind, sep = "")

    results <- offTargetAnalysis(inputFilePath,
        findgRNAsWithREcutOnly = FALSE,
        findPairedgRNAOnly = FALSE,
        gRNAoutputName = paste("gRNAs", batch.ind, sep = ""),
        PAM = "NNN",
        annotatePaired = FALSE,
        findgRNAs = FALSE,
        BSgenomeName = Hsapiens,
        annotateExon = FALSE,
        exportAllgRNAs = "fasta",
        scoring.method = "CFDscore",
        fetchSequence = FALSE,
        txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
        orgAnn = org.Hs.egSYMBOL, max.mismatch = 3,
        outputDir = outputDir, overwrite = TRUE)
}
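To see how the job number maps onto gRNA indices, the batch arithmetic from the R script can be isolated in a small function. This is only an illustration: `batch_bounds` is a hypothetical helper, and the total of 50500 gRNAs is an invented example value.

```r
# Hypothetical helper reproducing the index arithmetic of offTargetSearchBatch.R.
batch_bounds <- function(job, n_gRNAs, batch_size = 1000) {
  batch.ind <- job - 1
  batch.start <- max(batch.ind * batch_size + 1, 1)
  batch.end <- min((batch.ind + 1) * batch_size, n_gRNAs)
  # run is FALSE when the job's range starts beyond the last gRNA.
  list(start = batch.start, end = batch.end, run = batch.end >= batch.start)
}

n_gRNAs <- 50500                    # invented example total
n_jobs <- ceiling(n_gRNAs / 1000)   # number of jobs needed in the .bsub loop

last_job  <- batch_bounds(n_jobs, n_gRNAs)      # final, partially filled batch
extra_job <- batch_bounds(n_jobs + 1, n_gRNAs)  # one job too many: skipped

str(list(n_jobs = n_jobs, last_job = last_job, extra_job = extra_job))
```

The upper bound of the loop in the .bsub script should therefore be at least ceiling(number of gRNAs / 1000); submitting extra jobs is harmless because the guard skips them.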
