Option to override removal of duplicate reads - GUIDEseq
2
0
Entering edit mode
@sharvari-gujja-6614
Last seen 21 months ago
United States

Hi Julie and Kai,

I am testing the GUIDEseq package on the modified GUIDE-seq protocol. While running the analysis on a new data set, I was looking into a specific off target that was being predicted by another program, but not by the GUIDEseq method. This sample was generated after concatenating the reads from other samples, and so I wondered if the reads are being filtered out as duplicates. If so then is there a way to override this option? Please see the code below for your reference.

Appreciate all the help.

GUIDEseqAnalysis(alignment.inputfile = bamfile , umi.inputfile = umifile,
                        alignment.format = c("bam"),
                        BSgenomeName = Hsapiens,
                        gRNA.file = gRNAs, n.cores.max = 4,min.mapping.quality = 15L,max.R1.len = 150L, max.R2.len =150L,
                        min.reads = 2,min.SNratio = 2, maxP = 0.01,window.size = 25L,step = 25L,
                        plus.strand.start.gt.minus.strand.end = FALSE,distance.threshold = 1000,
                        upstream = 50L, downstream = 50L, PAM.size = 3, gRNA.size = 20,
                        PAM = "NGG", PAM.pattern = "(NGG)$", max.mismatch = 6,
                        outputDir = outputDir,
                        orderOfftargetsBy = "predicted_cleavage_score",
                        allowed.mismatch.PAM = 2, overwrite = TRUE)
Remove duplicate reads ...
GUIDEseq • 1.2k views
ADD COMMENT
0
Entering edit mode

Hi Julie,

Thank you for the reply.

To run the analysis in a less stringent mode, I've tried the parameters (see below), however the off-target that's predicted in the individual sample is not present in the concatenated sample. keepPeaksInBothStrandsOnly = FALSE and min.peak.score.1strandOnly=1L

There's no option for min.read.coverage and min.umi.count in the version of the package I installed. Can you please let me know if there's an option to keep duplicate reads?

Thank you for all the help.

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0                  
 [3] rtracklayer_1.54.0                Biostrings_2.62.0                
 [5] XVector_0.34.0                    GUIDEseq_1.24.0                  
 [7] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
 [9] IRanges_2.28.0                    S4Vectors_0.32.4                 
[11] BiocGenerics_0.40.0              

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3            seqinr_4.2-16               rjson_0.2.21               
  [4] ellipsis_0.3.2              futile.logger_1.4.3         base64enc_0.1-3            
  [7] rstudioapi_0.13             ChIPpeakAnno_3.28.1         hash_2.2.6.2               
 [10] bit64_4.0.5                 AnnotationDbi_1.56.2        fansi_1.0.3                
 [13] xml2_1.3.3                  splines_4.1.3               cachem_1.0.6               
 [16] zeallot_0.1.0               ade4_1.7-19                 jsonlite_1.8.0             
 [19] Rsamtools_2.10.0            dbplyr_2.2.0                png_0.1-7                  
 [22] tfruns_1.5.0                graph_1.72.0                compiler_4.1.3             
 [25] httr_1.4.3                  assertthat_0.2.1            Matrix_1.4-1               
 [28] fastmap_1.1.0               lazyeval_0.2.2              limma_3.50.3               
 [31] cli_3.3.0                   formatR_1.12                prettyunits_1.1.1          
 [34] tools_4.1.3                 gtable_0.3.0                glue_1.6.2                 
 [37] GenomeInfoDbData_1.2.7      dplyr_1.0.9                 mltools_0.3.5              
 [40] rappdirs_0.3.3              Rcpp_1.0.8.3                Biobase_2.54.0             
 [43] vctrs_0.4.1                 rhdf5filters_1.6.0          multtest_2.50.0            
 [46] stringr_1.4.0               lifecycle_1.0.1             restfulr_0.0.14            
 [49] ensembldb_2.18.4            XML_3.99-0.9                InteractionSet_1.22.0      
 [52] zlibbioc_1.40.0             MASS_7.3-57                 scales_1.2.0               
 [55] hms_1.1.1                   MatrixGenerics_1.6.0        ProtGenerics_1.26.0        
 [58] parallel_4.1.3              SummarizedExperiment_1.24.0 RBGL_1.70.0                
 [61] rhdf5_2.38.1                AnnotationFilter_1.18.0     lambda.r_1.2.4             
 [64] yaml_2.3.5                  curl_4.3.2                  memoise_2.0.1              
 [67] reticulate_1.25             ggplot2_3.3.6               keras_2.9.0                
 [70] biomaRt_2.50.3              stringi_1.7.6               RSQLite_2.2.14             
 [73] tensorflow_2.9.0            BiocIO_1.4.0                GenomicFeatures_1.46.5     
 [76] filelock_1.0.2              BiocParallel_1.28.3         rlang_1.0.2                
 [79] pkgconfig_2.0.3             matrixStats_0.62.0          bitops_1.0-7               
 [82] lattice_0.20-45             purrr_0.3.4                 Rhdf5lib_1.16.0            
 [85] GenomicAlignments_1.30.0    bit_4.0.4                   tidyselect_1.1.2           
 [88] magrittr_2.0.3              R6_2.5.1                    generics_0.1.2             
 [91] DelayedArray_0.20.0         DBI_1.1.2                   pillar_1.7.0               
 [94] whisker_0.4                 survival_3.3-1              KEGGREST_1.34.0            
 [97] RCurl_1.98-1.6              tibble_3.1.7                crayon_1.5.1               
[100] futile.options_1.0.1        utf8_1.2.2                  BiocFileCache_2.2.1        
[103] progress_1.2.2              grid_4.1.3                  data.table_1.14.2          
[106] blob_1.2.3                  digest_0.6.29               VennDiagram_1.7.3          
[109] regioneR_1.26.1             CRISPRseek_1.34.2           munsell_0.5.0
ADD REPLY
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 13 months ago
United States

Hi Sharvari,

To keep a record on what we communicated via email, I am stating here again that you can set the following parameters to be less stringent. To tune other parameters, please type ?GUIDEseqAnalysis in a R session.

PAM.pattern = "NNN$", max.mismatch = 10, min.reads = 1, min.read.coverage = 1L, keepPeaksInBothStrandsOnly = FALSE, min.umi.count = 1

As you have noticed, some of the parameters are not available in old versions. To download the most recent version, please type the following commands in a R session. install.packages('remotes') remotes::install_github(repo = 'lihuajuliezhu/guideseq'')

If you are interested in detecting offTargets with bulges, you can specify includeBulge = TRUE (default to FALSE) and the maximum allowed bulge size as max.n.bulge in GUIDEseqAnalysis.

In addition, it is important to use the right build of the BSgenome for offtarget annotation. For example, if you mapped your GUIDEseq data to hg19, then you will need to use BSgenome.Hsapiens.UCSC.hg19, and if you mapped your GUIDEseq data to hg38, then you will need to use BSgenome.Hsapiens.UCSC.hg38 for GUIDEseqAnalysis.

If you have data from biological replicates, you do not need to concatenate the alignment files. Please see details at GUIDESeq analysis - handling biological and technical replicates

Best regards,

Julie

ADD COMMENT
0
Entering edit mode

Hi Julie,

Thank you for the reply.

I installed the newer version and re-ran the function with the parameters you suggested:

GUIDEseqAnalysis(alignment.inputfile = bamfile , umi.inputfile = umifile,
                 alignment.format = c("bam"),
                 BSgenomeName = Hsapiens,
                 gRNA.file = gRNAs, n.cores.max = 4,min.mapping.quality = 15L,max.R1.len = 150L, max.R2.len =150L,
                 min.reads = 1,min.SNratio = 2, min.read.coverage = 1L,min.umi.count = 1,
                 maxP = 0.01,window.size = 25L,step = 25L,
                 plus.strand.start.gt.minus.strand.end = FALSE,distance.threshold = 1000,
                 upstream = 50L, downstream = 50L, PAM.size = 3, gRNA.size = 20,
                 PAM = "NGG", PAM.pattern = "(NGG)$", max.mismatch = 6,
                 outputDir = outputDir,
                 orderOfftargetsBy = "predicted_cleavage_score",
                 allowed.mismatch.PAM = 2, overwrite = TRUE,keepPeaksInBothStrandsOnly = FALSE)

However, a specific off-target of interest is still not being detected. May be the removal of duplicate reads is resulting in no reads in that locus due to the technical samples being merged. Is there a way to run the tool without removing duplicate reads for testing?

Thanks Sharvari

ADD REPLY
0
Entering edit mode

Hi Sharvari,

With two Biological replicates, you will need to specify two separate BAM files for alignment.format, so that reads with the same coordinates and UMI from different Biological replicates will not be collapsed.

Could you please check the files ending with ReadSummary.xls and PlusMinusPeaksMerged.xls in the output folder to see if your offtarget is in one or both files? The *ReadSummary.xls file should include any alignment with one read that passed your quality threshold.

Best, Julie

ADD REPLY
0
Entering edit mode

Hi Julie,

I checked both the files, and don't see the off-target in either one. Is there any other parameter that I could change to not filter out these alignments?

Thanks Sharvari

ADD REPLY
0
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 13 months ago
United States

Hi Sharvari,

Can you double check whether the alignment files do contain the offtarget by using samtools view? If you can see the offtargets with good quality in the bam files using samtools, then I will need your input files (BAM files and UMI files) to try to replicate your findings and go from there.

Best,

Julie

ADD COMMENT

Login before adding your answer.

Traffic: 583 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6