Question

How to analyze ChIP-seq data

Bioinformatics ▴ 30

@bioinformatics-10931

Last seen 3.3 years ago

United States

I am using your R code to analyze some ChIP-seq data. I have 6 FASTQ files. Your example has two files per sample, while I have only one:

FileName1 FileName2

So I amended the targets file to list one file per sample, and I can read it in.

I am sure the data is in the folder, but I cannot figure out why it gives me an error.

I attached my target file

Let me know your thoughts

targetpath <- "~/Desktop/data/targetsPE_chip.txt"
targets <- read.delim("targetsPE_chip.txt", comment.char = "#")
dir_path <- system.file("extdata/cwl/preprocessReads/trim-pe",package = "systemPipeR")
trim <- loadWF(targets = targetpath, wf_file = "trim-pe.cwl",
               input_file = "trim-pe.yml", dir_path = dir_path)
trim <- renderWF(trim, inputvars = c(FileName1 = "_FASTQ_PATH1_", SampleName = "_SampleName_"))
trim
output(trim)[1:2]
filterFct <- function(fq, cutoff = 20, Nexceptions = 0) {
  qcount <- rowSums(as(quality(fq), "matrix") <= cutoff, na.rm = TRUE)
  fq[qcount <= Nexceptions]
  # Retains reads where Phred scores are >= cutoff with N
  # exceptions
}


But when I run the following command, it always gives me an error:

preprocessReads(args = trim, Fct = "filterFct(fq, cutoff=20, Nexceptions=0)",
                batchsize = 1e+05)

Error in open.connection(con, "rb") : cannot open the connection
In addition: Warning messages:
1: In normalizePath(subset_input[[i]][["FileName1"]]) :
  path[1]="/Users/admin/Desktop/data/S1_R1_001.fastq.gz ": No such file or directory
2: In open.connection(con, "rb") :
  cannot open file '/Users/admin/Desktop/data/S1_R1_001.fastq.gz ': No such file or directory

> sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] systemPipeR_1.22.0          ShortRead_1.46.0            GenomicAlignments_1.26.0   
 [4] SummarizedExperiment_1.20.0 Biobase_2.50.0              MatrixGenerics_1.2.1       
 [7] matrixStats_0.61.0          BiocParallel_1.28.3         Rsamtools_2.6.0            
[10] Biostrings_2.58.0           XVector_0.30.0              GenomicRanges_1.42.0       
[13] GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1           
[16] BiocGenerics_0.36.1        

loaded via a namespace (and not attached):
  [1] colorspace_2.0-2         rjson_0.2.20             hwriter_1.3.2            ellipsis_0.3.2          
  [5] rstudioapi_0.13          bit64_4.0.5              AnnotationDbi_1.52.0     fansi_0.5.0             
  [9] xml2_1.3.3               splines_4.0.5            cachem_1.0.6             jsonlite_1.7.2          
 [13] annotate_1.68.0          GO.db_3.11.4             dbplyr_2.1.1             png_0.1-7               
 [17] pheatmap_1.0.12          graph_1.66.0             compiler_4.0.5           httr_1.4.2              
 [21] GOstats_2.54.0           backports_1.4.1          assertthat_0.2.1         Matrix_1.4-0            
 [25] fastmap_1.1.0            limma_3.46.0             prettyunits_1.1.1        tools_4.0.5             
 [29] gtable_0.3.0             glue_1.6.0               GenomeInfoDbData_1.2.4   Category_2.54.0         
 [33] dplyr_1.0.7              rsvg_2.1.2               batchtools_0.9.15        rappdirs_0.3.3          
 [37] V8_4.0.0                 Rcpp_1.0.7               vctrs_0.3.8              rtracklayer_1.54.0      
 [41] stringr_1.4.0            lifecycle_1.0.1          restfulr_0.0.13          XML_3.99-0.8            
 [45] edgeR_3.32.1             zlibbioc_1.36.0          scales_1.1.1             BSgenome_1.58.0         
 [49] VariantAnnotation_1.36.0 hms_1.1.1                RBGL_1.64.0              RColorBrewer_1.1-2      
 [53] yaml_2.2.1               curl_4.3.2               memoise_2.0.1            ggplot2_3.3.5           
 [57] biomaRt_2.46.3           latticeExtra_0.6-29      stringi_1.7.6            RSQLite_2.2.9           
 [61] genefilter_1.72.1        BiocIO_1.0.1             checkmate_2.0.0          GenomicFeatures_1.46.2  
 [65] DOT_0.1                  rlang_0.4.12             pkgconfig_2.0.3          bitops_1.0-7            
 [69] lattice_0.20-45          purrr_0.3.4              bit_4.0.4                tidyselect_1.1.1        
 [73] GSEABase_1.50.1          AnnotationForge_1.30.1   magrittr_2.0.1           R6_2.5.1                
 [77] generics_0.1.1           base64url_1.4            DelayedArray_0.16.3      DBI_1.1.2               
 [81] pillar_1.6.4             withr_2.4.3              survival_3.2-13          RCurl_1.98-1.5          
 [85] tibble_3.1.6             crayon_1.4.2             utf8_1.2.2               BiocFileCache_1.14.0    
 [89] jpeg_0.1-9               progress_1.2.2           locfit_1.5-9.4           grid_4.0.5              
 [93] data.table_1.14.2        blob_1.2.2               Rgraphviz_2.32.0         digest_0.6.29           
 [97] xtable_1.8-4             brew_1.0-6               openssl_1.4.6            munsell_0.5.0           
[101] askpass_1.1             
>

systemPipeR • 1.2k views

Updated 3.3 years ago by dcassol ▴ 100 • written 3.3 years ago by Bioinformatics ▴ 30

score 0 · Answer 1 · 2021-12-23

Hi Mohammad,

I already replied by email, but I will add the answer here too.

First, if you have single-end FASTQ files, you need to use the corresponding single-end param files (trim-se.cwl and trim-se.yml) rather than the paired-end ones. Second, the targets file must contain the correct paths to your FASTQ files. For example, your targets table should point to the files like this:

> targetspath <- "targets.txt"
> read.delim(targetspath, comment.char = "#")
#                      FileName SampleName Factor SampleLong Experiment        Date
# 1  ./data/SRR446027_1.fastq.gz        M1A     M1  Mock.1h.A          1 23-Mar-2012
# 2  ./data/SRR446028_1.fastq.gz        M1B     M1  Mock.1h.B          1 23-Mar-2012

You can double-check that the file paths are correct:

file.exists(targets$FileName)
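
The warning in the question quotes the path as '/Users/admin/Desktop/data/S1_R1_001.fastq.gz ', with a space before the closing quote, so a stray trailing space in the targets file may be part of the problem. Below is a minimal base-R sketch for checking this, assuming a tab-delimited targets file with a FileName column. The temporary files and sample values are made up for illustration only.

```r
# Build a throwaway FASTQ placeholder and a targets file whose path
# ends with a trailing space (hypothetical data, illustration only).
fq <- tempfile(fileext = ".fastq.gz")
file.create(fq)
targets_file <- tempfile(fileext = ".txt")
writeLines(c("FileName\tSampleName\tFactor",
             paste0(fq, " \tS1\tWT")), targets_file)

targets <- read.delim(targets_file, comment.char = "#",
                      stringsAsFactors = FALSE)
raw_ok <- file.exists(targets$FileName)   # path still carries the space

# Strip leading/trailing whitespace and expand "~" before checking again.
targets$FileName <- path.expand(trimws(targets$FileName))
clean_ok <- file.exists(targets$FileName)

# Any path listed here still does not exist on disk.
not_found <- targets$FileName[!clean_ok]
```

If `raw_ok` is FALSE but `clean_ok` is TRUE, the path itself is fine and only the whitespace needs to be removed from the targets file. Passing `strip.white = TRUE` to `read.delim()` is another way to drop it when reading the table in R, but the file on disk that `loadWF()` reads would still need to be corrected.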

Then load and render the single-end workflow:

dir_path <- system.file("extdata/cwl", package="systemPipeR")
args <- loadWF(targets = targetspath, wf_file = "preprocessReads/trim-se.cwl",
               input_file = "preprocessReads/trim-se.yml", dir_path = dir_path)
args <- renderWF(args, inputvars = c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_"))
cmdlist(args[1])
output(args[1])

In your targets file, some columns are missing, most importantly Factor. Also, rename the column FileName1 to FileName.

> targets <- read.delim("targetsPE_chip.txt", comment.char = "#")
> targets
#                     FileName1 SampleName    Factor SampleLong Experiment Date SampleReference
# 1 ~/Desktop/S1_R1_001.fastq.gz      SG-9 1 23-Dec-21       WT-1         NA   NA              NA
# 2  ~/Desktop/S2_R1_001.fastq.gz    SG-10 1 23-Dec-21       KI-2         NA   NA              NA
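
The column rename suggested above can also be done in R rather than by hand. This is a minimal sketch, assuming a tab-delimited targets file. The helper name `fix_targets` is hypothetical (it is not part of systemPipeR), the list of expected columns is an assumption based on the first three columns of the example table in this answer, and the sample values are made up.

```r
# Hypothetical helper: rename FileName1 to FileName and warn about
# any expected columns that are absent from the targets table.
fix_targets <- function(targets,
                        expected = c("FileName", "SampleName", "Factor")) {
  names(targets)[names(targets) == "FileName1"] <- "FileName"
  absent <- setdiff(expected, names(targets))
  if (length(absent) > 0) {
    warning("targets table is missing: ", paste(absent, collapse = ", "))
  }
  targets
}

# Example with made-up values.
old <- data.frame(FileName1 = "./data/S1_R1_001.fastq.gz",
                  SampleName = "SG-9", Factor = "WT",
                  stringsAsFactors = FALSE)
new <- fix_targets(old)

# Write the corrected table back as tab-delimited text.
out_file <- tempfile(fileext = ".txt")
write.table(new, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

The written file can then be used as the `targets` argument of `loadWF()`, as in the single-end example earlier in this answer.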

I hope this helps you.

All the best,

Daniela