Unhelpful Errors Running ChIPQC on Computing Cluster
3
0
Entering edit mode
@jaredandrews07-13809
Last seen 9 months ago
United States

I'm having issues running ChIPQC on a computing cluster, and the errors being thrown aren't particularly helpful.

The code:

library(ChIPQC)

my_experiment = ChIPQC(experiment="CellLine_Experiment_Sheet.csv", annotation="hg19")
ChIPQCreport(my_experiment, facetBy=c("Tissue", "Treatment"))
warnings()

My sample sheet:

SampleID Tissue Factor Condition Treatment Replicate bamReads ControlID bamControl Peaks PeakCaller
HHu HH H3AC None UNTREATED 1 ../BAMs/Batch11/HH.UNTREATED.H3AC.sorted.BL_removed.bam HHc ../BAMs/Batch11/HH.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3AC/HH.UNTREATED.H3AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HHr HH H3AC None ROMIDEPSIN 1 ../BAMs/Batch11/HH.ROMI.H3AC.sorted.BL_removed.bam HHc ../BAMs/Batch11/HH.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3AC/HH.ROMI.H3AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HUT78u HUT78 H3AC None UNTREATED 1 ../BAMs/Batch12/HUT78.UNTREATED.H3AC.sorted.BL_removed.bam HUT78c ../BAMs/Batch12/HUT78.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3AC/HUT78.UNTREATED.H3AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HUT78r HUT78 H3AC None ROMIDEPSIN 1 ../BAMs/Batch12/HUT78.ROMI.H3AC.sorted.BL_removed.bam HUT78c ../BAMs/Batch12/HUT78.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3AC/HUT78.ROMI.H3AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
OCILY7u OCILY7 H3AC None UNTREATED 1 ../BAMs/Batch14/OCILY7.UNTREATED.H3AC.sorted.BL_removed.bam OCILY7c ../BAMs/Batch14/OCILY7.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3AC/OCILY7.UNTREATED.H3AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HHu HH H3K27AC None UNTREATED 1 ../BAMs/Batch11/HH.UNTREATED.H3K27AC.sorted.BL_removed.bam HHc ../BAMs/Batch11/HH.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3K27AC/HH.UNTREATED.H3K27AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HHr HH H3K27AC None ROMIDEPSIN 1 ../BAMs/Batch11/HH.ROMI.H3K27AC.sorted.BL_removed.bam HHc ../BAMs/Batch11/HH.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3K27AC/HH.ROMI.H3K27AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HUT78u HUT78 H3K27AC None UNTREATED 1 ../BAMs/Batch12/HUT78.UNTREATED.H3K27AC.sorted.BL_removed.bam HUT78c ../BAMs/Batch12/HUT78.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3K27AC/HUT78.UNTREATED.H3K27AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
HUT78r HUT78 H3K27AC None ROMIDEPSIN 1 ../BAMs/Batch12/HUT78.ROMI.H3K27AC.sorted.BL_removed.bam HUT78c ../BAMs/Batch12/HUT78.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3K27AC/HUT78.ROMI.H3K27AC.sorted.BL_REMOVED.peaks.narrowPeak narrow
OCILY7u OCILY7 H3K27AC None UNTREATED 1 ../BAMs/Batch14/OCILY7.UNTREATED.H3K27AC.sorted.BL_removed.bam OCILY7c ../BAMs/Batch14/OCILY7.UNTREATED.INPUT.sorted.BL_removed.bam ../MACS2/BL_REMOVED/NARROW_PEAK/H3K27AC/OCILY7.UNTREATED.H3K27AC.sorted.BL_REMOVED.peaks.narrowPeak narrow

Stdout shows the shiftsize and signal over peaks, summits, etc all being calculated fine for each sample, so I'm pretty sure the files are all being found.

Stderr:

HH_unt_3 HH H3AC None UNTREATED 1 narrow
HH_romi_3 HH H3AC None ROMIDEPSIN 1 narrow
HUT78_unt_3 HUT78 H3AC None UNTREATED 1 narrow
HUT78_romi_3 HUT78 H3AC None ROMIDEPSIN 1 narrow
OCILY7_unt_3 OCILY7 H3AC None UNTREATED 1 narrow
HH_unt_27 HH H3K27AC None UNTREATED 1 narrow
HH_romi_27 HH H3K27AC None ROMIDEPSIN 1 narrow
HUT78_unt_27 HUT78 H3K27AC None UNTREATED 1 narrow
HUT78_romi_27 HUT78 H3K27AC None ROMIDEPSIN 1 narrow
OCILY7_unt_27 OCILY7 H3K27AC None UNTREATED 1 narrow
Checking chromosomes:
Compiling annotation...
Using default blacklist for hg19...
Computing metrics for 13 samples...
Error in sample HH_unt_3: argument to 'which' is not logical
Error in sample HH_romi_3: argument to 'which' is not logical
Error in sample HUT78_unt_3: argument to 'which' is not logical
Error in sample HUT78_romi_3: argument to 'which' is not logical
Error in sample OCILY7_unt_3: argument to 'which' is not logical
Error in sample HH_unt_27: argument to 'which' is not logical
Error in sample HH_romi_27: argument to 'which' is not logical
Error in sample HUT78_unt_27: argument to 'which' is not logical
Error in sample HUT78_romi_27: argument to 'which' is not logical
Error in sample OCILY7_unt_27: argument to 'which' is not logical
Error in ChIPQC(experiment = "CellLine_Experiment_Sheet.csv", annotation = "hg19") : 
  Errors in generating sample data.
In addition: Warning messages:
1: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
2: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
3: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
4: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
5: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
6: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
7: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
8: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
9: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
10: In if (class(sample) != "ChIPQCsample") { :
  the condition has length > 1 and only the first element will be used
Execution halted

I'm not sure why it's erroring on my samples, as usually that error corresponds to issues where categorical data (as character or factor type) is being compared to a numerical data.

Any help would be much appreciated, and I'll supply any additional info as needed.

sessionInfo:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ChIPQC_1.10.3              DiffBind_2.2.12
 [3] SummarizedExperiment_1.4.0 Biobase_2.30.0
 [5] GenomicRanges_1.26.2       GenomeInfoDb_1.10.3
 [7] IRanges_2.8.1              S4Vectors_0.12.1
 [9] BiocGenerics_0.20.0        ggplot2_2.0.0

loaded via a namespace (and not attached):
 [1] edgeR_3.12.0
 [2] splines_3.3.2
 [3] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
 [4] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [5] gtools_3.5.0
 [6] assertthat_0.1
 [7] latticeExtra_0.6-26
 [8] amap_0.8-14
 [9] RBGL_1.50.0
[10] Rsamtools_1.26.1
[11] Category_2.40.0
[12] RSQLite_1.0.0
[13] backports_1.1.0
[14] lattice_0.20-34
[15] limma_3.26.5
[16] digest_0.6.8
[17] RColorBrewer_1.1-2
[18] XVector_0.14.0
[19] checkmate_1.8.3
[20] colorspace_1.2-6
[21] Matrix_1.2-7.1
[22] plyr_1.8.3
[23] GSEABase_1.36.0
[24] chipseq_1.24.0
[25] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
[26] XML_3.98-1.3
[27] pheatmap_1.0.7
[28] ShortRead_1.32.1
[29] biomaRt_2.30.0
[30] genefilter_1.52.0
[31] zlibbioc_1.16.0
[32] xtable_1.8-0
[33] GO.db_3.4.0
[34] scales_0.3.0
[35] brew_1.0-6
[36] gdata_2.17.0
[37] TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
[38] BiocParallel_1.4.3
[39] tibble_1.3.0
[40] annotate_1.48.0
[41] GenomicFeatures_1.26.2
[42] survival_2.39-5
[43] magrittr_1.5
[44] systemPipeR_1.8.1
[45] fail_1.3
[46] gplots_2.17.0
[47] hwriter_1.3.2
[48] GOstats_2.40.0
[49] graph_1.52.0
[50] tools_3.3.2
[51] BBmisc_1.11
[52] stringr_1.0.0
[53] sendmailR_1.2-1
[54] munsell_0.4.2
[55] locfit_1.5-9.1
[56] AnnotationDbi_1.36.2
[57] lambda.r_1.1.7
[58] Biostrings_2.42.1
[59] caTools_1.17.1
[60] futile.logger_1.4.1
[61] grid_3.3.2
[62] RCurl_1.95-4.8
[63] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2
[64] rjson_0.2.15
[65] AnnotationForge_1.16.1
[66] bitops_1.0-6
[67] base64enc_0.1-3
[68] gtable_0.1.2
[69] DBI_0.6-1
[70] reshape2_1.4.1
[71] R6_2.2.0
[72] GenomicAlignments_1.10.0
[73] Nozzle.R1_1.1-1
[74] dplyr_0.5.0
[75] rtracklayer_1.34.1
[76] futile.options_1.0.0
[77] KernSmooth_2.23-15
[78] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[79] stringi_0.5-2
[80] BatchJobs_1.6
[81] Rcpp_0.12.10
[82] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
[83] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2
chipqc chipqcreport • 1.8k views
ADD COMMENT
0
Entering edit mode
@thomas-carroll-7019
Last seen 24 months ago
United States/New York/The Rockefeller …

hi,

I will try and reproduce with some local data and get back you if I need anymore help or data in debugging.

speak soon,

tom

ADD COMMENT
0
Entering edit mode

Excellent, thank you. `ChIPQCsample` works just fine on my individual samples, so I don't think there's anything wrong with any of my bam or peak files. I may just go ahead and create the sample objects for each of my samples and try running it with a list of those, inconvenient as it is. Also, it takes ~60-70 computational hours for the analysis to error out - how long does a typical analysis with ~10 samples usually take? Thanks again.

ADD REPLY
0
Entering edit mode
@thomas-carroll-7019
Last seen 24 months ago
United States/New York/The Rockefeller …

hi,

I am unable to recreate this error in latest version of ChIPQC (ChIPQC_1.13.2 ). I have used some encode data to make an example with public data.

Here is the sample sheet setup.

  SampleID Tissue Factor Condition Treatment Replicate    bamReads    ControlID                  bamControl
1    Myc_1   Ch12    Myc      None      None         1  ENCFF726ILJ.bam Ch12_Control        ENCFF324OBT.bam
2    Myc_2   Ch12    Myc      None      None         2  ENCFF644VXF.bam Ch12_Control      ENCFF324OBT.bam
3    Myc_3    Mel    Myc      None      None         1  ENCFF176BCF.bam  Mel_Control         ENCFF190WZK.bam
                           Peaks PeakCaller
1   ENCFF962BGJ.bed.gz     narrow
2   ENCFF160KXR.bed.gz     narrow
3   ENCFF363WUG.bed.gz     narrow
 

And then running ChIPQC works fine.

> myRun <- ChIPQC("~/Projects/Software/Tests/ChIPQC_Test.csv",annotation = "mm9")

If you can send me the data from first row of your sample sheet I can see if I can recreate from there.

##

You are right that you can use ChIPQCsample() to produce the results for individual contrasts and then use a named list to plot the data.

> Ch12_1 <- ChIPQCsample("/Users/tcarroll/Downloads/ENCFF644VXF.bam",chromosomes = "chr1",peaks = "~/Downloads/ENCFF160KXR.bed.gz")

>Ch12_2 <- ChIPQCsample("~/Downloads/ENCFF726ILJ.bam",chromosomes = "chr1",peaks = "~/Downloads/ENCFF962BGJ.bed.gz")

> myChIPQClist <- list(Ch12_Rep1=Ch12_1,ch12_Rep2=Ch12_2)

> flagtagcounts(myChIPQClist)
                  Ch12_Rep1 ch12_Rep2
UnMapped                  0         0
Mapped              1376881   1258838
Duplicates                0         0
MapQPass            1186244   1084038
MapQPassAndDup            0         0
DuplicateByChIPQC    109804    113178
> QCmetrics(myChIPQClist)
            Reads Map% Filt% Dup% ReadL FragL RelCC   SSD RiP%
Ch12_Rep1 1376881  100  13.8    0    36   161  1.83 0.843 17.8
ch12_Rep2 1258838  100  13.9    0    36   144  2.24 1.060 21.4

## 

The ChIPQC() function uses BiocParallel to parallelize QC over samples, so by setting the appropriate BiocParallel parameter you should be able to process datasets fairly quickly.

Some examples of setting BiocParallel parameters would be..

- ChIPQC of samples in serial.

BiocParallel::register(BiocParallel::SerialParam())
myRun <- ChIPQC("~/Projects/Software/Tests/ChIPQC_Test.csv",annotation = "hg19")

 

- ChIPQC of samples using all available cores .

BiocParallel::register(BiocParallel::MulticoreParam())
myRun <- ChIPQC("~/Projects/Software/Tests/ChIPQC_Test.csv",annotation = "hg19")

- ChIPQC of samples using job scheduler (See BatchJobs package for more details on setting this up for your own cluster).


library(BiocParallel)
library(BatchJobs)

loadConfig()

## register SLURM cluster instructions from a local template file

funs <- makeClusterFunctionsSLURM("newSimple.tmpl")

 

param <- BatchJobsParam(resources=list(ncpus=1,ntasks=1,walltime=2000),
                        cluster.functions=funs)
register(param)


myRun <- ChIPQC("~/Projects/Software/Tests/ChIPQC_Test.csv",annotation = "hg19")

best,

tom

ADD COMMENT
0
Entering edit mode
@jaredandrews07-13809
Last seen 9 months ago
United States

Yeah, I'm not sure what the issue was here. I managed to get it running locally, so I'm just going to chalk it up to issues with R on the cluster. It does not like running in parallel though, I always have to run it in serial to get it running.

Thanks for checking up on it.

ADD COMMENT

Login before adding your answer.

Traffic: 665 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6