Question

Help with Error in DiffBind

0

Chris ▴ 20

@3fdb6f97

Last seen 2 days ago

United States

Hello all,

It seems there is something wrong with my samples:

samples <- read.csv("samplesheet_DiffBind.csv")  
result <- dba.analyze(samples)

Error in peaks[, pCol]/max(peaks[, pCol]) : non-numeric argument to binary operator

I understand what this error means but don't know what is causing it. Could you suggest how to fix this? Thank you so much!

DiffBind ATACSeq • 4.7k views

ADD COMMENT • link 2.1 years ago • updated 2.0 years ago Chris ▴ 20

1

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 3 months ago

Cambridge, UK

This error is usually caused by incorrectly specifying the PeakCaller (or ScoreCol if present).

What is the format of your peak files? Which column contains the (numerical) score? What are you specifying for the PeakCaller in your samplesheet?
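One quick way to answer the "which column is numeric" question is to read a peak file into R and check the column types. This is only a minimal sketch: the two-line file below is hypothetical data laid out in the nine-column broadPeak format, and `peak_file` is a placeholder name that you would point at one of the paths in the Peaks column of your own samplesheet.

```r
# Hypothetical two-line broadPeak-style file, written to a temp path so the
# sketch is self-contained; replace peak_file with a real path from the sheet.
peak_file <- tempfile(fileext = ".broadPeak")
writeLines(c(
  "chr1\t100\t500\tpeak_1\t250\t.\t5.2\t12.1\t9.8",
  "chr1\t900\t1400\tpeak_2\t410\t.\t7.9\t20.4\t17.3"
), peak_file)

# Peak files have no header line, so columns are named V1, V2, ...
peaks <- read.delim(peak_file, header = FALSE, stringsAsFactors = FALSE)

# TRUE marks columns that R parsed as numeric; a score column must be one of these.
numeric_cols <- vapply(peaks, is.numeric, logical(1))
print(numeric_cols)
```

If the column being used as the score shows up as FALSE here (for example the name or strand column), that would be consistent with the "non-numeric argument to binary operator" error.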

ADD COMMENT • link 2.1 years ago Rory Stark ★ 5.2k

0

Thanks Rory for your reply! My peak files are .broadPeak, created from nf-core/ATAC. The PeakCallers column is broad. I have SampleID, condition, replicate, bamReads, and Peaks columns, but no column contains the score, I think.

I changed the PeakCallers column to bed and no longer get that error.

I now get these red messages and an error that I need your advice on:

[W::hts_idx_load3] The index file is older than the data file
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes

Count error: Error: Error processing one or more read files. Check warnings().

Unable to count overlapping reads.
Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
scheduled cores 4, 2 encountered errors in user code, all values of the jobs will be affected
2: error in evaluating the argument 'x' in selecting a method for function 'assay': wrong args for environment subassignment
3: error in evaluating the argument 'x' in selecting a method for function 'assay': wrong args for environment subassignment

ADD REPLY • link 2.1 years ago Chris ▴ 20

1

If you have an index (.bai) file that is older than its corresponding .bam file, the .bam file needs to be re-indexed.

One way to deal with this is to delete the .bai file. So long as you have write access to the directory, DiffBind will automatically re-index the file and create a new .bai.

To see which file is the problem, you can run dba.count() with bParallel=FALSE.
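If you would rather check the timestamps directly before counting, something along these lines may help. It is a sketch using base R only; `stale_index()` is a hypothetical helper name (not part of DiffBind), and it assumes each index sits next to its BAM with ".bai" appended to the full file name. The temporary files exist only to make the example self-contained.

```r
# Return the BAM files whose .bai index is missing or older than the BAM itself.
# stale_index() is a hypothetical helper, not a DiffBind function.
stale_index <- function(bam_files) {
  bai_files <- paste0(bam_files, ".bai")
  bam_time <- file.info(bam_files)$mtime
  bai_time <- file.info(bai_files)$mtime  # NA when the index does not exist
  bam_files[is.na(bai_time) | bai_time < bam_time]
}

# Demo with empty placeholder files in a temporary directory.
demo_dir <- tempfile("bam_demo")
dir.create(demo_dir)
fresh <- file.path(demo_dir, "fresh.bam")
stale <- file.path(demo_dir, "stale.bam")
file.create(fresh, stale, paste0(fresh, ".bai"), paste0(stale, ".bai"))

now <- Sys.time()
Sys.setFileTime(fresh, now)
Sys.setFileTime(stale, now)
Sys.setFileTime(paste0(fresh, ".bai"), now + 60)    # index newer than BAM
Sys.setFileTime(paste0(stale, ".bai"), now - 3600)  # index older than BAM

to_fix <- stale_index(c(fresh, stale))
print(basename(to_fix))
```

With a real samplesheet you would pass the bamReads column instead of the demo paths, then delete or rebuild the index for each file that is returned.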

ADD REPLY • link 2.1 years ago Rory Stark ★ 5.2k

0

Thank you for your reply!

dba.count(bParallel=FALSE)

Error: DBA object missing!

So I tried this:

dba(DBA,mask, minOverlap=2,
    sampleSheet="dba_samples.csv", 
    config=data.frame(AnalysisMethod=DBA_DESEQ2,th=0.05,
                      DataType=DBA_DATA_GRANGES, RunParallel=TRUE, 
                      minQCth=15, fragmentSize=125, 
                      bCorPlot=FALSE, reportInit="DBA", 
                      bUsePval=FALSE, design=TRUE,
                      doBlacklist=TRUE, doGreylist=TRUE),
    peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter, 
    filter, skipLines=0, 
    bAddCallerConsensus=FALSE, 
    bRemoveM=TRUE, bRemoveRandom=TRUE, 
    bSummarizedExperiment=FALSE,
    attributes, dir)

But I got some errors:

Error in is(DBA, "character") : object 'DBA' not found
Error in is(DBA, "character") : object 'mask' not found
Error in is(DBA, "character") : object 'peakFormat' not found

atac.counts <- dba.count(atac.peaks)

Error: Error processing one or more read files. Check warnings().
In addition: Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
scheduled cores 1, 4 encountered errors in user code, all values of the jobs will be affected
2: error in evaluating the argument 'x' in selecting a method for function 'assay': wrong args for environment subassignment
3: error in evaluating the argument 'x' in selecting a method for function 'assay': wrong args for environment subassignment

Would you suggest how to fix the errors? Thank you so much!

ADD REPLY • link 2.1 years ago Chris ▴ 20

1

Sorry, I didn't see that you were using dba.analyze() to do a full default analysis.

You can call

result <- dba.analyze(samples, bParallel=FALSE)

to better see the errors.

Alternatively, you can break the analysis out into steps so you can narrow in on the step that is going wrong:

result <- dba(sampleSheet=samples)
result <- dba.blacklist(result)
result <- dba.count(result, bParallel=FALSE)
result <- dba.normalize(result)
result <- dba.contrast(result)
result <- dba.analyze(result)

ADD REPLY • link 2.1 years ago Rory Stark ★ 5.2k

0

Thank you for your help!

This is what I got when I ran:

result <- dba.analyze(samples, bParallel=FALSE)

[W::hts_idx_load3] The index file is older than the data file: /labs/PI/atacseq_output_03172023/bwa/merged_library/DISEASED_REP2.mLb.clN.sorted.bam.bai
Normalize DESeq2 with defaults...
Forming default model design and contrast(s)...
Analyzing...
Adding contrasts(s)...
Analyze error: Error in pv.DBA(DBA, method, bTagwise = bTagwise, minMembers = 3, bParallel = bParallel): Unable to perform analysis: no contrasts specified.

Unable to complete analysis.
Warning messages:
1: No contrasts added. There must be at least two sample groups with at least three replicates. 
2: No contrasts added. There must be at least two sample groups with at least three replicates.

Would you suggest how to fix this error? I have only 2 replicates for the control and 2 replicates for the diseased group.

ADD REPLY • link 2.1 years ago Chris ▴ 20

1

Looks like you should either delete the index file

/labs/PI/atacseq_output_03172023/bwa/merged_library/DISEASED_REP2.mLb.clN.sorted.bam.bai

or re-index the BAM file

/labs/PI/atacseq_output_03172023/bwa/merged_library/DISEASED_REP2.mLb.clN.sorted.bam

ADD REPLY • link 2.1 years ago Rory Stark ★ 5.2k

score 2 · Accepted Answer · 2023-03-30

2

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 3 months ago

Cambridge, UK

In your case, with only two replicates in each sample group, you cannot perform a default analysis using only dba.analyze(). I suggest that you re-write your script to perform each step in turn so that you can customize what is happening. Specifically, you'll need to specify minMembers=2 when calling dba.contrast():

result <- dba(sampleSheet=samples)
result <- dba.blacklist(result)
result <- dba.count(result)
result <- dba.normalize(result)
result <- dba.contrast(result, minMembers=2)
result <- dba.analyze(result)
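To confirm how many replicates each group really has before choosing minMembers, you can tabulate the samplesheet. The data frame below is a hypothetical stand-in with the layout described in this thread (two control and two diseased samples); the column names `Condition` and `Replicate` are assumptions, so adjust them to match your own sheet.

```r
# Hypothetical samplesheet; in practice use the data frame from read.csv().
samples_demo <- data.frame(
  SampleID  = c("CONTROL_REP1", "CONTROL_REP2", "DISEASED_REP1", "DISEASED_REP2"),
  Condition = c("CONTROL", "CONTROL", "DISEASED", "DISEASED"),
  Replicate = c(1, 2, 1, 2)
)

# Number of samples in each group.
reps <- table(samples_demo$Condition)
print(reps)

# Size of the smallest group; the default contrast setup expects at least three.
min_members <- min(reps)
print(min_members)
```

Here the smallest group has two samples, which matches the minMembers=2 value passed to dba.contrast() in the steps above.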

ADD COMMENT • link 2.1 years ago Rory Stark ★ 5.2k

0

Thanks Rory!

result <- dba(samples)
Error in if ((length(callers) == 1) & (callers[1] == "counts")) { : 
  argument is of length zero

Would you suggest how to fix this?

ADD REPLY • link 2.1 years ago Chris ▴ 20

1

Apologies. The first line should be

result <- dba(sampleSheet=samples)

I've gone back and edited previous replies to reflect this.

ADD REPLY • link 2.1 years ago Rory Stark ★ 5.2k

0

Thank you for your help! Would you tell me how I can get a plot like this? [image]

ADD REPLY • link 2.1 years ago Chris ▴ 20

1

There's a function in DiffBind called dba.plotProfile() that generates these plots. However, the annotations on the right-hand side, showing overlaps with genomic features and distances to the nearest TSS, are not available in the current functionality.

There's a tutorial notebook available showing how to get these plots, which you can see here: https://content.cruk.cam.ac.uk/bioinformatics/software/DiffBind/plotProfileDemo.html

ADD REPLY • link 2.1 years ago Rory Stark ★ 5.2k

0

Thanks Rory! Would you please tell me how to annotate genes from the results of DiffBind? I looked at the GenomicAlignments package but don't know how.

ADD REPLY • link 2.1 years ago • updated 2.0 years ago Chris ▴ 20