Entering edit mode
Dear colleagues,
I'm using DiffBind for the first time and I'm encountering some issues, one in particular is quite weird.
I've got a simple experiment, 3 conditions, 4 replicates each. I've obtained peaks using MACS2 (-- broad function).
SampleID Condition Replicate
1 S1 control 1
2 S2 control 2
3 S3 control 3
4 S4 control 4
5 S5 flu_low 1
6 S6 flu_low 2
7 S7 flu_low 3
8 S8 flu_low 4
9 S9 flu_high 1
10 S10 flu_high 2
11 S11 flu_high 3
12 S12 flu_high 4
After doing the dba.count and looking at the dba object I realized that the number of reads per sample is extraordinarily high (at least two times the actual library size, which was around 60M SE). How is it possible?
Is this linked to the fact that I retained multi-mappers reads?
Any advice would be greatly appreciated.
Best
Marianna
It's unclear what you're asking. Please always provide some minimal code, data examples or plots to illustrate the problem. Generally though, from what I know and have seen over the last years, multimappers are usually not included in the analysis. If this is the issue here is unclear. Are you referring to raw counts or normalized counts?
Thank you ATpoint for your reply.
I'll try to describe the issue in more detail.
Below is the code used for DiffBind
Now the weird thing is that it seems there are, for instance in sample S1, more than 130M reads. This is not possible as for this sample there mere 60M raw reads! Do you think that running bowtie2 with -k argument can be a problem? As setting -k 6 the sam file will retain reads mapping to a maximum of 6 different locations.
Thank you
Best
Marianna
I see no code, and you did not answer whether it's about raw or normalized counts.
Sorry, I accidentally replied before completing the answer. I'm referring to raw counts
Yes, this could potentially be due to multimappers, since (iirc)
-k 6
outputs up to 6 locations per multimapper. My recommendation is to remove any multimappers, for example by a MAPQ filter, e.g.samtools view -q 20
. Note, this is my recommendation, I am not the DiffBind maintainer. Multimappers are usually never considered due to their uncertainty.Thank you ATpoint!
I tried by filtering with
samtools view -q 42
, to keep only uniquely mapping reads but the results were very similar to the one reported above. I'm afraid of the -k in bowtie2, so I'm trying to run it without -k, to completely get rid of the multimappers.Thank you
Marianna
I am not exactly sure how the k parameter works, I never use it, but I agree, realignment without it and filter for MAPQ > 0 makes sense. Maybe not 42, that is strict, maybe use something like 20.
DiffBind
should already be filtering out multi-mapped reads using themapQCth
parameter indba.count()
. This is set tomapQCth=15
by default.