Hi Lukas,
I’ve been dealing with some paired-end ChIP-seq data from an inducible transcription factor experiment.
For each of the samples (3 each of treatment and control) and inputs (1 each for treatment and control) I’ve run:
MEDIPS.createSet(file = Sample, BSgenome = "BSgenome.Hsapiens.UCSC.hg19", uniq = 1, extend = 120, shift = 0, window_size = 1000, paired = TRUE, bwa = TRUE, chr.select = c("chr1", …, "chrY"))
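(For completeness, the elided chr.select list is just the canonical hg19 chromosomes; I build the vector roughly like this:)

chrs <- paste0("chr", c(1:22, "X", "Y"))   # "chr1" .. "chr22", "chrX", "chrY"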
When the samples load, I see that mates which are far apart are being joined, e.g.:
Sample 1
Mean insertion size: 11078.77 nt
SD of the insertion size: 1016181 nt
Max insertion size: 235650514 nt
Min insertion size: 0 nt
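For reference, the insert (TLEN) distribution can also be checked straight from the BAM with Rsamtools; a quick sketch (file name hypothetical):

library(Rsamtools)
# pull the TLEN (isize) field for the first mates of paired reads
p <- ScanBamParam(what = "isize", flag = scanBamFlag(isPaired = TRUE, isFirstMateRead = TRUE))
isz <- abs(scanBam("Sample1.bam", param = p)[[1]]$isize)
isz <- isz[!is.na(isz) & isz > 0]
summary(isz)   # a handful of enormous TLENs inflate the mean and SD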
I go ahead and run the analysis:
MEDIPS.meth(MSet1 = Treatment, MSet2 = Control, ISet1 = Treatment.Input, ISet2 = Control.Input, p.adj = "bonferroni", diff.method = "edgeR", minRowSum = 100, MeDIP = FALSE, quantile = TRUE)
I find that almost all the windows are being tested, even when minRowSum is increased to some crazy level:
minRowSum = 100:
Total windows: 3,095,689
Windows tested: 3,090,305
P < 0.05: 247,278

minRowSum = 1,000:
Total windows: 3,095,689
Windows tested: 1,648,789
P < 0.05: 153,028
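(For the record, those tallies come straight from the MEDIPS.meth result table; with the result in mr, and assuming the edgeR column names my MEDIPS version uses, roughly:)

pvals <- mr[, "edgeR.p.value"]    # column name may differ by MEDIPS version
nrow(mr)                          # total windows
sum(!is.na(pvals))                # windows tested
sum(pvals < 0.05, na.rm = TRUE)   # windows with P < 0.05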
I set extend=0 and removed duplicates, but still get this:
minRowSum = 100:
Total windows: 3,095,689
Windows tested: 3,016,374
P < 0.05: 243,846
I’ve visualized the BAM files with deepTools and found that there are probably <5,000 discrete regions of enrichment in our treatment samples, consistent with our transcription factor being induced.
I think the mating of pairs over such large distances is causing this issue, but any ideas on how to resolve it?
Kind Regards
Peter
Hi Lukas,
I thought the same about there being some ‘proper pairs’ with large inserts, so I ran Picard’s CollectInsertSizeMetrics and sure enough got something like this:
MEDIAN_INSERT_SIZE 323
MEDIAN_ABSOLUTE_DEVIATION 75
MIN_INSERT_SIZE 30
MAX_INSERT_SIZE 243946804
MEAN_INSERT_SIZE 332.5694
STANDARD_DEVIATION 115.047712
READ_PAIRS 8731854
I filtered my BAMs to ‘proper pair, insert <= 500’, but they still load into MEDIPS with huge inserts present:
Mean insertion size: 4082.162 nt
SD of the insertion size: 581612.2 nt
Max insertion size: 213644984 nt
It would seem that the filter may not be doing the job.
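For reference, the kind of filter I applied looks roughly like this in R with Rsamtools (file names hypothetical):

library(Rsamtools)
# keep only properly paired alignments whose |TLEN| is at most 500
rules <- FilterRules(list(insert = function(x) !is.na(x$isize) & abs(x$isize) <= 500))
filterBam("Sample1.bam", "Sample1.filt.bam",
          param = ScanBamParam(what = "isize", flag = scanBamFlag(isProperPair = TRUE)),
          filter = rules)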
The thing is, I can run MACS2 on the BWA-aligned BAMs and it builds a model no problem, with the insert size estimated around 300 bp?!
I decided to realign with Bowtie2 (max insert 500 bp) and run the analysis again (uniq=1, paired=TRUE, extend=120; the createSet call is sketched after the table below), and everything worked fine:
minRowSum   Total windows   Windows tested   P < 0.05
50          3,095,689       92,626           16,601
100         3,095,689       27,525           8,961
300         3,095,689       1,891            695
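For reference, the createSet call for the Bowtie2 BAMs was along these lines (file name hypothetical; no bwa=TRUE this time, since these aren’t BWA alignments):

library(MEDIPS)
Treatment <- MEDIPS.createSet(file = "Sample1.bt2.bam",
                              BSgenome = "BSgenome.Hsapiens.UCSC.hg19",
                              uniq = 1, extend = 120, shift = 0,
                              window_size = 1000, paired = TRUE,
                              chr.select = paste0("chr", c(1:22, "X", "Y")))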
Not sure what the issue was with the BWA-aligned files. I can send them to you if you’d like?
Thanks again!
Peter
Hi Lukas
I edited my comment above apparently just before your response (I always seem to do this!), but with extend=120 I get success with the Bowtie BAMs.
I'll run them again with extend=0 and the bwa/paired options and let you know how it goes.
Best,
Peter
Hi Lukas,
Everything ran smoothly with the Bowtie BAMs using the various parameter combinations (including extend=0 with the bwa/paired options).
Let me know if you'd like to trouble shoot anything else.
Best, P.