Hi Lukas,
I’ve been dealing with some paired-end ChIP-seq data from an inducible transcription factor experiment.
For each of the samples (3 each of treatment and control) and inputs (1 each for treatment and control) I’ve run:
MEDIPS.createSet(file = Sample, BSgenome = "BSgenome.Hsapiens.UCSC.hg19", uniq = 1, extend = 120, shift = 0, window_size = 1000, paired = TRUE, bwa = TRUE, chr.select = c("chr1", …, "chrY"))
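(For completeness, the elided chr.select list is just the canonical hg19 chromosomes; I build the vector roughly like this:)

chrs <- paste0("chr", c(1:22, "X", "Y"))   # "chr1" .. "chr22", "chrX", "chrY"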
When the samples load, I see that mates which are far apart are being joined, e.g.:
Sample 1
Mean insertion size: 11078.77 nt
SD of the insertion size: 1016181 nt
Max insertion size: 235650514 nt
Min insertion size: 0 nt
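For reference, the insert (TLEN) distribution can also be checked straight from the BAM with Rsamtools; a quick sketch (file name hypothetical):

library(Rsamtools)
# pull the TLEN (isize) field for the first mates of paired reads
p <- ScanBamParam(what = "isize", flag = scanBamFlag(isPaired = TRUE, isFirstMateRead = TRUE))
isz <- abs(scanBam("Sample1.bam", param = p)[[1]]$isize)
isz <- isz[!is.na(isz) & isz > 0]
summary(isz)   # a handful of enormous TLENs inflate the mean and SD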
I go ahead and run the analysis:
MEDIPS.meth(MSet1 = Treatment, MSet2 = Control, ISet1 = Treatment.Input, ISet2 = Control.Input, p.adj = "bonferroni", diff.method = "edgeR", minRowSum = 100, MeDIP = FALSE, quantile = TRUE)
I find that almost all the windows are being tested, even when minRowSum is increased to some crazy level:
minRowSum = 100:
Total windows: 3,095,689
Windows tested: 3,090,305
P < 0.05: 247,278

minRowSum = 1,000:
Total windows: 3,095,689
Windows tested: 1,648,789
P < 0.05: 153,028
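(For the record, those tallies come straight from the MEDIPS.meth result table; with the result in mr, and assuming the edgeR column names my MEDIPS version uses, roughly:)

pvals <- mr[, "edgeR.p.value"]    # column name may differ by MEDIPS version
nrow(mr)                          # total windows
sum(!is.na(pvals))                # windows tested
sum(pvals < 0.05, na.rm = TRUE)   # windows with P < 0.05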
I set extend=0 and removed duplicates, but still get this:
minRowSum = 100:
Total windows: 3,095,689
Windows tested: 3,016,374
P < 0.05: 243,846
I’ve visualized the BAM files with deepTools and found that there are probably <5,000 discrete regions of enrichment in our treatment samples, consistent with our transcription factor being induced.
I think the mating of pairs over such large distances is causing this issue, but any ideas on how to resolve it?
Kind Regards
Peter
Hi Lukas,
I thought the same about there being some ‘proper pairs’ with large inserts, so I ran Picard’s CollectInsertSizeMetrics and sure enough got something like this:
MEDIAN_INSERT_SIZE 323
MEDIAN_ABSOLUTE_DEVIATION 75
MIN_INSERT_SIZE 30
MAX_INSERT_SIZE 243946804
MEAN_INSERT_SIZE 332.5694
STANDARD_DEVIATION 115.047712
READ_PAIRS 8731854
I filtered my BAMs to ‘proper pair, insert <= 500’, but they still load into MEDIPS with huge inserts present:
Mean insertion size: 4082.162 nt
SD of the insertion size: 581612.2 nt
Max insertion size: 213644984 nt
It would seem that the filter may not be doing the job.
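For reference, the kind of filter I applied looks roughly like this in R with Rsamtools (file names hypothetical):

library(Rsamtools)
# keep only properly paired alignments whose |TLEN| is at most 500
rules <- FilterRules(list(insert = function(x) !is.na(x$isize) & abs(x$isize) <= 500))
filterBam("Sample1.bam", "Sample1.filt.bam",
          param = ScanBamParam(what = "isize", flag = scanBamFlag(isProperPair = TRUE)),
          filter = rules)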
The thing is, I can run MACS2 on the BWA-aligned BAMs and it builds a model no problem, with the insert size estimated around 300 bp?!
I decided to realign with Bowtie2 (max insert 500 bp) and run the analysis again (uniq=1, paired=TRUE, extend=120; the createSet call is sketched after the table below), and everything worked fine:
minRowSum   Total windows   Windows tested   P < 0.05
50          3,095,689       92,626           16,601
100         3,095,689       27,525           8,961
300         3,095,689       1,891            695
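For reference, the createSet call for the Bowtie2 BAMs was along these lines (file name hypothetical; no bwa=TRUE this time, since these aren’t BWA alignments):

library(MEDIPS)
Treatment <- MEDIPS.createSet(file = "Sample1.bt2.bam",
                              BSgenome = "BSgenome.Hsapiens.UCSC.hg19",
                              uniq = 1, extend = 120, shift = 0,
                              window_size = 1000, paired = TRUE,
                              chr.select = paste0("chr", c(1:22, "X", "Y")))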
Not sure what the issue was with the BWA-aligned files. I can send them to you if you’d like?
Thanks again!
Peter
Hi Lukas
I edited my comment above apparently just before your response (I always seem to do this!), but with extend=120 I get success with the Bowtie BAMs.
I'll run them again with extend=0 and the bwa/paired options and let you know how it goes.
Best,
Peter
Hi Lukas,
Everything ran smoothly with the Bowtie BAMs using the various parameter combinations (including extend=0 with the bwa/paired options).
Let me know if you'd like to trouble shoot anything else.
Best, P.