Csaw : filter windows based on matched negative control
2
0
Entering edit mode
@pierre-francois-roux-7997
Last seen 6.6 years ago
France

 

Dear BioC users,

I am currently using csaw to analyse my ChIP-seq data. For a given histone modification, I have 3 conditions, each with two biological replicates, and a matched input control library. Before going through the differential binding analysis, I would like to require a 3-fold or greater increase in abundance over the matched control to retain a window.  According to the vignette, it is pretty clear how to process when you deal with a common input for all ChIPped libraries : 

Let's say that the vector bam.files contains the names of 4 .bam files related to 4 ChIPped libraries, and that the name of the .bam related to the input control for all of these libraries is input.bam.

in.demo <- windowCounts(c(bam.files, "input.bam"), ext=frag.len, param=param)
chip <- in.demo[,1:4]
control <- in.demo[,5]

in.binned <- windowCounts(c(bam.files, "input.bam"), bin=TRUE, width=10000, param=param)
chip.binned <- in.binned[,1:4]
control.binned <- in.binned[,5]

filter.stat <- filterWindows(chip, control, type="control", prior.count=5)

keep <- filter.stat$filter > log2(3)

My problem is that there is no specific advice to deal with as many inputs as ChIPped libraries.

Is someone has had a similar question and could propose a solution to this issue ?

So far I have tried the following :

in.demo <- windowCounts(c(chipped.files, input.files), ext=frag.len, param=param)
chip <- in.demo[,1:4]
control <- in.demo[,5:8]

in.binned <- windowCounts(c(chipped.files, input.files), bin=TRUE, width=10000, param=param)
chip.binned <- in.binned[,1:4]
control.binned <- in.binned[,5:8]

filter.stat <- filterWindows(chip, control, type="control", prior.count=5)

Which returns :

Error in getWidths(background) : 
  need to specify read lengths in 'data$rlen'​

 

Thanks a lot for your advice !

Cheers,

Pierre-François

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] csaw_1.2.1           Rsamtools_1.20.5     SeqGL_1.1.2          WGCNA_1.47          
 [5] RSQLite_1.0.0        DBI_0.3.1            fastcluster_1.1.16   dynamicTreeCut_1.62 
 [9] spams_2.5            lattice_0.20-33      ChIPKernels_1.1      Matrix_1.2-2        
[13] kernlab_0.9-22       sfsmisc_1.0-28       gtools_3.5.0         BSgenome_1.36.3     
[17] rtracklayer_1.28.10  GenomicRanges_1.20.8 GenomeInfoDb_1.4.3   Biostrings_2.36.4   
[21] XVector_0.8.0        IRanges_2.2.9        S4Vectors_0.6.6      BiocGenerics_0.14.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1             GO.db_3.1.2             digest_0.6.8            foreach_1.4.3          
 [5] plyr_1.8.3              futile.options_1.0.0    acepack_1.3-3.3         ggplot2_1.0.1          
 [9] zlibbioc_1.14.0         GenomicFeatures_1.20.6  rpart_4.1-10            preprocessCore_1.30.0  
[13] proto_0.3-10            splines_3.2.1           BiocParallel_1.2.22     stringr_1.0.0          
[17] foreign_0.8-66          biomaRt_2.24.1          RCurl_1.95-4.7          munsell_0.4.2          
[21] nnet_7.3-11             gridExtra_2.0.0         edgeR_3.10.5            Hmisc_3.17-0           
[25] codetools_0.2-14        matrixStats_0.14.2      XML_3.98-1.3            reshape_0.8.5          
[29] GenomicAlignments_1.4.2 MASS_7.3-44             bitops_1.0-6            grid_3.2.1             
[33] gtable_0.1.2            magrittr_1.5            scales_0.3.0            stringi_0.5-5          
[37] impute_1.42.0           reshape2_1.4.1          doParallel_1.0.10       limma_3.24.15          
[41] latticeExtra_0.6-26     futile.logger_1.4.1     Formula_1.2-1           lambda.r_1.1.7         
[45] RColorBrewer_1.1-2      iterators_1.0.8         tools_3.2.1             Biobase_2.28.0         
[49] survival_2.38-3         AnnotationDbi_1.30.1    colorspace_1.2-6        cluster_2.0.3      

 

 

 

 

 

csaw differential binding analysis • 1.6k views
ADD COMMENT
4
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 9 hours ago
The city by the bay

csaw does not provide support for condition-specific filtering of windows. This is largely to preserve statistical rigour; you could imagine that condition-specific filtering would induce spurious differences if a window were retained in one condition while being filtered out in another condition. As such, you cannot (at least, not with the supplied functions) ask csaw to retain windows where the signal in the ChIP samples for one condition exceeds the signal in the matched control for that condition.

In general, any independent filtering must be done in a manner that is blind to the design and to the DB status of each window. The suggested approach in the user's guide is fine because it looks at the average of all ChIP samples, and compares that to the common input; the former does not contain any information about DB, while the latter is not used in the final DB analysis itself. If you want to do something similar, you should compare the average of all ChIP samples to the average of all input samples.

In fact, this is what you have tried to do in the second call to filterWindows, but the error message suggests that control$ext is NA. This shouldn't be possible if you've subsetted the RangedSummarizedExperiment object correctly. Do you really have 8 libraries? Your description of the experiment suggests you have 9 samples, 3 of which are inputs.

Edit: While we're on this topic, another popular method of handling condition-specific controls is count subtraction from the ChIP samples. This is not supported by csaw, for various reasons: see A: DESeq2 for ChIP-seq differential peaks and comments for a discussion.

ADD COMMENT
0
Entering edit mode
@pierre-francois-roux-7997
Last seen 6.6 years ago
France

Hi Aaron,

Thanks a lot for your answer. 

In actual fact, I have 12 libraries, 6 that are samples and 6 that are inputs (2 samples and 2 inputs per condition).

It wasn't that clear to me that using filterWindows(chip, control)will compare the average of all ChIP to the average of all input sample. Thanks for the clarification. It is actually what I was looking for. I will give a try using this setting and go through the whole analysis.

Regarding the error I get, it was indeed a problem in the subsetting of the RangedSummarizedExperiment.

Thanks a lot !

Pierre-François

ADD COMMENT

Login before adding your answer.

Traffic: 477 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6