Question

questions related to DiffBind package

Rory Stark ★ 5.2k

@rory-stark-5741

Cambridge, UK

Hi Marta-

Well spotted! I have been able to reproduce the behavior you describe, and found the cause.

edgeR tends to report normalized counts-per-million (cpm), which is good for comparing counts between experiments but uses values that are potentially on a much different scale than those of the original experiment. Instead of scaling every experiment to one million reads, DiffBind scales using a factor based on the library sizes of the samples in the experiment. There was a discrepancy between how this scaling factor is computed in dba.count for the global binding matrix and how it is computed for a specific contrast in dba.report. I will soon check in a fix to make both use the same scaling factor (specifically, the mean library size, as has been used for the global scores). In the next release, I'll consider adding an option to report the cpm values as a global read score and/or for the count values in dba.report.

Note that it is still possible to get different normalized read values for the same samples globally and within a specific contrast report if the contrast does not include all the samples in the DBA object, as the data are re-normalized separately for each contrast using only the applicable samples.

And yes, if duplicate reads are removed in dba.count, they will not be used in any subsequent analysis based on those counts.

Cheers-
Rory

----------------------------------------------------------------------
Dr. Rory Stark
Principal Bioinformatics Analyst
Cancer Research UK Cambridge Institute
University of Cambridge
Robinson Way
Cambridge CB2 0RE
United Kingdom
+44 (0)1223 769 658
rory.stark@cruk.cam.ac.uk
----------------------------------------------------------------------

From: Marta Byrska-Bishop <mbb5158@psu.edu>
Date: Tue, 11 Feb 2014 13:24:37 -0500
To: Rory Stark <rory.stark@cruk.cam.ac.uk>
Subject: questions related to DiffBind package

Hello,

I'm a graduate student in Ross Hardison's lab at Penn State University. I've been using your DiffBind package in R for differential binding analysis and I have a couple of questions for you.

I'm running this analysis to compare the genome-wide binding patterns of a wild type and a mutated form of a certain transcription factor. We have 2 replicates available for each form of the TF. I ran the differential binding analysis using only the consensus peaks.

My question is why the read counts for individual samples that I get from dba.report (bCounts = TRUE) are not identical to the ones I get from saving the whole binding matrix after counting reads with dba.count. I understand that, irrespective of the normalization method chosen for dba.count, dba.analyze uses raw read counts and performs normalization independently of dba.count. If, for differential binding analysis using dba.analyze, I use the following options: method = DBA_EDGER, bFullLibrarySize = FALSE, and bSubControl = TRUE, are the read counts going to be normalized the same way as in dba.count when using DBA_SCORE_TMM_MINUS_EFFECTIVE? I compared the read counts I get from dba.count and dba.report using the above settings and they are very close, but not identical. Shouldn't they be exactly the same?

Also, just to confirm: if I choose the option bRemoveDuplicates = TRUE in dba.count, are the duplicate reads also going to be filtered out in the differential binding analysis using dba.analyze?

I'd greatly appreciate any information regarding my questions. Thank you very much for your time!

Marta

Marta Byrska-Bishop
PhD candidate
Hardison Lab || The Pennsylvania State University || Wartik 303 || University Park PA 16802 || lab: 814-863-3150
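To make the scaling difference described in the reply concrete, here is a minimal sketch of the arithmetic only. It is written in Python for illustration and is not DiffBind or edgeR code: the function names, library sizes, and counts are all hypothetical, and TMM normalization factors and control-read subtraction are deliberately left out. It assumes "scaling to the mean library size" means multiplying each count by (mean library size / that sample's library size), which is a simplification of what the reply describes.

```python
def cpm(counts, lib_sizes):
    """Scale each sample's count to counts per one million reads."""
    return [c * 1_000_000 / n for c, n in zip(counts, lib_sizes)]


def scale_to_mean_library(counts, lib_sizes):
    """Scale each sample's count to the mean library size of the given samples.

    Assumption: this stands in for the library-size-based factor mentioned
    in the reply; the real computation also involves TMM factors.
    """
    mean_lib = sum(lib_sizes) / len(lib_sizes)
    return [c * mean_lib / n for c, n in zip(counts, lib_sizes)]


# Hypothetical data: read counts in one peak for four samples.
lib_sizes = [2_000_000, 4_000_000, 6_000_000, 8_000_000]
counts = [120, 200, 330, 400]

# "Global" values: every sample in the object contributes to the mean.
global_scaled = scale_to_mean_library(counts, lib_sizes)

# "Contrast" values: only the first two samples are in the contrast,
# so the mean library size (and hence the scaled counts) changes.
contrast_scaled = scale_to_mean_library(counts[:2], lib_sizes[:2])

# CPM values do not depend on which other samples are present.
cpm_values = cpm(counts, lib_sizes)

print("global:  ", global_scaled[:2])
print("contrast:", contrast_scaled)
print("cpm:     ", cpm_values[:2])
```

In this toy setup the first two samples get different scaled values globally and within the two-sample contrast, because the mean library size is recomputed from the samples involved, while their CPM values stay the same either way. This mirrors the point in the reply that a contrast covering only a subset of the samples can still yield different normalized counts than the global binding matrix.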

Transcription Normalization DiffBind • 1.6k views

11.2 years ago Rory Stark ★ 5.2k