Hi Ian and everybody,
The problem of PCR duplicates cannot be elegantly solved from the
output of the sequencer. It has to be solved at the bench. That is why
I believe that, as mentioned by Rory, highly redundant ChIPs must be
done again from scratch.
The undesired side effect of removing veracious duplicates does indeed
affect the measured binding intensity, and therefore the differential
signal, especially at strong peaks.
Whether you remove them or not, you are going to err; it is up to you to
decide in which direction.
Do you want a solution? Do paired end, 25 cycles on each side. Problem
solved.
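
To illustrate why paired-end reads help here, below is a minimal sketch, not any particular tool's implementation. It assumes toy fragments given as (chrom, start, end) on one strand, and a hypothetical helper `dedup`. With single-end data only the start of the first read is known, so fragments sharing a start collapse into one; with paired-end data both ends form the duplicate key, so fragments that merely share a start are kept apart. Two independent fragments with identical ends would still be collapsed, so this reduces the ambiguity rather than removing it entirely.

```python
def dedup(fragments, paired):
    """Keep the first fragment seen for each duplicate key.

    fragments: list of (chrom, start, end) tuples (toy data, one strand).
    paired=False: only the first-read start is known -> key = (chrom, start)
    paired=True:  both fragment ends are known       -> key = (chrom, start, end)
    """
    seen = set()
    kept = []
    for chrom, start, end in fragments:
        key = (chrom, start, end) if paired else (chrom, start)
        if key not in seen:
            seen.add(key)
            kept.append((chrom, start, end))
    return kept


# Hypothetical example: four independent fragments plus one PCR copy.
fragments = [
    ("chr1", 100, 250),
    ("chr1", 100, 250),  # PCR duplicate of the fragment above
    ("chr1", 100, 310),  # same start, different end: a real fragment
    ("chr1", 100, 280),  # same start, different end: a real fragment
    ("chr1", 120, 300),
]

kept_single = dedup(fragments, paired=False)
kept_paired = dedup(fragments, paired=True)
print(len(kept_single), len(kept_paired))
```

In this toy case the single-end key discards two real fragments along with the PCR copy, while the paired-end key discards only the PCR copy.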
I hope this helps.
Ivan
On Thu, Oct 20, 2011 at 11:26 AM, Ian Donaldson
<ian.donaldson at manchester.ac.uk> wrote:
> Thank you for your response Ivan!
>
> I completely agree that removing duplicates is a necessary step for
> peak calling.
>
> It seems as though keeping duplicates is a "double-edged sword", where
> it is not easy to separate PCR-artifact reads from real ones. I
> think what, to me, makes using all the reads seem appealing/necessary
> is that in differential binding you want to see the actual differences
> in binding intensity (reads). If non-redundant reads are used, then
> isn't the difference in binding intensity lost or apparently reduced
> (maybe this does not matter in the DESeq analysis)?
>
> Thanks again!
>
> Ian
> ________________________________________
> From: Ivan Gregoretti [ivangreg at gmail.com]
> Sent: 20 October 2011 14:35
> To: Ian Donaldson
> Cc: Simon Anders; bioconductor at r-project.org
> Subject: Re: [BioC] Using DESeq with ChIP-seq data - all or non-redundant reads?
>
> Hello Ian,
>
> I think that, in general, removing duplicates is good practice in
> ChIP-seq.
>
> Of course, when you have very high coverage, veracious but identically
> positioned tags will be mistaken for PCR duplicates.
>
> How is that affecting you?
> You run the risk of underestimating the signal strength of stronger
> peaks rather than weak ones.
>
> Removal of duplicates affects stronger peaks more. The weaker the peak,
> the less likely it is to contain veracious duplicates. So,
> removing duplicates, even veracious ones, will not make your weakest
> signals disappear, which is critical.
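
The argument above can be made concrete with a small calculation. This is a rough sketch under stated assumptions, not a model of real ChIP-seq data: it assumes there are no PCR duplicates at all, that reads in a peak start uniformly at random on a fixed number of possible positions (200 here, a made-up number), and that deduplication keeps one read per start position. The expected number of reads surviving deduplication is then w * (1 - (1 - 1/w)^n) for n reads on w positions. The function name `expected_unique` is hypothetical.

```python
def expected_unique(n_reads, n_positions):
    """Expected number of distinct start positions hit when n_reads reads
    land uniformly at random on n_positions possible start positions."""
    return n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)


# Weak, medium and strong peaks over the same assumed 200 start positions.
for n in (10, 100, 1000):
    kept = expected_unique(n, 200)
    # Columns: true reads, expected reads kept, fraction kept.
    print(n, round(kept, 1), round(kept / n, 2))
```

Under these assumptions a weak peak keeps nearly all of its reads, while a strong peak saturates near the number of available positions, so its intensity is underestimated.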
>
> If instead of peak intensity you only care about peak location, then
> duplicate removal should be used without reservation.
>
> As always, opinions that disagree are welcome.
>
> Ivan
>
> Ivan Gregoretti, PhD
> National Institute of Diabetes and Digestive and Kidney Diseases
> National Institutes of Health
> 5 Memorial Dr, Building 5, Room 205.
> Bethesda, MD 20892. USA.
> Phone: 1-301-496-1016 and 1-301-496-1592
> Fax: 1-301-496-9878
>
>
>
> On Tue, Oct 18, 2011 at 10:09 AM, Ian Donaldson
> <ian.donaldson at manchester.ac.uk> wrote:
>> I have been using DESeq to look at differential binding in ChIP-seq
>> for a while now. But recently we have been discussing locally whether
>> the ChIP-seq reads used in DESeq should be the full or the non-redundant
>> set. There is a worry that the full set of reads may contain
>> spuriously amplified reads, but then using a non-redundant set removes
>> information, i.e. at particularly enriched binding regions.
>>
>> I would be very interested to get your views on this.
>>
>> Thanks!
>>
>> Ian
>> ________________________________________
>> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Simon Anders [anders at embl.de]
>> Sent: 20 July 2011 14:20
>> To: bioconductor at r-project.org
>> Subject: Re: [BioC] Using DESeq with ChIP-seq data
>>
>> Hi Ian
>>
>> On 07/20/2011 02:18 PM, Simon Anders wrote:
>>> What I meant is: Pool all four samples, give them to the peak finder in
>>> one big chunk and so get a list of binding regions. Then, count for each
>>> sample how many reads fall into each of the binding regions, obtaining a
>>> table with four columns for your four samples and one row for each
>>> binding region found in the pool. Give this table to DESeq. We've tried
>>> this approach once with some Pol-II ChIP-Seq data and it worked quite well.
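
The counting step described above can be sketched as follows. This is only an illustration under stated assumptions: regions are half-open intervals (chrom, start, end) taken from a peak finder run on the pooled samples, each read is reduced to a single (chrom, position) pair, and the sample names and coordinates are invented. The helper `count_table` is hypothetical; it builds the table of one row per region and one column per sample that would then be handed to DESeq in R.

```python
def count_table(regions, samples):
    """Count reads per binding region for each sample.

    regions: list of (chrom, start, end), half-open [start, end).
    samples: dict mapping sample name -> list of (chrom, position) reads.
    Returns (column_names, rows), one row of counts per region.
    """
    names = sorted(samples)
    rows = []
    for chrom, start, end in regions:
        row = [
            sum(1 for c, pos in samples[name] if c == chrom and start <= pos < end)
            for name in names
        ]
        rows.append(row)
    return names, rows


# Toy input: two regions found in the pool, four samples (two per condition).
regions = [("chr1", 100, 200), ("chr1", 500, 650)]
samples = {
    "ctrl_1": [("chr1", 110), ("chr1", 150), ("chr1", 510)],
    "ctrl_2": [("chr1", 120), ("chr1", 600)],
    "treat_1": [("chr1", 130), ("chr1", 520), ("chr1", 530), ("chr1", 640)],
    "treat_2": [("chr1", 199), ("chr1", 200), ("chr1", 505), ("chr1", 649)],
}

names, rows = count_table(regions, samples)
print("region", *names, sep="\t")
for (chrom, start, end), row in zip(regions, rows):
    print(f"{chrom}:{start}-{end}", *row, sep="\t")
```

The linear scan over all reads is fine for a toy example; for real data an interval index or an existing overlap-counting tool would be the usual choice.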
>>
>> Forgot to mention: When we did this, we counted the reads from the
>> ChIPed sample. We used the input control samples only for the peak
>> finding, not in the counting. IIRC, we only had one common control lane
>> for both conditions, so that it would cancel out when comparing the
>> conditions.
>>
>> If you have separate controls, you may want to count for them as well
>> and use DESeq's GLM function to test for an interaction contrast.
>>
>>   S
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>