ChIP-seq quality control
Lucia Peixoto ▴ 330
@lucia-peixoto-4203
Hi,

I am new to ChIP-seq. My experiment's sequencing has finished, and the read alignment is currently running. The experiment was done for histone acetylation, and I have two types of controls: input DNA and unmodified histone. I have two conditions and 6 biological replicates of each condition.

I would like some advice on how to perform basic quality control on ChIP-seq data using Bioconductor, and also some ideas about which kinds of biases people usually observe and that I should keep my eyes open for.

Any advice will be greatly appreciated!

Thanks,
Lucia
@ivan-gregoretti-3975
Canada
Hello Lucia,

A proper response to your post would take a lecture rather than an email. I can't do that, but I can bullet the main points. I think they will help you if you are indeed a newcomer to ChIP-seq.

1) Expect 10 million reads per sample for a genome the size of human.
2) Stick to SAM/BAM formats so that you can use well-known, publicly available tools. Your best friend is called Picard.
3) Remove duplicates. Again, Picard is your best friend.
4) Create WIG files for all samples, treatments and controls, so that you can display them simultaneously on any genome browser.
5) Find peaks with a well-documented peak finder.
6) Compute enrichment for all treatments relative to their controls.

So, points 4 and 6 are your quality controls at this stage. Once you know what a good immunoprecipitation looks like compared to a bad one, you can start diving into the details. You can invent your own quality indicators. For instance, I compute the proportion of tags inside the 1000 strongest peaks. I do that for BOTH treatment and controls.

In my workflow, Bioconductor does not get involved until I reach point 6.

Happy ChIPing.

Ivan

On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto wrote:
> Hi,
> I am new to Chip-seq, my experiment's sequencing has finished, and the read
> alignment is currently running
> [...]
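Ivan's point 3, point 6 and his "proportion of tags inside the 1000 strongest peaks" indicator can be sketched roughly as below. This is only a minimal, stdlib-only illustration and not Picard, a peak finder, or any Bioconductor API. The tag layout `(chrom, pos, strand)`, the peak layout `(chrom, start, end, score)`, the half-open intervals and all function names are assumptions made for the example. Duplicate removal here simply keeps one tag per identical position and strand, which is much cruder than what Picard does.

```python
from bisect import bisect_right
from collections import defaultdict


def remove_duplicates(tags):
    """Keep the first tag seen for each (chrom, pos, strand); a crude simplification."""
    seen = set()
    unique = []
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)
    return unique


def top_peaks(peaks, n=1000):
    """Return the n highest-scoring peaks; each peak is (chrom, start, end, score)."""
    return sorted(peaks, key=lambda p: p[3], reverse=True)[:n]


def build_index(peaks):
    """Merge overlapping peaks per chromosome into sorted (starts, ends) lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end, _score in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def fraction_in_peaks(tags, index):
    """Proportion of tags whose position falls in a peak (intervals are [start, end))."""
    total = inside = 0
    for chrom, pos, _strand in tags:
        total += 1
        if chrom in index:
            starts, ends = index[chrom]
            i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
            if i >= 0 and pos < ends[i]:
                inside += 1
    return inside / total if total else 0.0


def enrichment(treatment_fraction, control_fraction):
    """Ratio of the two proportions; infinite when the control has no tags in peaks."""
    if control_fraction == 0:
        return float("inf")
    return treatment_fraction / control_fraction


# Toy data (hypothetical): three peaks, and the two strongest are used as "top peaks".
peaks = [("chr1", 100, 200, 50.0), ("chr1", 150, 300, 40.0), ("chr2", 500, 600, 10.0)]
treatment = [("chr1", 120, "+"), ("chr1", 120, "+"), ("chr1", 250, "-"),
             ("chr1", 400, "+"), ("chr2", 550, "+")]
control = [("chr1", 120, "+"), ("chr1", 900, "+"), ("chr2", 10, "-"), ("chr2", 550, "-")]

index = build_index(top_peaks(peaks, n=2))
treatment_fraction = fraction_in_peaks(remove_duplicates(treatment), index)
control_fraction = fraction_in_peaks(remove_duplicates(control), index)
print(treatment_fraction, control_fraction, enrichment(treatment_fraction, control_fraction))
```

Computing the same proportion for the controls, as Ivan does, gives a baseline: a good immunoprecipitation should put a clearly larger share of its tags inside the strongest peaks than the input does.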
On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
> 1) Expect 10 million reads per sample for a genome the size of human.

I'd run some basic QA on your lanes, via ShortRead::qa on the fastq files (or BAM files if fastq files are not available); use FastqSampler if memory is tight (but in general, if memory is tight, the solution will be to find a larger computer).

See http://bioconductor.org/help/workflows/high-throughput-sequencing/ for qa and perhaps other operations common to RNA-seq / ChIP-seq workflows.

> 2) Stick to SAM/BAM formats so that you can use well known, publicly
> available tools. Your best friend is called Picard.

People can and do use R / Bioconductor for Picard-like tasks.

> 3) Remove duplicates. Again, Picard is your best friend.
> 4) Create WIG files for all samples, treatments and controls so that
> you can display them simultaneously on any genome browser.

Here, for interactive use, I would rather use basic R plotting commands, avoiding the round trip and allowing programmatic interaction.

> 5) Find peaks with a well documented peak finder.

Probably a good suggestion for a one-off or common ChIP; the chipseq vignette

http://bioconductor.org/packages/release/bioc/html/chipseq.html

provides inspiration for more flexible analysis; packages under the ChIPseq biocViews term (Software -> AssayTechnologies -> HighThroughputSequencing -> ChIPSeq) might offer a solution tailored to your ChIP.

> 6) Compute enrichment for all treatments relative to their controls.

Again, the chipseq vignette is an alternative source.

> So, points 4 and 6 are your quality controls at this stage. Once you
> know what a good immunoprecipitation looks like compared to a bad one,
> you can start diving into the details. You can invent your own quality
> indicators.

Especially for getting a sense of good versus bad results, the interactivity of R / Bioconductor seems essential.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
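To make the lane-level QA idea concrete (basic quality summaries on the fastq files, sampling reads when memory is tight), here is a rough stdlib-only sketch. It does not reproduce ShortRead::qa or FastqSampler; it only illustrates the kind of computation involved. It assumes plain 4-line FASTQ records and Phred+33 quality encoding, and every function name in it is made up for the example.

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence, quality) from a stream of 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def reservoir_sample(records, k, seed=0):
    """One-pass uniform sample of at most k records, holding only k in memory."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def mean_quality_by_cycle(records, offset=33):
    """Mean Phred score at each read position (cycle), assuming Phred+33 by default."""
    sums, counts = [], []
    for _name, _sequence, quality in records:
        for cycle, char in enumerate(quality):
            if cycle == len(sums):
                sums.append(0)
                counts.append(0)
            sums[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy input (hypothetical): two 4-base reads, one high quality and one lower quality.
FASTQ_TEXT = "@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\n5555\n"
sampled = reservoir_sample(read_fastq(io.StringIO(FASTQ_TEXT)), k=1000)
print(mean_quality_by_cycle(sampled))
```

A per-cycle quality profile that drops sharply toward the end of the read, or that differs a lot between lanes, is the sort of thing worth spotting before any peak calling.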
Thanks very much for the suggestions. I will likely have more questions as I start the analysis.

Lucia

On Tue, Oct 4, 2011 at 2:57 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
> On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
> [...]