MEDIPS


Paolo Kunderfranco ▴ 350

@paolo-kunderfranco-5158

Last seen 7.4 years ago

Dear Lukas Chavez,

Should repeat masking of the reference genome assembly be considered before aligning my samples, or would I lose information?

Thanks,
Paolo

2013/2/19 Lukas Chavez <lukas.chavez.mailings@googlemail.com>

> Dear Paolo,
>
> > why in the example referred to in the manual is there only one INPUT.SET for two conditions?
>
> Currently, MEDIPS allows for only one control, one treatment, and one combined Input data set. However, there is obviously a desperate need for considering replicates per group as well as individual Input data sets. Therefore (and because of many other issues), I have extensively revised the MEDIPS package, which will allow for processing replicates per condition as well as two groups of Input data. I intend to update the MEDIPS package as soon as possible, especially in advance of the next Bioconductor release. Nevertheless, it is not clear to me how you designed your experiments and your analysis strategy. The MEDIPS update will be helpful, e.g. in case you are comparing two groups of IP-seq samples and you want to consider two corresponding groups of Input samples in order to identify genomic variants that influence the IP enrichments.
>
> > I followed MEDUSA protocol (...)
>
> I greatly appreciate that MEDIPS has been incorporated in other analysis pipelines. However, please excuse that I can only comment on issues and functionalities of the MEDIPS package.
>
> > (...) when I filter out for non-unique reads. Roughly 90 % are discarded (...)
>
> This issue may be related to amplification and oversequencing problems, and there are different opinions about unique reads. However, the fraction of non-unique reads in your sequencing data is an issue that goes beyond what I can discuss here. Currently, MEDIPS allows for considering all reads or for replacing all non-unique reads (or maybe better: reads that map to the same genomic position) by one representative. However, you can pre-filter your input files by any estimate of global or local thresholds for non-unique reads and continue using MEDIPS by considering all given mapping results.
>
> > Is it possible that such a low number of reads is sufficient to generate a saturated and reproducible methylation profile?
>
> This depends on the methylation status of your reference genome. In case you are studying the methylation status of a small and only barely methylated genome, your results might be reasonable.
>
> All the best,
> Lukas
>
> Dear All,
>
> I am about to start analyzing some MeDIP-seq data with the MEDIPS Bioconductor package.
>
> I have read through the whole MEDIPS manual.
>
> I have to compare the methylation profiles of two cell lines, and I have the Input for both of them. Why in the example referred to in the manual is there only one INPUT.SET for two conditions?
>
> CONTROL.SET, TREAT.SET, and INPUT.SET
>
> Any suggestions?
>
> Thanks,
> Paolo
>
> On Fri, Feb 15, 2013 at 5:33 AM, Paolo Kunderfranco <paolo.kunderfranco@gmail.com> wrote:
>
>> Dear Lukas Chavez,
>>
>> I followed the MEDUSA protocol to filter out not properly paired, low-quality-mapping, and non-unique sequences from my alignment files, in order to use MEDIPS for further analysis of DMRs.
>>
>> For example, one mC sample started with 100 million reads. 80 % mapped, and 70 % of them mapped properly with high quality (mapQ>40). The problem arises when I filter out non-unique reads. Roughly 90 % are discarded, leading to a final number of 2-4 million reads. All my mC samples behave in the same way.
>>
>> Maybe the DNA starting material was not properly quantified (2-3 ng instead of 5 ng were used for the generation of the libraries). We didn't observe the same problem for the Input DNA (correctly quantified) or for 2 samples out of 4 for 5-hydroxy-mC.
>>
>> Could the high number of non-unique reads be due to a technical problem or a biological problem? Have you ever experienced a similar problem? How do you think I should proceed with the analysis? Is it absolutely necessary to remove non-unique reads for MEDIPS analysis?
>>
>> This is the first time I have dealt with this kind of analysis, and I would like to understand which approach is best to follow.
>>
>> I tried to run MEDIPS.saturationAnalysis with the following sample, and the correlation looks fine:
>>
>> $numberReads
>> [1] 1890528
>>
>> $maxEstCor
>> [1] 1.890528e+06 9.997250e-01
>>
>> $maxTruCor
>> [1] 9.452640e+05 9.994605e-01
>>
>> Is it possible that such a low number of reads is sufficient to generate a saturated and reproducible methylation profile?
>>
>> Thank you very much for your time,
>> Paolo
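For readers unfamiliar with what a saturation check measures, here is a minimal Python sketch of the general split-half idea: divide the reads into two random halves, count reads per genomic window in each half, and track how the Pearson correlation between the two window profiles changes as more reads are added. In the output quoted above, `$maxTruCor` is reported at half of `$numberReads`, which would be consistent with such a design, but this sketch is not the MEDIPS implementation. The function names, window size, and simulated read positions are all assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat profile
    return cov / sqrt(var_x * var_y)


def window_counts(positions, n_windows, window_size):
    """Count how many read start positions fall into each window."""
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    return counts


def saturation_curve(positions, n_windows, window_size, steps=25, seed=0):
    """Split reads into two random halves and correlate their window
    profiles for growing subsets of each half (hypothetical helper)."""
    rng = random.Random(seed)
    reads = list(positions)
    rng.shuffle(reads)
    half = len(reads) // 2
    set_a, set_b = reads[:half], reads[half:2 * half]
    curve = []
    for step in range(1, steps + 1):
        k = half * step // steps
        profile_a = window_counts(set_a[:k], n_windows, window_size)
        profile_b = window_counts(set_b[:k], n_windows, window_size)
        curve.append((k, pearson(profile_a, profile_b)))
    return curve


# Simulated data (an assumption, not real MeDIP-seq reads): 200 windows
# of 100 bp with strongly varying enrichment weights.
sim = random.Random(1)
N_WINDOWS, WINDOW_SIZE = 200, 100
weights = [1 + (i % 10) ** 2 for i in range(N_WINDOWS)]
chosen = sim.choices(range(N_WINDOWS), weights=weights, k=10_000)
positions = [w * WINDOW_SIZE + sim.randrange(WINDOW_SIZE) for w in chosen]

curve = saturation_curve(positions, N_WINDOWS, WINDOW_SIZE)
for n_reads, corr in curve[::6]:
    print(f"{n_reads:5d} reads per half: r = {corr:.3f}")
```

A curve that flattens close to 1 suggests that adding reads no longer changes the window profile much. As the quoted reply points out, whether a high correlation at a low read count is plausible still depends on how large and how methylated the genome is.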

Sequencing Alignment MEDIPS • 1.3k views

Updated 11.7 years ago by Lukas Chavez ▴ 570 • written 11.7 years ago by Paolo Kunderfranco ▴ 350


Lukas Chavez ▴ 570

@lukas-chavez-5781

Last seen 6.8 years ago

USA/La Jolla/UCSD

Dear Paolo,

although it is a very general question and not really specific to my MEDIPS package, here is my comment: by using the non-masked reference genome you might be able to cover fractions of repetitive DNA, depending on your read length and on paired-end versus single-end sequencing. In my opinion, there are no major advantages to using the masked reference genome as the reference for mapping.

Lukas

On Wed, Apr 3, 2013 at 1:15 AM, Paolo Kunderfranco <paolo.kunderfranco@gmail.com> wrote:

> Dear Lukas Chavez,
>
> Should repeat masking of the reference genome assembly be considered before aligning my samples, or would I lose information?
>
> Thanks,
> Paolo
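An earlier reply in this thread mentions two ways of handling reads that map to the same genomic position: replacing them by one representative, or pre-filtering the input with a custom threshold before handing all remaining reads to MEDIPS. The Python sketch below illustrates both with a single per-position cap. It is only a sketch under stated assumptions: the `Read` record, the function name `limit_duplicates`, and the choice of (chromosome, start, strand) as the position key are hypothetical and are not taken from MEDIPS or MEDUSA.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Minimal stand-in for one aligned read (hypothetical record)."""
    chrom: str
    start: int
    strand: str
    name: str


def limit_duplicates(reads: Iterable[Read], max_copies: int = 1) -> List[Read]:
    """Keep at most `max_copies` reads per (chrom, start, strand).

    max_copies=1 keeps one representative per mapping position;
    larger values act as a simple global threshold for duplicates.
    """
    if max_copies < 1:
        raise ValueError("max_copies must be at least 1")
    seen = Counter()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if seen[key] < max_copies:
            seen[key] += 1
            kept.append(read)  # first-come reads are kept, input order preserved
    return kept


reads = [
    Read("chr1", 100, "+", "r1"),
    Read("chr1", 100, "+", "r2"),
    Read("chr1", 100, "-", "r3"),
    Read("chr1", 250, "+", "r4"),
    Read("chr1", 100, "+", "r5"),
]

one_per_position = limit_duplicates(reads, max_copies=1)
two_per_position = limit_duplicates(reads, max_copies=2)
print([r.name for r in one_per_position])
print([r.name for r in two_per_position])
```

Comparing the read counts before and after such a filter at several cap values is one way to see how much of a library consists of stacked reads before deciding how strictly to filter.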

11.7 years ago • Lukas Chavez ▴ 570
