DESeq diagnostics


Michal Lulu ▴ 100

@michal-lulu-5533

Last seen 10.6 years ago

Hi,

I'll try to ask once again :)

My experiment is RNA-seq of cells depleted of RNA decay factors, so one should expect to see more upregulation. The libraries are ribodepleted, paired-end and stranded. After mapping with TopHat I use the HTSeq/DESeq combo to discover DE genes (among the genes in the TopHat genes.gtf, rRNA not included).

I have a problem with the MA plots, which are skewed (example attached): there is a clear slope suggesting more upregulation among genes of lower expression. What should I think about this?

The diagnostic scatter plots of log ratios also look weird (the second one matches the MA plot); the PCA is OK, and so are the heat maps.

I also tried to compare DESeq normalization with normalization to the spike-ins present in the libraries, but the size factors assigned by DESeq seem much more accurate, although it's unclear why.

https://www.dropbox.com/s/0oykefjy1fvtq1i/1.pdf (MA plot)
https://www.dropbox.com/s/r95oydftyeoz9mu/scatter.pdf (scatter plots; the second one is for the MA plot)

Cheers,
Michael

RNASeq Normalization DESeq • 1.6k views

updated 12.5 years ago by Wolfgang Huber ★ 13k • written 12.5 years ago by Michal Lulu ▴ 100


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 6 weeks ago

EMBL European Molecular Biology Laborat…

Dear Michael,

the plots look fine, as far as it is possible to tell. What exactly are you worried about?

Normalising to the majority, as DESeq does, works well for experiments where the expression changes are sparse or symmetric. If these conditions are not met, spike-ins offer an alternative, but they can be fiddly experimentally, and if there are only a few of them, accuracy can be problematic. These problems are well known; there is no simple cure, you just have to deal with them.

Best wishes
Wolfgang

On Oct 7, 2012, at 1:28 AM, Michael wrote:
> [...]
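To make "normalising to the majority" concrete, here is a minimal sketch of a median-of-ratios size factor calculation in plain Python. It is only an illustration of the idea, not DESeq's actual code; the function name `size_factors` and both toy count tables are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes with a non-zero count in every sample, so logs are defined.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (a pseudo-reference).
    log_ref = [sum(row) / n_samples for row in log_rows]
    # Size factor of sample j: median over genes of count / reference.
    return [
        math.exp(median(row[j] - ref for row, ref in zip(log_rows, log_ref)))
        for j in range(n_samples)
    ]


# Hypothetical data: sample 2 is sequenced twice as deep, one gene is 4x up.
sparse = [[100, 200], [50, 100], [10, 20], [30, 60], [80, 640]]
# Hypothetical data: equal depth, but three of five genes are 4x up.
majority_up = [[100, 100], [50, 50], [10, 40], [30, 120], [80, 320]]

for name, table in (("sparse", sparse), ("majority_up", majority_up)):
    sf = size_factors(table)
    print(name, round(sf[1] / sf[0], 3))
```

In the first table the ratio of the two size factors recovers the twofold depth difference, because the single changed gene does not move the median. In the second table the ratio comes out as 4 although both samples have the same depth, so after normalisation the upregulated majority would look unchanged and the unchanged genes would look downregulated. That is the situation in which spike-ins become the alternative.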

written 12.5 years ago by Wolfgang Huber ★ 13k


Dear Wolfgang,

Thanks for the positive reply. I am sceptical about the trend in the MA plot, namely the slope to the left. Normalization to many spike-ins is still tricky, and in my experience it was really hard to arrive at a single size factor.

BTW, is it better to estimate dispersion in pairs (treatment 1 / control; treatment 2 / control, ...) or for the whole dataset, in the case of 1 control and treatments 1-6 (each sample in 2 biological replicates)?

Best,
Michael

On Sun, Oct 7, 2012 at 3:31 PM, Wolfgang Huber wrote:
> [...]

written 12.5 years ago by Michal Lulu ▴ 100


Hi Michael

On 07/10/12 18:17, Michael wrote:
> BTW, is it better to estimate dispersion in pairs (treatment1 / control;
> treatment2 / control, ...) or rather for a whole dataset in case of 1
> control / treatment 1-6 (each sample in 2 biol rep).

As you have only two replicates per treatment, you are most likely better off using your whole data set to get reasonably precise dispersion estimates.

However, if there is one treatment where the replicates show stronger differences than in the other treatments, this will increase the dispersion and hence reduce power for all comparisons. So better double-check that this is not the case, e.g. with the plotPCA function.

Simon
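The effect of one noisy treatment on a pooled estimate can be seen in a small sketch for a single gene. It assumes the counts are already normalised and uses a simple method-of-moments estimate based on the negative binomial relation var = mu + alpha * mu^2. This is not DESeq's estimator, and the function names and counts are hypothetical.

```python
from statistics import mean, variance


def mom_dispersion(replicates):
    """Method-of-moments dispersion for one gene in one condition.

    Solves var = mu + alpha * mu**2 for alpha, truncated at zero.
    """
    mu = mean(replicates)
    var = variance(replicates)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


def pooled_dispersion(conditions):
    """Average the per-condition estimates into one pooled value."""
    return mean(mom_dispersion(reps) for reps in conditions.values())


# Hypothetical normalised counts for one gene, two replicates per condition.
well_behaved = {"control": [100, 140], "treatment1": [90, 130]}
with_noisy = dict(well_behaved, treatment2=[100, 300])

print("pooled, consistent replicates:", round(pooled_dispersion(well_behaved), 3))
print("pooled, one noisy treatment:  ", round(pooled_dispersion(with_noisy), 3))
```

Adding the treatment whose replicates disagree raises the pooled value that every comparison would then share, including comparisons that do not involve that treatment. With two replicates a per-condition estimate is very noisy on its own, which is the argument for pooling in the first place.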

written 12.5 years ago by Simon Anders ★ 3.8k


Hi Simon,

Thanks for the reply. I have already double-checked, and pooling all 16 samples (2 replicates x 8 conditions) increases the dispersion, judging from the PCA plot. I decided to divide the set in two: 3 conditions + control, and the other 5 (more distal) + control. When pooling everything, the first three are indistinguishable from the control in the PCA, and the number of DE genes increases for all the samples, especially for the 5 distal ones, for which I got many infinite fold changes...

Best,
Michael

written 12.5 years ago by Michal Lulu ▴ 100
