Question

assess quality of data

bilcodygm • 0

Dear all,

Please help me. I have RNA-seq data, which I normalized with TMM and then analyzed with the likelihood ratio test in edgeR.

When I look at the BCV plot, I see this:

The red dots are the significant genes in the dataset.

Is this OK? Are the data points not too widely spread?

Thank you for your advice.

rnaseq edger limma-voom • 1.5k views

updated 7.2 years ago by Michael Love 43k • written 7.2 years ago by bilcodygm • 0

Can you clarify what it is that is concerning you? What do you mean by "wide spread"? What did you expect the plot to look like?

Obviously, the dispersion of your data is large, with the average BCV at nearly 100%. However, you still have DE genes, so the separation between your groups must also be large.
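
To make the "BCV at nearly 100%" remark concrete: in edgeR the BCV (biological coefficient of variation) is the square root of the negative binomial dispersion, so a BCV of 100% corresponds to a dispersion of about 1. The following is only a minimal arithmetic sketch in Python; the dispersion and mean count are hypothetical values chosen for illustration and are not taken from the poster's data.

```python
import math


def bcv_from_dispersion(dispersion: float) -> float:
    """BCV is the square root of the negative binomial dispersion."""
    return math.sqrt(dispersion)


def nb_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Hypothetical numbers for illustration only (not from the poster's data).
dispersion = 1.0    # corresponds to a BCV of 100%
mean_count = 200.0  # assumed expected count for a single gene

bcv = bcv_from_dispersion(dispersion)
variance = nb_variance(mean_count, dispersion)
# Total CV combines Poisson (technical) noise and biological variation.
total_cv = math.sqrt(variance) / mean_count

print(f"BCV = {bcv:.0%}")
print(f"variance = {variance:.0f}")
print(f"total CV = {total_cv:.3f}")
```

For a gene with a reasonably large count, the total coefficient of variation is dominated by the BCV, so replicate counts would be expected to vary by roughly as much as the mean itself.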

We don't know anything else about your data. What do you want us to comment on?

How does this question relate to DESeq2 or limma (neither of which you have used)?

7.2 years ago by Gordon Smyth 52k

Thank you for your reply!

I was perhaps a bit quick to add limma and DESeq2. What I have done is compare several normalization methods (TMM, upper quartile, quantile, and DESeq2's default method) in combination with several tests (LRT and QLF in edgeR, and voom-eBayes in limma). They all show this large variance/dispersion range and high BCV. So I was just wondering how to interpret this: simply as it is, highly variable data, probably because it is human tissue? I do indeed still have DE genes.

The adjusted versus unadjusted p-values do not indicate strange behaviour either, except for the quantile-voom-eBayes method. There, the significant p-values seem to cave in more drastically than with all the other methods. But since all the other methods show significant p-values plus a plateau in the non-significant p-values, I would argue that this is not so relevant.

Furthermore, the quantile-voom-eBayes method was by far the most conservative: while the other methods yielded 200-388 significant genes, it returned only 50. Is this a known property of this method?
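
As an aside on the adjusted versus unadjusted comparison: the post does not say which adjustment was used, so the sketch below assumes the Benjamini-Hochberg procedure. It is a plain-Python illustration with hypothetical p-values, not the poster's actual pipeline, and shows how the number of significant genes is counted after adjustment.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
adj = bh_adjust(raw)
n_sig = sum(p < 0.05 for p in adj)

for p, q in zip(raw, adj):
    print(f"raw = {p:.3f}  adjusted = {q:.3f}")
print("genes with adjusted p < 0.05:", n_sig)
```

Because each adjusted value depends on the rank of the raw p-value among all tests, a method that produces fewer very small raw p-values can lose many genes after adjustment, which is one way a pipeline can look much more conservative than the others.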

On the whole, I have a feeling that the dataset is simply highly variable, with high dispersion. The differentially expressed genes will now have to be validated somehow.

7.2 years ago by bilcodygm • 0