RUVSeq: using RUVg with the most non-differentially expressed genes
tonja.r ▴ 80
@tonjar-7565
Last seen 8.2 years ago
United Kingdom

I was following the RUVSeq protocol for the RUVg method. After a first-pass edgeR differential expression analysis to identify the least differentially expressed genes, I looked at my top table and found that only 7 genes have an FDR < 0.9; all other genes have an FDR > 0.999. The idea behind RUVg is to use the least differentially expressed genes to estimate the factors of unwanted variation. But if only 7 genes have an FDR < 0.9, doesn't that already mean RUVg will not help me account for the batch effect?

First pass of edgeR:

library(RUVSeq)  # also attaches edgeR and EDASeq
design <- model.matrix(~x, data=pData(set))
y <- DGEList(counts=counts(set), group=x)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef=2)
top <- topTags(lrt, n=nrow(set))$table
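One way to pick empirical controls without relying on an FDR cutoff is to go by rank: drop the top-ranked genes in the sorted table and keep the rest. The following is only a base-R sketch of that idea. The simulated table stands in for top, and the cutoff n_exclude is a hypothetical value chosen for illustration; neither comes from the analysis above.

```r
# Toy stand-in for the table returned by topTags(): rows sorted by p-value.
# The p-values are simulated under the null, purely for illustration.
set.seed(1)
n_genes <- 1000
pvals <- sort(runif(n_genes))
top <- data.frame(PValue = pvals,
                  FDR = p.adjust(pvals, method = "BH"),
                  row.names = paste0("gene", seq_len(n_genes)))

# Hypothetical cutoff: exclude the 200 top-ranked genes, keep all others
# as empirical negative controls, regardless of their FDR values.
n_exclude <- 200
empirical <- rownames(top)[-seq_len(n_exclude)]
length(empirical)
```

With a rank-based rule, the size of the control set does not depend on how many genes fall below a given FDR.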

 

ruvseq • 2.6k views
davide risso ▴ 980
@davide-risso-5075
Last seen 10 months ago
University of Padova

An alternative strategy is to use a general list of housekeeping genes, like the one here: http://www.stat.berkeley.edu/~johann/ruv/resources/hk.txt (it is a human list; it should work fine for mouse, too, but may not for other organisms).
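A rough sketch of how such a list could be passed to RUVg, assuming the RUVSeq package is installed. The count matrix is simulated and the vector hk is a made-up placeholder for the housekeeping list (neither is real data); only the genes actually present in the count matrix are used as controls.

```r
library(RUVSeq)  # assumes the Bioconductor package RUVSeq is installed

# Simulated counts: 500 genes x 6 samples (illustration only).
set.seed(1)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))

# Placeholder for the housekeeping list; "not_in_data" mimics a gene
# that is on the list but absent from the count matrix.
hk <- c(paste0("gene", 1:50), "not_in_data")

# Keep only the controls that are present in the data.
controls <- intersect(hk, rownames(counts))

# Estimate k = 1 factor of unwanted variation from the control genes.
ruvg_out <- RUVg(counts, cIdx = controls, k = 1)
dim(ruvg_out$W)
```

With the matrix method, the estimated factors are returned in W and the adjusted counts in normalizedCounts.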

If you have replicate samples, you can consider using RUVs. We find that it is usually quite robust to the set of negative controls, so it should not be a problem even if your set of genes is not strictly a set of negative controls.
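A minimal sketch of the RUVs route, again on simulated counts and assuming RUVSeq is installed. The group labels in x are hypothetical, and all genes are used as controls here only to illustrate the call.

```r
library(RUVSeq)  # assumes the Bioconductor package RUVSeq is installed

# Simulated counts: 500 genes x 6 samples, 3 replicates per condition.
set.seed(1)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))
x <- factor(rep(c("ctl", "trt"), each = 3))

# One row per replicate group, holding the column indices of its samples.
differences <- makeGroups(x)

# Use all genes as (imperfect) negative controls and estimate k = 1 factor.
ruvs_out <- RUVs(counts, cIdx = rownames(counts), k = 1, scIdx = differences)
dim(ruvs_out$W)
```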

 

Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

Well, it's hard to say. The lack of significant genes may be due to a batch effect between your replicates, which inflates your dispersion estimates and reduces detection power for DE. If this is the case, then RUVg might be able to help by removing that batch effect. But you won't know until you try.

Of course, if this were hypothetically true, then you wouldn't be able to define non-DE genes as those with large adjusted p-values. Even moderately DE genes would have large p-values due to the lack of power from inflated variability. Including DE genes in the control set would probably cause RUVg to remove genuine DE between the conditions of interest, which is not ideal. You could probably get around this by using RUVr instead.
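For reference, a sketch of what the RUVr route could look like, assuming RUVSeq is installed. The counts and group labels are simulated placeholders. The residuals come from a first-pass GLM fit on the covariate of interest, and all genes are passed as controls.

```r
library(RUVSeq)  # also attaches edgeR

# Simulated counts: 500 genes x 6 samples, 3 replicates per condition.
set.seed(1)
counts <- matrix(rnbinom(500 * 6, mu = 100, size = 10), ncol = 6,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))
x <- factor(rep(c("ctl", "trt"), each = 3))

# First-pass GLM fit with only the covariate of interest.
design <- model.matrix(~x)
y <- DGEList(counts = counts, group = x)
y <- calcNormFactors(y, method = "upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)

# Deviance residuals from that fit drive the factor estimation.
res <- residuals(fit, type = "deviance")

# Estimate k = 1 factor of unwanted variation from the residuals.
ruvr_out <- RUVr(counts, cIdx = rownames(counts), k = 1, residuals = res)
dim(ruvr_out$W)
```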
