Group comparison with one sample
kay2023

I have to analyze an RNA-seq dataset. The goal is to compare two groups, say case and control. The issue is that there is only one sample per group. Normally I would not proceed with the analysis, since n=1 is not really a "group": it is not statistically justifiable and the results cannot be generalized.

But these data come from a cell line derived from a real patient with a disease. I will check with the investigator whether it is possible to generate more data. If it is not, would voom-limma be a good tool to try (with all the caveats mentioned above)? Thanks.

@gordon-smyth
WEHI, Melbourne, Australia

limma-voom cannot analyse data without replicates. The only option is to use edgeR with a preset dispersion parameter. See the section "What to do if you have no replicates" in the edgeR User's Guide.

It is very easy. If counts is your matrix of read counts with two columns, the first for the control sample and the second for the case sample, then you can do a DE analysis by:

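```r
library(edgeR)
# Column 1 of counts = control, column 2 = case; explicit levels make logFC = case vs control
group <- factor(c("control", "case"), levels = c("control", "case"))
y <- DGEList(counts, group = group)
y <- normLibSizes(y)                    # TMM normalization (calcNormFactors() in older edgeR)
et <- exactTest(y, dispersion = 0.2)    # preset dispersion, since none can be estimated
topTags(et)                             # top genes ranked by p-value
```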

Of course, the dispersion setting here is arbitrary and having replicates would be infinitely better. Nevertheless, the above analysis with any positive value for the dispersion is vastly better than assuming Poisson variation, as very many papers in the literature have done in similar situations.
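To see why any positive dispersion is safer than assuming Poisson variation, here is a small base-R sketch. The mean count of 100 is an illustrative assumption, not a value from the thread. It uses edgeR's negative binomial parameterization, variance = mu + phi * mu^2, which corresponds to base R's qnbinom() with size = 1/phi.

```r
# Illustrative only: spread of counts for one gene with expected count mu,
# under a Poisson model versus a negative binomial (NB) with dispersion phi.
mu  <- 100   # assumed expected count for the gene
phi <- 0.2   # the preset dispersion used in the edgeR call above

pois_var <- mu                 # Poisson: variance equals the mean
nb_var   <- mu + phi * mu^2    # NB: extra biological variance term

# Central 95% ranges of counts under each model (size = 1/phi in base R)
pois_range <- qpois(c(0.025, 0.975), lambda = mu)
nb_range   <- qnbinom(c(0.025, 0.975), size = 1 / phi, mu = mu)

cat("Poisson variance:", pois_var, " 95% range:", pois_range, "\n")
cat("NB variance:     ", nb_var,   " 95% range:", nb_range,   "\n")
```

The Poisson range is far narrower, so a Poisson-based test treats much smaller differences between the two samples as significant than a model that allows for biological variability.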

The value of 0.2 that I have chosen is fairly conservative. Good quality RNA-seq data on a cell line should be less variable than that.
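For reference, the edgeR User's Guide describes the square root of the dispersion as the biological coefficient of variation (BCV), so the preset value can be read as follows:

$$
\operatorname{Var}(y) = \mu + \phi\,\mu^{2}, \qquad \mathrm{BCV} = \sqrt{\phi} = \sqrt{0.2} \approx 0.447
$$

That is, roughly 45% variation between biological replicates. The User's Guide suggests typical BCV values of about 0.4 for human data and 0.1 for genetically identical model organisms, so a dispersion of 0.2 errs on the variable side for a single cell line.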

Thank you so much for the feedback, Gordon Smyth! Since results from a single replicate may not be generalizable, I decided to compute the case/control ratio as a fold-change equivalent, which I then used as the ranking criterion for a GSEA analysis.
I first pre-filtered the RNA-seq data to remove lowly expressed genes, and then input the remaining genes, along with their ratios, into GSEA.
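If a plain ratio is used for ranking, how it is computed matters: a raw case/control ratio is infinite or wildly extreme for genes with zero or very small counts. A minimal base-R sketch, assuming a hypothetical helper log2_ratio() and a prior count of 2 (both illustrative choices), normalizes by library size and adds a small prior count before taking the log. edgeR's own cpm(y, log = TRUE, prior.count = 2) applies a more careful version of the same idea.

```r
# Hypothetical helper: log2 fold change of case over control per gene,
# computed on counts-per-million with a small prior count so that genes
# with zero or very low counts do not give infinite or extreme ratios.
log2_ratio <- function(case, control, prior_count = 2) {
  cpm_case    <- (case    + prior_count) / sum(case)    * 1e6
  cpm_control <- (control + prior_count) / sum(control) * 1e6
  log2(cpm_case / cpm_control)
}

# Toy counts for four genes (illustrative values, not real data)
case    <- c(geneA = 500, geneB = 0, geneC = 3, geneD = 1000)
control <- c(geneA = 250, geneB = 4, geneC = 0, geneD = 1000)

lfc <- log2_ratio(case, control)
print(round(lfc, 2))

# Order genes for a pre-ranked analysis, largest positive change first
print(sort(lfc, decreasing = TRUE))
```

Without the prior count, geneB and geneC would get log ratios of -Inf and Inf and dominate the ends of the ranking on the strength of a handful of reads.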

In my opinion, the edgeR code above will give a better ranking of genes in terms of likely biological significance than simply ranking by fold-change, depending on how you compute the fold-changes.
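A pre-ranked tool needs one signed score per gene. One common choice, which is an assumption here rather than something recommended in this thread, combines the direction of the edgeR logFC with the strength of evidence from its p-value. The sketch assumes tab has the logFC and PValue columns found in topTags(et, n = Inf)$table; the helper signed_score() and the toy values are hypothetical.

```r
# Hypothetical helper: signed ranking score = sign(logFC) * -log10(PValue).
# 'tab' is assumed to be a data frame with columns logFC and PValue, such as
# topTags(et, n = Inf)$table from the edgeR analysis.
signed_score <- function(tab) {
  sign(tab$logFC) * -log10(tab$PValue)
}

# Toy table standing in for edgeR output (illustrative values only)
tab <- data.frame(
  logFC  = c(2.5, -1.0, 0.3, -3.0),
  PValue = c(1e-4, 0.05, 0.8, 1e-6),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

score  <- setNames(signed_score(tab), rownames(tab))
ranked <- sort(score, decreasing = TRUE)
print(ranked)
```

Genes with strong evidence in either direction land at the two ends of the ranking, while a large fold change backed by few reads gets a large p-value and stays near the middle.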

Also beware that pre-ranked GSEA gives highly inflated significance because it doesn't take inter-gene correlations into account.
