Meta-analysis on RNA-seq data sets
Travis (@154e642d)

I want to perform a meta-analysis of 8 publicly available RNA-seq datasets, each of which has disease and control samples (~330 samples in total). A few of these datasets include samples from different histological locations (e.g. low fibrosis vs. high fibrosis). The 8 datasets span two sequencing platforms (6 datasets on Illumina and 2 on Ion Torrent). What is the best way to remove the batch effects (dataset and platform) so that I can perform a differential expression analysis?

I have tried using ComBat_seq() to remove the batch effects, but it only accepts one batch variable at a time, so I'm not sure it is a viable approach.

Would it be better to add the batch effects to the DESeq2 model instead?
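One point worth checking before choosing between these options: because each dataset was sequenced on exactly one platform, platform is nested within dataset. The Python sketch below illustrates this on a hypothetical layout of four studies (the study labels and sample counts are made up). It builds a treatment-coded design matrix (intercept, dataset dummies, an optional platform indicator, and a disease indicator) and computes its rank exactly. Adding the platform column does not raise the rank, so a dataset term such as the one in a DESeq2 design like `~ dataset + condition` already absorbs any platform-wide shift, and a design containing both terms is not full rank (DESeq2 refuses to fit such a design). By the same reasoning, a single dataset-level batch vector for ComBat_seq already covers platform.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of lists) via exact Gaussian elimination on fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # find a non-zero pivot at or below the current rank row
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples, include_platform):
    """Treatment-coded design: intercept + dataset dummies (+ platform) + condition."""
    datasets = sorted({s["dataset"] for s in samples})
    rows = []
    for s in samples:
        row = [1]  # intercept
        row += [int(s["dataset"] == d) for d in datasets[1:]]  # first dataset is the reference level
        if include_platform:
            row.append(int(s["platform"] == "IonTorrent"))
        row.append(int(s["condition"] == "disease"))
        rows.append(row)
    return rows


# Hypothetical layout: 4 studies, each on a single platform, one control and one disease sample each
platform_of = {"A": "Illumina", "B": "Illumina", "C": "Illumina", "D": "IonTorrent"}
samples = [
    {"dataset": d, "platform": p, "condition": c}
    for d, p in platform_of.items()
    for c in ("control", "disease")
]

for include_platform in (False, True):
    X = design_matrix(samples, include_platform)
    print(f"platform term: {include_platform}, columns: {len(X[0])}, rank: {matrix_rank(X)}")
# platform term: False, columns: 5, rank: 5
# platform term: True, columns: 6, rank: 5
```

The platform column is identical to the dummy column for study D, which is exactly why it adds no information once dataset is in the model.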

Alternatively, for this type of meta-analysis, would it be better to analyze each dataset individually and then combine the results using p-value combination or other methods?

I would appreciate any suggestions.

Tags: meta-analysis, sva, Combat_Seq, RNAseq
@james-w-macdonald-5106

This isn't really the place for general analysis questions (biostars.org is a better choice). That said, I have never personally found combining disparate datasets into one analysis to be particularly useful, but your mileage may vary. I prefer a meta-analysis approach. Depending on the studies, you could use GeneMeta to combine effect sizes (in which case you are likely better off using limma-voom rather than DESeq2, because GeneMeta expects t-statistics), or you could use metapod to combine p-values.
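To make "combining effect sizes" concrete, here is a minimal Python sketch of a fixed-effect, inverse-variance meta-analysis of per-study t-statistics for a single gene. It is written in the spirit of GeneMeta but is not its code, and the function names are hypothetical. The conversion from t to a standardized mean difference (Hedges' g) assumes an ordinary pooled-variance two-sample t-statistic with group sizes `n1` and `n2`; for limma's moderated t it is only an approximation. A random-effects model, which lets the true effect differ between studies, is not shown.

```python
import math
from statistics import NormalDist


def hedges_g_from_t(t, n1, n2):
    """Convert a two-sample t-statistic to Hedges' g and its approximate sampling variance."""
    d = t * math.sqrt(1 / n1 + 1 / n2)   # Cohen's d implied by a pooled-variance t-test
    df = n1 + n2 - 2
    j = 1 - 3 / (4 * df - 1)             # small-sample bias correction
    g = j * d
    var_g = 1 / n1 + 1 / n2 + g ** 2 / (2 * (n1 + n2))
    return g, var_g


def fixed_effect_combine(studies):
    """Inverse-variance weighted mean of per-study effects for one gene.

    `studies` is a list of (t, n1, n2) tuples.
    Returns (combined effect, standard error, z-score, two-sided p-value).
    """
    effects = [hedges_g_from_t(t, n1, n2) for t, n1, n2 in studies]
    weights = [1 / var for _, var in effects]
    total_w = sum(weights)
    mu = sum(w * g for w, (g, _) in zip(weights, effects)) / total_w
    se = math.sqrt(1 / total_w)
    z = mu / se
    p = 2 * NormalDist().cdf(-abs(z))
    return mu, se, z, p


# Hypothetical t-statistics for one gene in three studies: (t, n_disease, n_control)
studies = [(3.1, 20, 18), (2.4, 12, 12), (-0.5, 30, 25)]
mu, se, z, p = fixed_effect_combine(studies)
print(f"combined g = {mu:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2g}")
```

Because the effects keep their sign, a gene that goes up in one study and down in another is pulled toward zero rather than reinforced.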


I am currently doing a meta-analysis myself using metapod, in particular parallelStouffer, and I should point out a caveat. Stouffer's method relies on the fact that p-values map one-to-one onto standard normal quantiles, so you can convert each p-value to a z-score, compute a weighted z-score for each gene across studies, convert that back to a p-value, and voila!
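The p-to-z-to-p round trip described above can be sketched in a few lines of Python. This is a generic weighted Stouffer combination, not metapod's implementation; it assumes every input is a one-tailed p-value for the same alternative, strictly between 0 and 1. The weights are an assumption you choose (the square root of each study's sample size is a common choice, used here with made-up sizes).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stouffer(p_values, weights=None):
    """Weighted Stouffer combination of one-tailed p-values for a single gene."""
    if weights is None:
        weights = [1.0] * len(p_values)
    # p -> z: a small one-tailed p-value becomes a large positive z-score
    z_scores = [-STD_NORMAL.inv_cdf(p) for p in p_values]
    # weighted z-score: sum(w * z) / sqrt(sum(w^2)) is standard normal under the null
    z_comb = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(sum(w * w for w in weights))
    # z -> p: upper-tail probability of the combined z-score
    return STD_NORMAL.cdf(-z_comb)


# Two hypothetical studies with one-tailed p = 0.01 each, weighted by sqrt(sample size)
print(stouffer([0.01, 0.01], weights=[math.sqrt(40), math.sqrt(25)]))
```

Negating `inv_cdf(p)` rather than computing `inv_cdf(1 - p)` keeps precision for very small p-values.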

However, Stouffer's method is based on one-tailed p-values, which most differential expression software does not report by default. You need one-tailed p-values if you want to incorporate the sign of the statistic into your meta-analysis. For example, consider gene X with a logFC of -1.2 in study A, a logFC of 1.2 in study B, and a p-value of 0.001 in both. If you naively convert the p-values to z-scores, you get -3.1 for both, so the combined weighted z-score is also strongly negative (with equal weights, (-3.1 - 3.1)/√2 ≈ -4.4), which converts back to a p-value even smaller than 0.001.

I would argue that the combined p-value should instead be close to 1, because the gene is strongly up-regulated in one study and strongly down-regulated in the other. In that case, you should convert the two-tailed p-values to one-tailed p-values (using the sign of the logFC), combine them with parallelStouffer, and then convert the result back to a two-tailed p-value.
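The gene X example can be reproduced with a short Python sketch. It is a generic illustration, not metapod code, and the helper names are hypothetical. It assumes the reported p-values are two-tailed and uses "up-regulated" as the one-tailed alternative: a two-tailed p becomes p/2 when logFC > 0 and 1 - p/2 otherwise. After combining, the one-tailed result is converted back to two-tailed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stouffer_z(p_one_sided, weights):
    """Combined Stouffer z-score from one-tailed p-values (large z = evidence for the alternative)."""
    z_scores = [-STD_NORMAL.inv_cdf(p) for p in p_one_sided]
    return sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(sum(w * w for w in weights))


def to_one_sided_up(p_two_sided, log_fc):
    """One-tailed p-value for 'up-regulated', from a two-tailed p-value and the sign of logFC."""
    return p_two_sided / 2 if log_fc > 0 else 1 - p_two_sided / 2


def naive_combine(p_two_sided):
    """Treats two-tailed p-values as if they were one-tailed, ignoring direction."""
    z = stouffer_z(p_two_sided, [1.0] * len(p_two_sided))
    return STD_NORMAL.cdf(-z)


def signed_combine(p_two_sided, log_fcs):
    """Convert to one-tailed, combine with Stouffer, then convert back to two-tailed."""
    p_up = [to_one_sided_up(p, fc) for p, fc in zip(p_two_sided, log_fcs)]
    z = stouffer_z(p_up, [1.0] * len(p_up))
    return 2 * STD_NORMAL.cdf(-abs(z))


# Gene X: logFC -1.2 in study A, +1.2 in study B, two-tailed p = 0.001 in both
p_values, log_fcs = [0.001, 0.001], [-1.2, 1.2]
print(naive_combine(p_values))            # tiny p-value, well below 0.001
print(signed_combine(p_values, log_fcs))  # close to 1: the two studies cancel out
```

When the two studies agree in direction, `signed_combine` still returns a p-value smaller than either input, so the sign-aware version only gives up significance that was never really there.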


Thank you for your suggestions and explanation of your approach!
