Question

Meta-analysis of microarrays (and RNA-seq) data

giroudpaul ▴ 40

@giroudpaul-10031

France

Dear Bioconductor Scientists,

I am rather new to NGS data analysis. I taught myself what I know, as there are no bioinformaticians where I work.

Nevertheless, I have already analyzed my own Affymetrix HTA 2.0 data, as well as a couple of public datasets from Affymetrix HGU133 and Illumina HumanHT12 V3/4 microarrays (and should analyze some Agilent 4x44K data soon). In the past I also got my hands on ChIP-seq/RNA-seq data and learned the basics (but it's been some time; I hope it's like riding a bike, you never really forget it ;) ).

I am working on an immune cell subtype, which can be summarized as follows:

Primary cells are extracted from blood, let's call them the O cells. They can be differentiated into A, B and C subtypes (and more).

The trouble with public data is the inconsistency in the differentiation methods, the small number of replicates, and the fact that controls are not the same (sometimes fresh O cells, sometimes O cells cultured in medium only for the same time, sometimes no controls...). So I figured that it would make sense to combine all these data in a meta-analysis to gain power, and maybe also to add RNA-seq results from similar experiments for increased precision.

The results I would like to obtain are:

Identify genes differentially expressed in specific conditions (A, B or C against O, B/C against A), either in all studies or in a majority of studies.
Identify gene expression profiles of A, B and C in order to look for enrichment of potentially similar cells in cancer tissue data (microarray/RNA-seq).

So far, I have read some literature (10.1371/journal.pmed.0050184; 10.1186/1471-2105-14-368) and found some packages (crossmeta, GeneMeta, metaArray, MetaOmics), but as I have limited statistical knowledge, the explanations are somewhat obscure to me as to which method (p-value vs effect size vs rank?) is best suited to my purpose.

I guess my questions for you, dear members, are:

Is this kind of analysis (as I describe it) possible? Even for a neophyte?
What advice do you have on how to perform this? Which packages do you recommend? Could you share some experience with similar projects?

Thank you for your time,

Paul

meta-analysis • 3.6k views

updated 8.0 years ago by alexvpickering ▴ 110 • written 8.0 years ago by giroudpaul ▴ 40

score 5 · Accepted Answer · 2017-04-30

Hi Paul,

I am the author of crossmeta, which uses the same effect-size methods as GeneMeta. The methods were modified so that genes that are only measured in a subset of studies can still be included in the meta-analysis. I chose an effect-size (as opposed to p-value or rank) meta-analysis method largely because crossmeta was designed to produce a signature that can be used by ccmap to find drug candidates that either reverse or mimic a gene expression signature. From my current understanding, effect-size meta-analyses are generally preferable to p-value combination methods (e.g. see the metap vignette). Rank-combination methods are even less preferable and would be chosen only if all you have is ordered lists of genes.
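To make the distinction between effect-size and p-value combination concrete, here is a minimal base-R sketch. It is a generic textbook-style illustration (a DerSimonian-Laird random-effects summary next to Fisher's p-value combination), not the code used inside crossmeta or GeneMeta. The gene names, effect sizes, variances, and the helper names `dl_meta` and `fisher_combine` are all invented for the example. `NA` marks a gene that was not measured in a study, and such a gene is simply combined over the studies that do have it.

```r
# Toy standardized effect sizes: rows = genes, columns = studies (invented values).
# NA means the gene was not measured in that study.
es <- rbind(
  geneA = c(1.2,  0.9, 1.1),
  geneB = c(0.8,   NA, 1.0),
  geneC = c(0.1, -0.2,  NA),
  geneD = c(1.0, -1.0, 0.9)   # strong effects, but in opposite directions
)
# Sampling variances of those effect sizes (also invented).
v <- rbind(
  geneA = c(0.10, 0.20, 0.15),
  geneB = c(0.12,   NA, 0.18),
  geneC = c(0.10, 0.25,   NA),
  geneD = c(0.10, 0.10, 0.10)
)

# DerSimonian-Laird random-effects summary for one gene,
# using only the studies in which the gene was measured.
dl_meta <- function(d, v) {
  keep <- !is.na(d) & !is.na(v)
  d <- d[keep]
  v <- v[keep]
  k <- length(d)
  w <- 1 / v                              # fixed-effect (inverse-variance) weights
  mu_fixed <- sum(w * d) / sum(w)
  q <- sum(w * (d - mu_fixed)^2)          # Cochran's Q heterogeneity statistic
  tau2 <- 0                               # between-study variance
  if (k > 1) {
    tau2 <- max(0, (q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  }
  w_re <- 1 / (v + tau2)                  # random-effects weights
  mu <- sum(w_re * d) / sum(w_re)         # pooled effect size
  se <- sqrt(1 / sum(w_re))
  c(n_studies = k, mu = mu, se = se, p = 2 * pnorm(-abs(mu / se)))
}

# Fisher's method: combines p-values only, ignoring sign and magnitude.
fisher_combine <- function(p) {
  p <- p[!is.na(p)]
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# One row per gene: number of studies used, pooled effect, its SE, and p-value.
es_summary <- t(sapply(rownames(es), function(g) dl_meta(es[g, ], v[g, ])))

# Two-sided per-study p-values, then Fisher's combined p-value per gene.
p_study  <- 2 * pnorm(-abs(es / sqrt(v)))
p_fisher <- apply(p_study, 1, fisher_combine)

print(round(es_summary, 4))
print(signif(p_fisher, 3))
```

In this toy data, geneD has strong effects that point in opposite directions across studies. Fisher's method ignores direction and so gives it a very small combined p-value, whereas the random-effects summary absorbs the disagreement as between-study variance and does not call it significant. The pooled effect size also gives a signed, ranked signature, which p-value combination alone does not.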

Is this kind of analysis (as I describe it) possible? Even for a neophyte?

This is a big part of what I hope crossmeta accomplishes. All you need is a list of microarray GSEs from GEO that you would like to include in your meta-analysis (crossmeta does not currently support RNA-seq data). After that, the basic workflow is:

library(crossmeta)

# studies from GEO
gse_names <- c("GSE9601", "GSE15069")

# get raw data for specified studies
get_raw(gse_names)

# load and annotate raw data
esets <- load_raw(gse_names)

# perform differential expression analysis
anals <- diff_expr(esets)

# add sample sources (if you want to perform separate meta-analyses for different tissue sources)
anals <- add_sources(anals)

# perform effect-size meta-analysis
es_res <- es_meta(anals, by_source = TRUE)

crossmeta also does pathway meta-analyses using PADOG, which outperforms other methods at prioritizing expected pathways (ref1, ref2). To do so:

# pathway analysis for each contrast
path_anals <- diff_path(esets, anals)

# pathway meta analysis by tissue source
path_res <- path_meta(path_anals, by_source = TRUE)

Other than a list of GSEs that you want to include, all that you have to do is select control and test samples (when running diff_expr) and specify tissue sources (when running add_sources). Both of these functions use a GUI for user input.

UPDATE:

For the true neophyte, I just released a web-app adaptation of crossmeta at www.rnama.com. Unlike crossmeta, RNA Meta Analysis lets you search for similar contrasts in 26,000+ studies and includes support for both microarray and RNA-Seq data.