Question

Question on how to proceed and analyze 9 microarray datasets of two different platforms to perform meta-analysis


Konstantinos Yeles ▴ 80

@konstantinos-yeles-8961

Last seen 16 months ago

Italy

Dear community,

I am currently studying the response of cells to irradiation, and I have downloaded 9 human microarray datasets from the GEO repository in order to perform “some kind” of meta-analysis. Specifically, I would like to identify genuinely differentially expressed genes and then conduct functional enrichment analysis. My main problem is that I am a newbie in R and statistical analysis, so I am not sure how to proceed with the analysis of my datasets. Seven of the 9 datasets are Agilent (6 use the same platform, Agilent-014850 Whole Human Genome Microarray 4x44K G4112F, and one uses Agilent-026652 Whole Human Genome Microarray 4x44K v2), whereas both Illumina datasets use the Illumina HumanWG-6 v3.0 expression beadchip.

My first naive thought was to perform some kind of cross-normalization between the datasets of each platform, run two separate analyses, and then compare the results (for instance, the final DE lists). However, apart from the obvious problem of dataset-specific effects, the experimental design adds further complexity: although each dataset has 3 conditions (control, bystander and irradiated cells), some time points differ between datasets or are missing from some of them. So how can I proceed in a “safe way” with the actual analysis? Should I analyze each dataset separately, export the lists of differentially expressed genes (i.e. gene symbols), and then somehow identify the genes shared between matching comparisons? Moreover, is there a package that can take the statistics exported from each dataset and perform the meta-analysis?

Any suggestion or feedback on this matter would be very helpful !!!

Best,

Konstantinos

microarray meta-analysis cross-platform normalization statistical inference • 3.0k views

ADD COMMENT • link updated 8.8 years ago by alexvpickering ▴ 110 • written 9.2 years ago by Konstantinos Yeles ▴ 80

score 1 · Answer 1 · 2016-01-22

I usually use the GeneMeta package for this sort of thing. The basic idea is to process all the different platforms separately, then subset to consistent reporter molecules, and then make comparisons. How you make the sample types consistent between experiments is up to you and depends on what assumptions you are willing to make. The safest thing to do is to not make any assumptions, and if you have an experiment with different time points (or none at all), then you might not be able to use it.
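To make the "process separately, subset to consistent reporters, then compare" idea concrete, here is a minimal Python sketch of the arithmetic involved. It is not GeneMeta's API; the function names and the toy data are hypothetical, and it assumes each study has already been normalized on its own and mapped to a shared gene identifier. The effect size used is a bias-corrected standardized mean difference (Hedges' g), which is one common choice for this kind of analysis.

```python
import math
from statistics import mean, variance


def common_genes(studies):
    """Return the gene identifiers measured in every study, sorted."""
    gene_sets = [set(genes) for genes in studies.values()]
    return sorted(set.intersection(*gene_sets))


def hedges_g(treated, control):
    """Bias-corrected standardized mean difference and its approximate variance."""
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    )
    d = (mean(treated) - mean(control)) / pooled_sd
    g = d * (1 - 3 / (4 * df - 1))  # small-sample correction
    var_g = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var_g


# Hypothetical toy data: {study: {gene: (treated values, control values)}}
studies = {
    "study_A": {"CDKN1A": ([9.1, 9.4, 9.6], [8.0, 8.2, 8.1]),
                "MDM2": ([7.9, 8.3, 8.1], [7.5, 7.4, 7.8])},
    "study_B": {"CDKN1A": ([10.2, 10.0, 10.5], [9.1, 9.3, 9.0]),
                "TP53": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.8])},
}

shared = common_genes(studies)  # only genes present in every study
effects = {
    gene: {name: hedges_g(*data[gene]) for name, data in studies.items()}
    for gene in shared
}
```

Each study contributes one effect size and variance per shared gene, so the studies are never pooled at the raw-intensity level; only these per-study summaries are compared afterwards.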

score 1 · Answer 2 · 2016-07-01

I recently released the Bioconductor package crossmeta, which extends the effect size meta-analysis method in GeneMeta to allow for genes that were not measured in all studies. crossmeta also automates the download, annotation, and batch effect correction of raw data. All you need is a list of the GSEs that you want to include in the meta-analysis. You can check out the vignette for usage details and my blog for some performance analysis and usage examples of crossmeta.
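As a rough illustration of what combining effect sizes looks like when a gene is missing from some studies, here is a minimal Python sketch. It is not crossmeta's code or API: the function names, the toy numbers, and the `min_studies` threshold are hypothetical, and the DerSimonian-Laird estimator of between-study variance is assumed as the random-effects model.

```python
import math


def random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, its variance, and tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    mu = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return mu, 1 / sum(w_star), tau2


def meta_by_gene(per_study, min_studies=2):
    """Pool each gene over the studies that measured it.

    per_study: {study: {gene: (effect size, variance)}}
    Genes seen in fewer than min_studies studies are skipped.
    """
    genes = set().union(*(s.keys() for s in per_study.values()))
    results = {}
    for gene in sorted(genes):
        pairs = [s[gene] for s in per_study.values() if gene in s]
        if len(pairs) < min_studies:
            continue
        effects, variances = zip(*pairs)
        mu, var_mu, tau2 = random_effects(effects, variances)
        results[gene] = {
            "mu": mu,
            "z": mu / math.sqrt(var_mu),
            "tau2": tau2,
            "n_studies": len(pairs),
        }
    return results


# Hypothetical per-study summaries; MDM2 was measured in one study only.
per_study = {
    "study_A": {"CDKN1A": (1.0, 0.5), "MDM2": (0.4, 0.5)},
    "study_B": {"CDKN1A": (1.0, 0.5)},
    "study_C": {"CDKN1A": (1.2, 0.4)},
}
results = meta_by_gene(per_study)
```

The point of the design is that a gene is not dropped just because one platform lacks a probe for it; it is pooled over whichever studies did measure it, subject to a minimum study count.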

Good luck!

score 0 · Answer 3 · 2016-01-22



Dear James,

thank you for your detailed answer! Just to be certain that I understood your basic idea: you mean that I should preprocess each dataset individually and perform statistical inference to obtain the DE genes for the comparisons of interest, then merge the common probesets at the final annotation level (i.e. gene symbol), and finally use the resulting statistics for these common genes in the GeneMeta R package? Moreover, regarding the important issue of time points: 5 of the 7 Agilent datasets share the 4h time point, 2 Agilent and 2 Illumina datasets share the 2h time point, 3 other datasets share the 6h time point, and so on.

Thus, to focus on the common time points, for instance 4h, could I apply the above procedure to these 5 datasets and ignore the others for this comparison?
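Choosing which datasets enter each comparison can be written down explicitly. The Python sketch below groups datasets by the time points they contain; the dataset labels and the design table are hypothetical placeholders, not the actual GEO series from this question.

```python
from collections import defaultdict


def datasets_by_timepoint(design):
    """Map each time point to the sorted list of datasets that include it."""
    groups = defaultdict(list)
    for dataset, timepoints in design.items():
        for timepoint in timepoints:
            groups[timepoint].append(dataset)
    return {tp: sorted(names) for tp, names in groups.items()}


# Hypothetical design table; labels and time points are placeholders.
design = {
    "agilent_1": ["2h", "4h"],
    "agilent_2": ["4h"],
    "agilent_3": ["4h", "6h"],
    "illumina_1": ["2h"],
}

groups = datasets_by_timepoint(design)
studies_4h = groups["4h"]  # datasets eligible for the 4h comparison
```

Each time point then gets its own meta-analysis over the datasets listed for it, and datasets without that time point are simply left out of that comparison.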

Best,

Konstantinos

ADD COMMENT • link 9.2 years ago Konstantinos Yeles ▴ 80


No, that isn't what I meant at all, which is why I gave you a link to the GeneMeta vignette; so you could read about the package yourself, including how one would use the package to do the analysis.

ADD REPLY • link 9.2 years ago James W. MacDonald 68k


Dear James,

please excuse me for returning with more questions, but I had a first look at the vignette and would like to ask for some further explanation, to be certain about the appropriate implementation. You mentioned above that “The basic idea is to process all the different platforms separately, then subset to consistent reporter molecules”. So, for my experimental design, since the 4h comparison is common to 5 Agilent datasets, could I normalize each dataset separately and then use GeneMeta with the common probesets, as in the first example in the vignette, in which one dataset is split into 2 datasets? My major concern is that only 4 of these 5 Agilent datasets use the same platform, which complicates the situation. Thus:

1. Should I consider only the 4 datasets that share both the Agilent platform and the same chip for use with GeneMeta, in order to avoid the problem of keeping only the probesets common to all 5? Moreover, GeneMeta can handle more than two datasets, correct?

2. Alternatively, could I perform statistical inference for each of the 5 Agilent datasets separately for the 4h comparison, extract the DE lists at the final annotation level (gene symbol), and check whether any differentially expressed genes are common to them?
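The gene-list overlap in option 2 amounts to a simple vote count, sketched below in Python. The study names and gene lists are hypothetical, and the function names are made up for illustration. Note that this approach only uses whether a gene passed the cutoff in each study, not the size or direction of the effect, so it is generally a cruder summary than combining effect sizes.

```python
from collections import Counter


def vote_count(de_lists):
    """Count in how many studies each gene was called differentially expressed."""
    counts = Counter()
    for genes in de_lists.values():
        counts.update(set(genes))  # set() guards against duplicated symbols
    return counts


def shared_by_at_least(counts, k):
    """Genes called DE in at least k studies, sorted alphabetically."""
    return sorted(gene for gene, n in counts.items() if n >= k)


# Hypothetical DE lists (gene symbols) for the 4h comparison.
de_lists = {
    "study_A": ["TP53", "CDKN1A"],
    "study_B": ["CDKN1A", "MDM2"],
    "study_C": ["CDKN1A", "TP53"],
}

counts = vote_count(de_lists)
in_all = shared_by_at_least(counts, len(de_lists))
in_most = shared_by_at_least(counts, 2)
```

Requiring a gene to appear in every list is strict; relaxing the threshold to "at least k studies" is a common compromise, with k chosen by the analyst.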

Best,

Konstantinos

ADD REPLY • link 9.2 years ago Konstantinos Yeles ▴ 80