Question

EGSEA usage with microarray data (with vooma)?

Pekka Kohonen ▴ 190

@pekka-kohonen-5862

Last seen 7.2 years ago

Sweden

Hi,

The EGSEA paper says that it only works with RNA-seq data. But it takes the EList objects produced by voom as input, and these can now be produced from microarray data as well, using vooma or other functions in limma. So does EGSEA now also work with microarray data? I will try this out; I just wanted to put the question out there.

Best, Pekka

EGSEA gene set testing vooma limma voom gsea • 2.1k views

ADD COMMENT • link updated 11 months ago by Chris ▴ 20 • written 8.3 years ago by Pekka Kohonen ▴ 190

score 2 · Accepted Answer · 2017-01-03

Monther Alhamdoosh ▴ 40

@monther-alhamdoosh-10001

Last seen 5.8 years ago

Australia/Melbourne/CSL Limited

Hi Pekka,

Thanks for your question. We do not mention that EGSEA works with microarray datasets because some of the base methods' parameters need to be tuned to suit microarray data, and we have not tested it with this type of data. We will work on this soon. Let us know if things work well with the current release!

Cheers,

Monther

ADD COMMENT • link 8.3 years ago Monther Alhamdoosh ▴ 40

Hi Monther,

I have tested EGSEA on microarray data (a couple of thousand analyses), and as far as I can tell it works fine! A few remarks:

1. symbolsMap = y$genes[, c(1, 3)] needs to be changed to something like symbolsMap = row.names(featureData(eSet)@data). Apparently the vooma object does not include the "genes" slot (according to the documentation, the data frame of gene annotation is present only if counts was a DGEList object).

2. Multi-threading does not seem to be very effective (processor activity stays at more or less the same level). However, I was using the "custom" gene sets option (one gene set at a time), so the parallelization may be done differently there. At least in my case, it might be better to give the function just 1 thread, split the data into lists, and do the parallelization with "bplapply" from BiocParallel.

3. The results object is very complicated. I quite like the "biobroom" Bioconductor package, which produces "tidy" data frames from limma results objects, and I wrote similar routines for my analysis. The EGSEA results are at gsa@results$custom$test.results[[c]], where c is a contrast (you need to lapply/do.call over all of the contrasts). The results of the individual methods are at gsa@results$custom$base.results[[c]]$ora (for the ora method) and need the same treatment.

4. The visualization routine takes an enormous amount of time to run, especially if you have, e.g., 9 contrasts in your dataset, which generates a huge number of combinations. But I suppose it is useful for smaller analyses.

5. I wonder whether some of the methods (like GSVA and ROAST) should be run in the "absolute" (i.e., mixed) or the "directional" mode. But I suppose that if one cares about that, it is possible to customize the "ensemble" method accordingly.

6. Some methods, like GSVA, require at least 10 samples to be effective (I don't know whether this applies to the others).
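
Remark 1 can be sketched in base R. The snippet below builds a two-column symbolsMap from a separate annotation table when the vooma output has no genes component. The data frame annot, its column names, and the toy IDs are hypothetical; the two-column layout (ID first, symbol second) simply mirrors the y$genes[, c(1, 3)] pattern mentioned above and should be checked against the EGSEA documentation for your version.

```r
# Toy expression matrix: rows are features, columns are samples.
expr <- matrix(
  c(7.1, 7.4, 6.9, 8.2,
    5.0, 5.3, 5.1, 4.8,
    9.6, 9.1, 9.4, 9.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "10", "100"), paste0("sample", 1:4))
)

# Hypothetical annotation table, e.g. extracted from featureData(eSet)@data.
# The column names are assumptions made for this sketch.
annot <- data.frame(
  FeatureID = c("100", "1", "10"),
  Symbol    = c("ADA", "A1BG", "NAT2"),
  stringsAsFactors = FALSE
)

# Reorder the annotation so that it follows the row order of the matrix,
# and stop early if any feature has no annotation.
idx <- match(rownames(expr), annot$FeatureID)
stopifnot(!anyNA(idx))

# Two-column map: feature ID first, gene symbol second.
symbolsMap <- annot[idx, c("FeatureID", "Symbol")]
rownames(symbolsMap) <- NULL
```

Matching on the row names of the expression matrix keeps the map aligned with the data even when the annotation table is stored in a different order.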

All in all a very useful package! Both for automating the running of lots of methods at the same time and of course for the "ensemble" method.
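
The lapply/do.call flattening from remark 3 might look roughly like the sketch below. It does not call EGSEA: test_results is a mock list standing in for gsa@results$custom$test.results, and the contrast names, gene set names, and column names are illustrative assumptions only.

```r
# Mock of the nested layout described above: one data frame per contrast,
# with gene sets as row names. All names and values are made up.
test_results <- list(
  trtA_vs_ctrl = data.frame(p.adj = c(0.01, 0.20), avg.rank = c(1.5, 3.0),
                            row.names = c("setA", "setB")),
  trtB_vs_ctrl = data.frame(p.adj = c(0.04, 0.60), avg.rank = c(2.0, 4.5),
                            row.names = c("setA", "setB"))
)

# Turn one contrast's table into "tidy" rows: the contrast and the gene set
# become ordinary columns instead of a list name and row names.
tidy_contrast <- function(contrast, res) {
  out <- cbind(contrast = contrast, gene_set = rownames(res), res,
               stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Apply to every contrast and stack the pieces into one data frame.
pieces <- lapply(names(test_results), function(cn) {
  tidy_contrast(cn, test_results[[cn]])
})
tidy <- do.call(rbind, pieces)
rownames(tidy) <- NULL
```

The same helper could be applied to each base method's per-contrast table, with an extra column recording the method name.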

ADD REPLY • link 8.2 years ago Pekka Kohonen ▴ 190

Hi Pekka,

Thank you very much for your valuable feedback! This should help many EGSEA users. I will revisit your suggestions soon and update the package accordingly.

Cheers,

Monther

ADD REPLY • link 8.2 years ago Monther Alhamdoosh ▴ 40

Hi Pekka, I also tried EGSEA with microarray data, but I got the error below. Do you have any suggestions?

gsa = egsea.ma(numeric_matrix, vector_group, probe_annotation, contrasts = contrast_matrix, gs.annots = gs.annots, baseGSEAs = baseMethods, sort.by = "avg.rank", num.threads = 4, report = FALSE)
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

ADD REPLY • link 11 months ago Chris ▴ 20
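
The cause of this error is not established in the thread. In base R, the message is raised when a vector of names is assigned to a matrix dimension of a different length, so one possible first step is to check that the inputs agree in size. The sketch below uses toy stand-ins for the objects in the call above; check_inputs is a hypothetical helper, not part of EGSEA, and the contrast check assumes a design with one column per group level.

```r
set.seed(1)

# Toy stand-ins for the inputs named in the call above.
numeric_matrix <- matrix(rnorm(40), nrow = 10,
                         dimnames = list(paste0("probe", 1:10),
                                         paste0("sample", 1:4)))
vector_group <- factor(c("ctrl", "ctrl", "trt", "trt"))
probe_annotation <- data.frame(ProbeID = rownames(numeric_matrix),
                               Symbol = paste0("GENE", 1:10),
                               stringsAsFactors = FALSE)
contrast_matrix <- matrix(c(-1, 1), ncol = 1,
                          dimnames = list(levels(vector_group), "trt_vs_ctrl"))

# Hypothetical helper: returns a character vector describing size mismatches
# (empty if none are found).
check_inputs <- function(expr, group, probe_annot, contrasts) {
  problems <- character(0)
  if (length(group) != ncol(expr)) {
    problems <- c(problems, "group length differs from number of samples")
  }
  if (nrow(probe_annot) != nrow(expr)) {
    problems <- c(problems, "annotation rows differ from number of probes")
  }
  # Assumption: one design column (hence one contrast row) per group level.
  if (nrow(contrasts) != nlevels(factor(group))) {
    problems <- c(problems, "contrast rows differ from number of group levels")
  }
  problems
}

ok  <- check_inputs(numeric_matrix, vector_group,
                    probe_annotation, contrast_matrix)
bad <- check_inputs(numeric_matrix, vector_group[1:3],
                    probe_annotation, contrast_matrix)

# The same kind of message can be reproduced with a plain matrix by
# assigning two column names to a three-column matrix.
x <- matrix(0, nrow = 2, ncol = 3)
msg <- tryCatch({
  dimnames(x) <- list(NULL, c("a", "b"))
  "no error"
}, error = function(e) conditionMessage(e))
```

If all of these checks pass, the mismatch probably arises inside one of the base methods, and rerunning with a single method in baseGSEAs at a time may help to narrow it down.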