
yet another gene universe question



Max Kuhn



United States

I have access to gene sets from 19 different databases (including GO and KEGG). Some of these are highly curated collections for one specific biological area (such as metabolism), while others are much larger (~6K gene sets). The distribution of gene sets per database is:

    > stem(tbl)

      The decimal point is 3 digit(s) to the right of the |

      0 | 01122333446688925
      2 | 4
      4 |
      6 | 3

Appropriately defining the universe is critical, as people on this list have previously demonstrated. Does anyone have an opinion about how to define the gene universe when:

1) the set of genes included across all the gene sets is small (say, 20% of the total number of genes), or
2) only specific gene sets across databases are tested at once? For example, someone might want to get all the gene sets for a specific area (say, cell cycle) across the different databases and test those at once.

I've been thinking that the universe ought to be the set of genes that are available across all the gene sets being tested. In case 1 above, this seems too small, while in case 2 it seems excessively large (cue the Goldilocks jokes).

Thanks,
Max
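To make the trade-off concrete, here is a minimal R sketch on entirely hypothetical toy data: the gene names, the three "cell cycle" sets, the list of significant genes, and the helper `hyper_p()` are invented for illustration and are not taken from any of the 19 databases. It computes a one-sided hypergeometric p-value for the same gene set under two candidate universes: the union of genes in the sets being tested, and all annotated genes.

```r
# Hypothetical toy data: 1,000 "annotated" genes and three overlapping sets
all_genes <- paste0("g", 1:1000)
gene_sets <- list(
  setA = paste0("g", 1:40),
  setB = paste0("g", 31:80),
  setC = paste0("g", 71:120)
)
# Hypothetical list of significant genes (50 genes, some outside every set)
sig_genes <- paste0("g", c(1:8, 41:52, 201:230))

# Upper-tail hypergeometric p-value for one gene set, given a universe
# (hyper_p is an illustrative helper, not a Bioconductor function)
hyper_p <- function(gene_set, sig, universe) {
  sig <- intersect(sig, universe)            # only universe genes can be drawn
  gene_set <- intersect(gene_set, universe)
  q <- length(intersect(sig, gene_set))      # observed overlap
  m <- length(gene_set)                      # set members ("white balls")
  n <- length(universe) - m                  # all other universe genes
  k <- length(sig)                           # number of draws
  phyper(q - 1, m, n, k, lower.tail = FALSE) # P(X >= q)
}

# Universe 1: union of genes in the sets being tested (120 genes here)
u_union <- unique(unlist(gene_sets))
# Universe 2: every annotated gene (1,000 genes here)
u_all <- all_genes

p_union <- hyper_p(gene_sets$setA, sig_genes, u_union)
p_all   <- hyper_p(gene_sets$setA, sig_genes, u_all)
print(c(union = p_union, all = p_all))
```

With these toy numbers the observed overlap for `setA` is 8 genes in both cases, but the expected overlap is about 6.7 under the 120-gene union universe and only 2 under the 1,000-gene universe, so the same gene set looks unremarkable under one choice and clearly enriched under the other.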


Max Kuhn, 14.2 years ago