Dear Bioconductor community,
We have performed a differential gene expression analysis in an insect and identified some genes involved in detoxification processes as differentially expressed. I am now trying to perform a gene set enrichment analysis based on PFAM domains, because we want to see whether specific families related to detoxification (cytochromes, GST, etc.) are enriched in our dataset. We are using the "Category" package and its "hyperg" function to do this. Would you suggest any other type of analysis within "Category" for this objective?
I am having problems with the input files for the hypergeometric (gene set enrichment) test. As far as I understand, I need three files:
- assayed - all gene IDs (first column) with their corresponding PFAM domain codes (second column, separated by ;)
- significant - IDs of differentially expressed genes
- universe - IDs of all genes
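For illustration, a minimal sketch of how these three inputs could be represented as R objects once the files are read in. The gene IDs, the PFAM accessions, and the object names (`annot`, `genes_by_domain`) are made up for the example; whether "hyperg" accepts exactly this shape (a named list with one vector of gene IDs per domain) is an assumption to check against `?hyperg`.

```r
# Hypothetical two-column table: gene ID, then ';'-separated PFAM codes.
# All IDs and accessions here are invented for illustration.
annot <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  pfam = c("PF00067", "PF00067;PF02798", "PF02798", "PF00043"),
  stringsAsFactors = FALSE
)

# Split the PFAM column so each gene maps to a character vector of domains.
domains_by_gene <- strsplit(annot$pfam, ";", fixed = TRUE)

# Invert the mapping: one character vector of gene IDs per PFAM domain.
genes_by_domain <- split(
  rep(annot$gene, lengths(domains_by_gene)),
  unlist(domains_by_gene)
)

universe <- annot$gene         # IDs of all genes
significant <- c("g1", "g2")   # IDs of differentially expressed genes (made up)
```

The point of the inversion is that each gene set then contains only gene IDs, with the PFAM code used as the name of the set rather than stored alongside the IDs.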
When I run:
result <- hyperg(assayed, sigsets, universe)
I get the following error:
Error in .local(assayed, significant, universe, representation, ...) :
some 'assayed' genes not in 'universe'
Since the "assayed" and "universe" files were generated from the same file, I think the problem may be that my "assayed" file has an incorrect format. What is the correct format for the "assayed" file? I have also tested PFAM domains separated by tabs, and it gives the same error.
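One way to narrow this down, sketched with base R only and toy data: list the identifiers that appear in the gene sets but not in the universe, then compute an over-representation p-value per set with `phyper()`. The inputs and the helper `enrich_p` are hypothetical, and `phyper()` is used here as a plain stand-in for the test, not as a description of what "hyperg" does internally.

```r
# Made-up inputs for illustration; real data would come from the annotation file.
universe <- c("g1", "g2", "g3", "g4", "g5", "g6")
significant <- c("g1", "g2", "g5")
assayed <- list(PF00067 = c("g1", "g2", "g3"), PF02798 = c("g4", "g7"))

# IDs used in the gene sets but absent from the universe. Any hit here would be
# consistent with the "some 'assayed' genes not in 'universe'" error.
missing_ids <- setdiff(unlist(assayed, use.names = FALSE), universe)

# Over-representation p-value for one gene set (hypothetical helper).
enrich_p <- function(set, significant, universe) {
  set <- intersect(set, universe)
  sig <- intersect(significant, universe)
  k <- length(intersect(set, sig))  # significant genes inside the set
  # P(X >= k) when drawing length(sig) genes from the universe
  phyper(k - 1, length(set), length(universe) - length(set),
         length(sig), lower.tail = FALSE)
}

p_values <- vapply(assayed, enrich_p, numeric(1),
                   significant = significant, universe = universe)
```

With these toy inputs, `missing_ids` contains only "g7". If the same check on the real data returns entries that look like PFAM codes or whole unsplit lines, that would suggest the second column is being read as gene identifiers.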
Thanks in advance for your time and help,
Best wishes,
Jose
R session info:
```
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=es_AR.UTF-8 LC_NUMERIC=C LC_TIME=es_AR.UTF-8 LC_COLLATE=es_AR.UTF-8 LC_MONETARY=es_AR.UTF-8
 [6] LC_MESSAGES=es_AR.UTF-8 LC_PAPER=es_AR.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] Category_2.54.0 Matrix_1.3-3 AnnotationDbi_1.50.3 IRanges_2.24.1 S4Vectors_0.28.1 Biobase_2.50.0
[7] BiocGenerics_0.36.0 edgeR_3.30.3 limma_3.44.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6 pillar_1.6.1 compiler_4.0.5 bitops_1.0-7 tools_4.0.5 bit_4.0.4 tibble_3.1.2
 [8] lifecycle_1.0.0 annotate_1.66.0 RSQLite_2.2.7 memoise_2.0.0 lattice_0.20-44 pkgconfig_2.0.3 rlang_0.4.11
[15] graph_1.66.0 DBI_1.1.1 fastmap_1.1.0 genefilter_1.70.0 hms_1.1.0 vctrs_0.3.8 locfit_1.5-9.4
[22] bit64_4.0.5 grid_4.0.5 GSEABase_1.50.1 R6_2.5.0 fansi_0.4.2 XML_3.99-0.6 RBGL_1.64.0
[29] survival_3.2-11 magrittr_2.0.1 readr_1.4.0 blob_1.2.1 ellipsis_0.3.2 splines_4.0.5 xtable_1.8-4
[36] utf8_1.2.1 RCurl_1.98-1.3 cachem_1.0.5 crayon_1.4.1
```
Thanks a lot, James, for your help and for taking the time to answer my question. Best wishes, Jose