edgeR : results from goana and kegga functions show opposite sig. p-value
1
0
Entering edit mode
@sharvari-gujja-6614
Last seen 22 months ago
United States

Hi,

I am running DE and FE analysis using edgeR for matched samples at two time points. The issue I have is interpreting the output from GO and KEGG analysis. For ex: a pathway from GO has a sig. P.Up value, while the same pathway in KEGG shows as sig. P.Down value:

    Term    Ont N   Up  Down    **P.Up**    P.Down
GO:0050853  B cell receptor signaling pathway   BP  104 66  31  **1.73E-06**    0.988896595
    Pathway N   Up  Down    P.Up    **P.Down**
path:hsa04662   B cell receptor signaling pathway   72  14  48  0.999960966 **4.49E-06**

Can you please let me know if I am missing something.

Please see the code below for your reference.


Time <- as.factor(Samples$group)
y <- DGEList(counts=Counts, group=Time,genes=as.data.frame(row.names(Counts)))
y$genes$entrezid <- mapIds(org.Hs.eg.db, y$genes$`row.names(Counts)`, keytype="SYMBOL", column="ENTREZID")

y$samples$Patient <-  Samples$patient 

# drop without gene symbols
y <- y[!is.na(y$genes$entrezid), ]

# design matrix
design <- model.matrix(~0+Patient+group, data=y$samples)

# filter out lowly expressed genes 
keep <- filterByExpr(y,group=y$samples$group)
print(table(keep))
y <- y[keep,,keep.lib.sizes=FALSE]

# calculate normalized factors
y <- calcNormFactors(y)

# estimate dispersion
y <- estimateDisp(y,design, robust=TRUE)

#To perform quasi-likelihood F-tests:
fit <- glmQLFit(y,design,robust=TRUE)

ql <- glmQLFTest(fit)

# GO analysis
go <- goana(ql,geneid = "entrezid",species="Hs")
go_out <- topGO(go,n=Inf)

# KEGG analysis
keg <- kegga(ql,geneid = "entrezid",species="Hs")
kegg_out <- topKEGG(keg, n=Inf)

sessionInfo( )
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] matrixStats_0.62.0   umap_0.2.8.0         org.Hs.eg.db_3.14.0  AnnotationDbi_1.56.2 IRanges_2.28.0      
 [6] S4Vectors_0.32.4     Biobase_2.54.0       BiocGenerics_0.40.0  edgeR_3.36.0         limma_3.50.3        
[11] stringi_1.7.6        data.table_1.14.2    forcats_0.5.1        stringr_1.4.0        dplyr_1.0.8         
[16] purrr_0.3.4          readr_2.1.2          tidyr_1.2.0          tibble_3.1.6         ggplot2_3.3.5       
[21] tidyverse_1.3.1      magick_2.7.3         tesseract_5.0.0     

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               lubridate_1.8.0        bit64_4.0.5           
 [5] httr_1.4.2             GenomeInfoDb_1.30.1    tools_4.1.3            backports_1.4.1       
 [9] utf8_1.2.2             R6_2.5.1               DBI_1.1.2              colorspace_2.0-3      
[13] withr_2.5.0            tidyselect_1.1.2       bit_4.0.4              curl_4.3.2            
[17] compiler_4.1.3         textshaping_0.3.6      cli_3.3.0              rvest_1.0.2           
[21] xml2_1.3.3             labeling_0.4.2         scales_1.2.0           askpass_1.1           
[25] rappdirs_0.3.3         systemfonts_1.0.4      digest_0.6.29          XVector_0.34.0        
[29] pkgconfig_2.0.3        dbplyr_2.1.1           fastmap_1.1.0          rlang_1.0.2           
[33] readxl_1.4.0           rstudioapi_0.13        RSQLite_2.2.12         farver_2.1.0          
[37] generics_0.1.2         jsonlite_1.8.0         RCurl_1.98-1.6         magrittr_2.0.3        
[41] GO.db_3.14.0           GenomeInfoDbData_1.2.7 Matrix_1.4-1           Rcpp_1.0.8.3          
[45] munsell_0.5.0          fansi_1.0.3            reticulate_1.24        lifecycle_1.0.1       
[49] zlibbioc_1.40.0        grid_4.1.3             blob_1.2.3             crayon_1.5.1          
[53] lattice_0.20-45        Biostrings_2.62.0      haven_2.5.0            splines_4.1.3         
[57] hms_1.1.1              KEGGREST_1.34.0        locfit_1.5-9.5         pillar_1.7.0          
[61] reprex_2.0.1           glue_1.6.2             modelr_0.1.8           png_0.1-7             
[65] vctrs_0.4.1            tzdb_0.3.0             cellranger_1.1.0       gtable_0.3.0          
[69] openssl_2.0.0          assertthat_0.2.1       cachem_1.0.6           broom_0.8.0           
[73] RSpectra_0.16-0        ragg_1.2.2             memoise_2.0.1          statmod_1.4.36        
[77] ellipsis_0.3.2

Thanks Sharvari

edgeR • 1.2k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 2 hours ago
United States

It is not a valid assumption that a GO term and a KEGG term that appear to be the same are actually the same.

> library(org.Hs.eg.db)
## Get GO terms for B-cell pathway
> gns.GO <- mapIds(org.Hs.eg.db, "GO:0050853", "ENTREZID","GOALL", multiVals = "list")[[1]]
'select()' returned 1:many mapping between keys and columns
## remove any dups
> gns.GO <- unique(gns.GO)

## Now get KEGG genes
> gns.KEGG <- read.table("http://rest.kegg.jp/link/pathway/hsa")
> gns.KEGG <- split(gns.KEGG[,1], gns.KEGG[,2])[["path:hsa04662"]]
> gns.KEGG <- gsub("hsa:", "", gns.KEGG)
## remove any dups
> gns.KEGG <- unique(gns.KEGG)


## how many genes are in both groups?
> length(intersect(gns.GO, gns.KEGG))
[1] 17

## how many are in the GO term, but not KEGG?
> length(setdiff(gns.GO, gns.KEGG))
[1] 114

## How many are in the KEGG term but not GO?
> length(setdiff(gns.KEGG, gns.GO))
[1] 65

Only about 20% of the KEGG genes are in the GO term, and about 13% of the GO genes are in the KEGG pathway.

ADD COMMENT
0
Entering edit mode

Thank you so much!

ADD REPLY

Login before adding your answer.

Traffic: 442 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6