Question

Cannot retrieve DE genes from goana GO terms

1

Entering edit mode

b.nota ▴ 370

@bnota-7379

Last seen 4.6 years ago

Netherlands

We have done a simple analysis in edgeR, to find significant DE genes between two groups, followed by GO analysis with goana. We tried to follow protocol/manual. After finding significant GO terms we would like to see which genes are in these GO terms. We found a post describing how to get the DE genes from the GO terms, but following Aaron's(goana limma- extract list of DE genes and genes in the enriched GO terms?) method, we were not successful. We are not sure what is going wrong, and whether this is a bug or whether we are missing something.

Some essential code here:

fit <- glmFit(y, design)

lrt <- glmLRT(fit)

summary(decideTests(lrt))
       tissuevein
Down          794
NotSig      13056
Up            102

### GO analysis

go <- goana(lrt)

go4 <- topGO(go, ont="BP", sort="Up", n=50)

head(go4)
                                                                         Term Ont N Up Down         P.Up P.Down
GO:0043252                         sodium-independent organic anion transport  BP 2  2    0 6.017011e-05      1
GO:0006270                                         DNA replication initiation  BP 8  2    0 1.635777e-03      1
GO:0071881 adenylate cyclase-inhibiting adrenergic receptor signaling pathway  BP 1  1    0 7.888509e-03      1
GO:0042891                                               antibiotic transport  BP 1  1    0 7.888509e-03      1
GO:0140074                      cardiac endothelial to mesenchymal transition  BP 1  1    0 7.888509e-03      1
GO:0071878       negative regulation of adrenergic receptor signaling pathway  BP 1  1    0 7.888509e-03      1

sig_toptags3_up <- subset(toptags3$table, FDR < 0.05 & logFC > 0)

head(sig_toptags3_up)
       genes Entrezgene    logFC   logCPM       LR       PValue          FDR
28703  PTGS1       5742 7.612927 9.502819 40.68602 1.787619e-10 6.235215e-07
13240    DCN       1634 6.643868 9.932415 36.77281 1.327288e-09 2.314790e-06
25397  NR2F2       7025 7.074398 8.899530 35.37534 2.719074e-09 2.918194e-06
20564   LHX6      26468 7.702296 7.428558 34.05959 5.344974e-09 4.386652e-06
10239 COL3A1       1281 5.889018 9.778522 31.89044 1.631177e-08 9.482577e-06
29575  RIMS1      22999 7.017122 7.891823 29.78443 4.828546e-08 2.041451e-05

## Slightly modified from Aaron's method

# Basic set-up:

DB <- paste("org", "Hs", "eg", "db", sep = ".")
require(DB, character.only = TRUE)
GO2ALLEGS <- paste("org", "Hs", "egGO2ALLEGS", sep = ".")
EG.GO <- AnnotationDbi::toTable(get(GO2ALLEGS))
d <- duplicated(EG.GO[, c("gene_id", "go_id", "Ontology")])
EG.GO <- EG.GO[!d, ]

de.by.go <- split(EG.GO$gene_id, EG.GO$go_id)
de.by.go <- lapply(de.by.go, FUN=function(x) { x[x %in% sig_toptags3_up$Entrezgene] })

de.by.go.stack <- stack(de.by.go)

table_sig_GO <- de.by.go.stack[de.by.go.stack$ind %in% rownames(go4),]

table_sig_GO2 <- merge(table_sig_GO, sig_toptags3_up[,1:2], by.x="values", by.y="Entrezgene",
all.x=T, all.y=F, sort=F)

GO4_terms <- go4[,1:2]
rownames(GO4_terms) <- rownames(go4)

table_sig_GO3 <- merge(table_sig_GO2, GO4_terms, by.x="ind", by.y="row.names",
all.x=T, all.y=F, sort=T)

dim(table_sig_GO3)
[1] 34  5

head(table_sig_GO3)
         ind values  genes                       Term Ont
1 GO:0000209  57161  PELI2 protein polyubiquitination  BP
2 GO:0000209  79176 FBXL15 protein polyubiquitination  BP
3 GO:0000209 246184  CDC26 protein polyubiquitination  BP
4 GO:0002250   1191    CLU   adaptive immune response  BP
5 GO:0002250   2185  PTK2B   adaptive immune response  BP
6 GO:0002250   3383  ICAM1   adaptive immune response  BP

The results show 34 rows (while there are 50 GO terms, and 73 DE genes in these according go4 results), also the GO terms that are present contain not the right number of genes. Sometimes less, but also sometimes more! Is this the only way to retrieve the right genes from goana? Or are we missing something? Any ideas about this?

limma goana edger • 1.7k views

ADD COMMENT • link updated 6.8 years ago by Aaron Lun ★ 28k • written 6.8 years ago by b.nota ▴ 370

0

Entering edit mode

I'm having a tough time reading your code. Can you distil it into a minimum working example?

ADD REPLY • link 6.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Hi Aaron, thanks for your reply. I modified your code in order to get only the significant GO terms in data.frame, and also with gene symbol and GO description. But to give a simple example, I use exactly your code from goana limma- extract list of DE genes and genes in the enriched GO terms?.

> summary(decideTests(lrt))
       tissuevein
Down          794
NotSig      13056
Up            102
>
> sig_toptags3_up <- subset(toptags3$table, FDR < 0.05 & logFC > 0)  
> dim(sig_toptags3_up)
[1] 102   7
> head(sig_toptags3_up)
       genes Entrezgene    logFC   logCPM       LR       PValue          FDR
28703  PTGS1       5742 7.612927 9.502819 40.68602 1.787619e-10 6.235215e-07
13240    DCN       1634 6.643868 9.932415 36.77281 1.327288e-09 2.314790e-06
25397  NR2F2       7025 7.074398 8.899530 35.37534 2.719074e-09 2.918194e-06
20564   LHX6      26468 7.702296 7.428558 34.05959 5.344974e-09 4.386652e-06
10239 COL3A1       1281 5.889018 9.778522 31.89044 1.631177e-08 9.482577e-06
29575  RIMS1      22999 7.017122 7.891823 29.78443 4.828546e-08 2.041451e-05
>
> go <- goana(lrt)
>
> go4 <- topGO(go, ont="BP", sort="Up", n=50)
> dim(go4)
[1] 50  7
> tail(go4)
                                                                                                                                Term
GO:0035108                                                                                                        limb morphogenesis
GO:0051321                                                                                                        meiotic cell cycle
GO:0002460 adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains
GO:0060666                                            dichotomous subdivision of terminal units involved in salivary gland branching
GO:0048730                                                                                                   epidermis morphogenesis
GO:0033604                                                                            negative regulation of catecholamine secretion
           Ont  N Up Down       P.Up    P.Down
GO:0035108  BP 29  2    1 0.02140434 0.8162343
GO:0051321  BP 30  2    1 0.02282192 0.8267032
GO:0002460  BP 79  3    3 0.02341188 0.8336310
GO:0060666  BP  3  1    0 0.02348546 1.0000000
GO:0048730  BP  3  1    0 0.02348546 1.0000000
GO:0033604  BP  3  1    0 0.02348546 1.0000000
>
> species <- "Hs"
> de <- sig_toptags3_up$Entrezgene
>
> # Basic set-up:
> DB <- paste("org", species, "eg", "db", sep = ".")
> require(DB, character.only = TRUE)
> GO2ALLEGS <- paste("org", species, "egGO2ALLEGS", sep = ".")
> EG.GO <- AnnotationDbi::toTable(get(GO2ALLEGS))
> d <- duplicated(EG.GO[, c("gene_id", "go_id", "Ontology")])
> EG.GO <- EG.GO[!d, ]
>
> # Now choosing DE genes for each GO ID/Ontology combination:
> de.by.go <- split(EG.GO$gene_id, paste(EG.GO$go_id, EG.GO$Ontology, sep="."))
> de.by.go <- lapply(de.by.go, FUN=function(x) { x[x %in% de] })
>
> de.by.go[["GO:0051321.BP"]]
character(0)
> de.by.go[["GO:0035108.BP"]]
[1] "4041" "5087" "8854"
> de.by.go[["GO:0002460.BP"]]
[1] "1191" "3383" "5590"

In this example GO:0051321 should have 2 significant genes, but none are found, GO:0035108 reports 3 genes (but only 2 are expected), and GO:0002460 has 3 genes just like the topGO results indicate. Can you follow this code now?

ADD REPLY • link 6.8 years ago b.nota ▴ 370

score 2 · Accepted Answer · 2018-07-20

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 18 hours ago

The city by the bay

goana is probably assuming that your row names are the Entrez IDs. By the looks of it, they're not, hence the discrepancy. Set geneid in your goana call to make sure it uses the correct IDs.

ADD COMMENT • link 6.8 years ago Aaron Lun ★ 28k

0

Entering edit mode

Yes, it was indeed. Because the rownames were also numbers no error occurred... Many thanks (again), we get other and correct results now.

ADD REPLY • link 6.8 years ago b.nota ▴ 370