Add gene information to most expressed gene list (packages lumi and GOstats)
1
0
Entering edit mode
@marongiuluigi-7134
Last seen 3.9 years ago
European Union

Dear all,
I am new to the use of microarray analysis; I have been looking at the use of lumi for the analysis of Illumina BeadArray chips. I have used the tutorial by Du, Feng, Kibbe and Lin (2010) (http://www.bioconductor.org/packages//2.7/bioc/vignettes/lumi/inst/doc/lumi.pdf) to learn the process. In that vignette, the authors use the packages lumi and limma to normalize the data and then fit the linear method (creating an object 'fit') that can identify the most expressed genes. The data come from the example.lumi dataset contained in the package lumiBarnes.
At the end of the analysis, the over-expressed genes have been identified with the hypergeometric test and the Illumina codes have been associated with the EntrezID and the Molecular Function; the latter is done with the function getGOTerm and only the term BP are used. The final object is a dataframe with the GO IDs and the molecular group of the genes.
Is it possible to add a column with the gene's names? Does getGOTerm give also other useful information, for instance, gene's location?
Thank you

###
library(lumi)
library(lumiBarnes)
library(limma)
library(GOstats)
library(lumiHumanAll.db)
library(annotate)
pval <- 0.01

data("example.lumi")
lumi_t <-lumiT(example.lumi)
lumi_n <- lumiN(lumi_t, method='rsn')
dataMatrix <- exprs(lumi_n)
probeList <- rownames(dataMatrix)
sampleType <- c('100:0', '95:5', '100:0', '95:5') 
design <- model.matrix(~ factor(sampleType))
colnames(design) <- c('100:0', '95:5-100:0')
fit <- lmFit(dataMatrix, design)
fit <- eBayes(fit)
sigGene <- probeList[fit$p.value[,2] < pval] 
## Convert the probe Ids as Entrez Ids and make them unique
sigLL <- unique(unlist(lookUp(sigGene,'lumiHumanAll.db','ENTREZID')))
sigLL <- as.character(sigLL[!is.na(sigLL)])
params <- new("GOHyperGParams",
              geneIds= sigLL,
              annotation="lumiHumanAll.db",
              ontology="BP",
              pvalueCutoff= pval,
              conditional=FALSE,
              testDirection="over")
# calculate expressed genes by hypergeometric test
hgOver <- hyperGTest(params) 
gGhyp.pv <- pvalues(hgOver)
gGhyp.fdr <- p.adjust(gGhyp.pv, 'fdr')
sigGO.ID <- names(gGhyp.fdr[gGhyp.fdr < pval])
sigGO.Term <- getGOTerm(sigGO.ID)[["BP"]]
exp.list <- data.frame(sigGO.ID, sigGO.Term, stringsAsFactors = FALSE)
rownames(exp.list) <- NULL
annotation lumi GOstats • 1.4k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

A GO term isn't a gene; it's a term that tells you something about a set of genes that are similar in some sense. So when you do a GO test like that, you are asking 'is this set of genes over-represented in my set of significant genes?'. The data.frame you are constructing shows the significant GO terms, but each GO term is made up of multiple significant genes (and probably a bunch of genes that don't have significant p-values).

You could ask a question like 'what genes in my experiment were significant, and gave rise to this significant GO term?', in which case you could use makeGoTable from my affycoretools package, doing something like

tab <- topTable(fit, 2, p.value = pval)

gotab <- summary(hgOver)

prbs <- probeSetSummary(hgOver)

htab <- makeGoTable(tab, gotab, prbs, "95:5-100:0", affy = FALSE)

will generate an HTML table in a subdirectory called 'GO_results' that has all the GO terms and other statistics for the GO test, and the GO terms themselves will be links to another table that shows the genes that gave rise to those results.

If you want extra information in the second table, you can add things like the gene location or whatever to the featureData slot of your ExpressionSet and that will all end up in those tables.

Note that this function is actually intended to be used as part of an Rmarkdown  type file that contains both code and the analysis description. In which case the 'htab' object that we made is a link that you can put in a table or in the text so you can click on it and have the resulting table come up.

I should also note that the ReportingTools package (which affycoretools uses extensively) has functionality to do something very similar, and more automatically than affycoretools. See the vignette for more information.

ADD COMMENT

Login before adding your answer.

Traffic: 1091 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6