GOstats - zebrafish

Neel Aluru ▴ 460

@neel-aluru-3760

United States

Hello BioC users,

My question is pretty vague, so please bear with me. I want to do gene set enrichment analysis (GSEA) on zebrafish Agilent array data. I read the user guide and vignettes, but it is still not quite clear to me how to proceed. The annotation I have for my probes consists of gene names, GenBank accession numbers and probe sequences. Using this information, is it possible to get GO annotations for my probes with any of the available packages?

The zebrafish annotation package (org.Dr.eg.db) has objects such as org.Dr.egGO2EG, org.Dr.egGO, etc. Is it possible to map my probes with them and then use those data in GSEA? Most of the examples are on Affymetrix data, and I cannot seem to find literature where it is used on Agilent arrays.

Any suggestions or comments will be appreciated.

Thank you,
Neel

Neel Aluru
Postdoctoral Scholar
Biology Department
Woods Hole Oceanographic Institution
Woods Hole, MA 02543 USA
508-289-3607

Annotation GO zebrafish probe

Updated 15.0 years ago by James W. MacDonald • written 15.0 years ago by Neel Aluru

James W. MacDonald 68k

@james-w-macdonald-5106

United States

Hi Neel,

It's pretty simple, actually. You just need two things: a set of Entrez Gene IDs for the genes that you are calling differentially expressed, and a set of Entrez Gene IDs for the unique set of genes that the chip interrogates. Here I am assuming that you want to do a Fisher's exact test (which I guess is technically a GSEA, but not commonly called that).

Note that the org.Dr.eg.db package is based on Entrez Gene IDs (i.e., the canonical mapping is from EG --> whatever). Since you have accession numbers, we want to use the org.Dr.egACCNUM table. In addition, we need to reverse the mapping to be EG <-- ACCNUM. Say your accession numbers are in a character vector called 'accnum'.

    library(GOstats)
    library(org.Dr.eg.db)
    egs <- mget(accnum, revmap(org.Dr.egACCNUM), ifnotfound = NA)

Two things here. The revmap() function just switches things so we find Entrez Gene IDs using accession numbers as input. We also say to give an NA if the Entrez Gene ID isn't found.

This should be a many-to-one mapping, if I understand GenBank and Entrez, so a given accession number should point to just one Entrez Gene ID. However, it's nice to check.

    all(sapply(egs, length) == 1)

Also, how many NAs are there? Assuming all lengths == 1, you can do

    sum(sapply(egs, is.na))

otherwise you need

    sum(sapply(egs, function(x) all(is.na(x))))

If any accession number maps to more than one Entrez Gene ID, you will have to decide which one you want to use. If they are all length one, then

    egs <- unique(unlist(egs))

We unique-ify this vector because a given gene can only be differentially regulated once in a given sample. If there are NAs, then

    egs <- egs[!is.na(egs)]

Now you need a vector of Entrez Gene IDs that represents all genes on the chip. I will assume you can get a vector of accession numbers for this as well. I further assume it is a character vector called 'univ'. We do the same rigamarole:

    univ <- mget(univ, revmap(org.Dr.egACCNUM), ifnotfound = NA)

Check that each list member is length one (if not, choose which EG ID you like), get rid of NAs, and make the result unique, exactly as for 'egs'. Then you proceed just like in the vignette:

    p <- new("GOHyperGParams", geneIds = egs, universeGeneIds = univ,
             annotation = "org.Dr.eg.db", conditional = TRUE)  # plus other arguments, such as ontology, as needed
    hypt <- hyperGTest(p)
    summary(hypt)

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
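For anyone who wants to see the mechanics without installing the annotation packages, here is a minimal base-R sketch of the two ideas above: tidying a mapping list (drop NAs, pick one ID per accession, make unique) and the hypergeometric test for a single GO term. Everything in it is an assumption for illustration: the accession names, the Entrez Gene IDs, the GO term membership, and the helper clean_egs() are made up, and keeping the first ID for a multi-mapped accession is an arbitrary choice. It is not what hyperGTest does internally, which also handles the GO graph and the conditional test.

```r
# Toy stand-ins for the lists returned by
# mget(..., revmap(org.Dr.egACCNUM), ifnotfound = NA). All IDs are made up.
toy_de <- list(ACC1 = "101", ACC2 = NA, ACC3 = c("102", "103"), ACC4 = "101")
toy_univ <- c(toy_de,
              list(ACC5 = "104", ACC6 = "105", ACC7 = "106", ACC8 = "107"))

# Hypothetical helper: drop unmapped accessions, keep one ID each, make unique.
clean_egs <- function(mapped) {
  # Remove accessions for which no Entrez Gene ID was found
  unmapped <- vapply(mapped, function(x) all(is.na(x)), logical(1))
  mapped <- mapped[!unmapped]
  # Where an accession maps to several IDs, keep the first (arbitrary choice)
  ids <- vapply(mapped, function(x) as.character(x[1]), character(1))
  unique(unname(ids))
}

egs  <- clean_egs(toy_de)    # differentially expressed genes
univ <- clean_egs(toy_univ)  # gene universe (all genes on the chip)

# Hypothetical set of genes annotated to one GO term
go_term_genes <- c("101", "102", "104")

k <- sum(egs %in% go_term_genes)   # DE genes annotated to the term
m <- sum(univ %in% go_term_genes)  # universe genes annotated to the term
n <- length(univ) - m              # universe genes not annotated to the term

# Over-representation p-value: P(X >= k) when drawing length(egs) genes
p_over <- phyper(k - 1, m, n, length(egs), lower.tail = FALSE)

# The same p-value from a one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(k, length(egs) - k, m - k, n - (length(egs) - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_over, fisher = p_fisher))
```

The two p-values agree because the one-sided Fisher's exact test on the 2x2 table is the same hypergeometric tail probability, which is why the universe has to be the unique set of genes on the chip and not the set of probes.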

Written 15.0 years ago by James W. MacDonald
