I am working on a dataset from a plant species that is a close relative of Arabidopsis.
I performed a differential expression analysis using edgeR and found significant DE genes. I would now like to identify the associated GO terms.
My transcripts are identified by scaffold IDs, not Entrez IDs. I am wondering whether it is possible to perform GO annotation and gene set testing in R for a plant dataset.
If not, could you please suggest a different tool?
All of the standard GO-based analysis pipelines assume that there is an existing relationship between the features (e.g., genes, transcripts) and the GO terms. If you have this information in your dataset, then it's easy: just define all transcripts belonging to a single GO term as a gene set, and then use standard gene set testing methods like roast or camera. If you don't have the existing GO relationships... well, it gets a lot harder. I would probably suggest finding the homologous gene in Arabidopsis for each of your transcripts, and using the GO annotation of the homologs to define your gene sets. I rarely work with Arabidopsis, but I would assume that it's been studied thoroughly enough to have good GO annotation.
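As a rough illustration of "define all transcripts belonging to a single GO term as a gene set", here is a minimal Python sketch, assuming you already have a table mapping each transcript ID to its GO terms. It inverts that table into GO term -> row positions in your expression matrix. The function name `go_gene_sets`, the scaffold IDs and the GO assignments in the example are hypothetical. In R, limma's `ids2indices()` does the equivalent conversion, and the resulting list can be passed as the `index` argument of `roast()` or `camera()`.

```python
from collections import defaultdict


def go_gene_sets(transcript_go, transcript_order, min_size=1):
    """Invert transcript -> GO annotations into GO term -> row indices.

    transcript_go: dict mapping transcript ID -> iterable of GO term IDs
    transcript_order: transcript IDs in the same order as the rows of the
        expression matrix used for testing (hypothetical input layout)
    """
    # Row position of each transcript in the expression matrix (0-based)
    position = {tid: i for i, tid in enumerate(transcript_order)}
    sets = defaultdict(set)
    for tid, terms in transcript_go.items():
        if tid not in position:
            continue  # annotated, but not present in the (filtered) data
        for term in terms:
            sets[term].add(position[tid])
    # Drop sets smaller than min_size; sort indices for reproducible output
    return {term: sorted(idx) for term, idx in sets.items() if len(idx) >= min_size}


# Toy example with made-up IDs and annotations
annotations = {
    "scaffold1_t1": ["GO:0006950", "GO:0009414"],
    "scaffold2_t1": ["GO:0006950"],
    "scaffold9_t1": ["GO:0009414"],  # not in the expression matrix
}
order = ["scaffold1_t1", "scaffold2_t1", "scaffold3_t1"]
print(go_gene_sets(annotations, order))
```

Note that the indices here are 0-based; R indexing is 1-based, so they would need shifting by one if exported for use with `roast()` or `camera()`.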
I identified the homologous Arabidopsis gene and its GO annotation for each of my transcripts. However, I am not clear on how to define all transcripts belonging to a single GO term as a gene set. Could you please send me an example file and R code for roast and camera?
I would have thought it was fairly self-explanatory. For each GO term, find all Arabidopsis genes annotated with that term; then, find the transcripts in your species that are homologous to those genes. The set of homologs constitutes the gene set corresponding to that GO term in your species. If that's not clear enough for you, then perhaps you need to find a local bioinformatician to help you out. This site isn't the place to get wholesale code for your analysis - well, not for free, anyway.
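The two-step procedure above (GO term -> Arabidopsis genes -> homologous transcripts) can be sketched in Python as follows. This is a minimal sketch, assuming one best-hit Arabidopsis homolog per transcript; the function name `homolog_gene_sets`, the `min_size` cutoff and all IDs in the example are hypothetical, and the toy GO assignments are made up rather than taken from real annotation.

```python
from collections import defaultdict


def homolog_gene_sets(athal_go, homologs, min_size=2):
    """Build GO gene sets for a non-model species via Arabidopsis homologs.

    athal_go: dict mapping GO term -> iterable of Arabidopsis gene IDs
    homologs: dict mapping transcript ID -> its Arabidopsis homolog (best hit)
    """
    # Reverse the homolog table: Arabidopsis gene -> transcripts in our species
    by_athal = defaultdict(set)
    for transcript, agi in homologs.items():
        by_athal[agi].add(transcript)

    sets = {}
    for term, genes in athal_go.items():
        # Union of transcripts homologous to any gene annotated with this term
        members = set()
        for agi in genes:
            members |= by_athal.get(agi, set())
        if len(members) >= min_size:
            sets[term] = sorted(members)
    return sets


# Toy example: made-up annotations, not real GO assignments
athal_go = {
    "GO:0009414": ["AT1G01060", "AT5G52310"],
    "GO:0009733": ["AT3G23030"],
}
homologs = {
    "scaffold1_t1": "AT5G52310",
    "scaffold4_t2": "AT1G01060",
    "scaffold7_t1": "AT3G23030",
}
print(homolog_gene_sets(athal_go, homologs))
```

If a transcript can have several Arabidopsis homologs, `homologs` would need to map each transcript to a set of gene IDs instead, with the reversal loop adjusted accordingly. Very small sets are dropped here because gene set tests have little power on one or two features.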