Manual annotation of ExpressionSet object created from scratch
@michael-muratet-3076
Greetings,

I have an ExpressionSet object that I created from scratch with expression data for features identified by Ensembl transcript IDs. The ExpressionSet constructor wants a character string for annotation data. Is there another way to populate the slot? From an AnnotatedDataFrame? Should I write a function that pulls in the data with biomaRt?

Thanks,
Mike
@sean-davis-490
On Mon, Oct 13, 2008 at 5:34 PM, Michael Muratet <mmuratet at hudsonalpha.org> wrote:

> I have an ExpressionSet object that I created from scratch with expression
> data for features identified with Ensembl transcript IDs. The ExpressionSet
> constructor wants a character string for annotation data. Is there another
> way to populate the slot? From an AnnotatedDataFrame? Should I write a
> function that pulls in the data with biomaRt?

Hi, Mike. Perhaps you can show us what you mean. If you are talking about the annotation slot, that is meant to hold the string name of the annotation data package associated with the array. I guess that you do not have an annotation data package for the array, so you can leave out that slot when creating the ExpressionSet. If you have problems, it is best to post the code and, of course, your sessionInfo().

Sean
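As an illustration of Sean's point, here is a minimal sketch (not taken from the thread) of building an ExpressionSet without any annotation argument and then setting the slot by hand. The matrix values are toy numbers, and "my.ensembl.db" is a hypothetical package name used only to show that the slot stores a string.

```r
library(Biobase)

# Toy expression matrix: 2 features (Ensembl transcript IDs) x 3 samples.
# The numbers are placeholders, not real measurements.
exprMatrix <- matrix(
  c(0, 0.0029, 0.0007,
    0, 0.0004, 0.0006),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENST00000369829", "ENST00000393415"),
                  c("SL265", "SL310", "SL312"))
)

# No 'annotation' argument is supplied, so the slot is simply left empty.
eset <- ExpressionSet(assayData = exprMatrix)
before <- annotation(eset)

# The slot only stores the *name* of an annotation package as a string.
# "my.ensembl.db" is hypothetical; the package does not need to exist
# for the assignment itself to succeed.
annotation(eset) <- "my.ensembl.db"
annotation(eset)
```

Tools that look up the package named in the slot would of course fail for a name that is not installed, so leaving the slot empty is the safer default until a real package exists.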
On Oct 13, 2008, at 4:48 PM, Sean Davis wrote:

> [...] If you have problems, it is best to post the code and, of course,
> your sessionInfo().
>
> Sean

Here's what I'm trying to do:

    library("Biobase")
    exprMatrix <- as.matrix(read.table("exprset.txt", header=TRUE,
                                       sep="\t", row.names=1, as.is=TRUE))
    pData <- read.table("phenoData.txt", row.names=1, header=TRUE,
                        sep="\t")
    phenoData <- new("AnnotatedDataFrame", data=pData)
    rnaseq_exprs <- new("ExpressionSet", exprs=exprMatrix,
                        phenoData=phenoData)
    save(rnaseq_exprs, file="rnaseq_data.Robj")

The data consists of RNAseq reads that I have mapped to Ensembl transcripts and normalized appropriately, e.g.,

                    SL265  SL264  SL266  SL310                 SL312                 SL313
    ENST00000369829 0      0      0      0.00288159443768686   0.000696405393229021  0.000473063478950364
    ENST00000393415 0      0      0      0.000428628056614047  0.000621528594887718  0.00047497519763826

So far this looks like a fairly useful way of looking at the data.

I'd like to be able to use all the functionality I see in the docs for annotation of ExpressionSets. The ExpressionSet vignette talks about using an AnnotatedDataFrame but it doesn't really say where it goes. I haven't seen an annotation data package for Ensembl, although I see how you might be able to create one with biomaRt. I'm looking for some expert advice so I don't go down any blind alleys.

Thanks,
Mike

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
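For the biomaRt route Mike mentions, one possible shape is sketched below. It is an assumption-laden illustration, not something posted in the thread: it assumes human data (the "hsapiens_gene_ensembl" dataset), needs network access when called, and both function names are hypothetical helpers. The second helper only reorders a lookup table so that it lines up with the feature names.

```r
# Hypothetical helper: reorder an annotation table so it has exactly one row
# per requested ID, in the requested order (NA rows for IDs with no match).
align_to_features <- function(ann, ids, id_col = "ensembl_transcript_id") {
  ann <- ann[!duplicated(ann[[id_col]]), , drop = FALSE]  # first hit per ID
  ann <- ann[match(ids, ann[[id_col]]), , drop = FALSE]   # input order
  rownames(ann) <- ids
  ann
}

# Hypothetical helper: query Ensembl through biomaRt for a set of transcript
# IDs. Nothing is contacted until the function is actually called.
fetch_transcript_annotation <- function(transcript_ids) {
  stopifnot(requireNamespace("biomaRt", quietly = TRUE))
  # Assumption: human data, hence the hsapiens_gene_ensembl dataset.
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  ann <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
    filters    = "ensembl_transcript_id",
    values     = transcript_ids,
    mart       = mart
  )
  align_to_features(ann, transcript_ids)
}

# Example call (requires network access):
# ann <- fetch_transcript_annotation(rownames(exprMatrix))
```

The aligned data frame has row names equal to the transcript IDs, which is the layout an AnnotatedDataFrame for features would need.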
On Mon, Oct 13, 2008 at 6:00 PM, Michael Muratet <mmuratet at hudsonalpha.org> wrote:

> [...]
> I'd like to be able to use all the functionality I see in the docs for
> annotation of ExpressionSets. The ExpressionSet vignette talks about using
> an AnnotatedDataFrame but it doesn't really say where it goes. I haven't
> seen an annotation data package for Ensembl although I see how you might be
> able to create one with biomaRt. I'm looking for some expert advice so I
> don't go down any blind alleys.

For building annotation packages, see the AnnotationDbi package and the SQLForge vignette. See the vignettes in Biobase for discussion of AnnotatedDataFrame. In short, though, an ExpressionSet contains two AnnotatedDataFrames: one for the sample information (the phenoData) and the other for the features on the array (the featureData). The featureData slot is often redundant if you build an annotation data package. However, you could use it to store a data frame of data from Ensembl if you like.

Sean
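A minimal sketch of the featureData idea Sean describes, assuming toy data throughout: the expression values, the "group" column, and the gene symbols and biotypes are all placeholders, and the variable names are hypothetical. The one hard requirement is that the row names of the feature table match the row names of the expression matrix.

```r
library(Biobase)

# Toy expression matrix: features in rows, samples in columns.
exprMatrix <- matrix(
  c(0, 0.0029, 0.0007,
    0, 0.0004, 0.0006),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENST00000369829", "ENST00000393415"),
                  c("SL265", "SL310", "SL312"))
)

# Sample information (phenoData): row names must match the column names.
sampleTable <- data.frame(
  group = c("control", "treated", "treated"),  # placeholder values
  row.names = colnames(exprMatrix)
)

# Feature information (featureData): row names must match the row names.
# Symbols and biotypes here are placeholders, not real annotation.
featureTable <- data.frame(
  gene_symbol = c("GENE_A", "GENE_B"),
  biotype     = c("protein_coding", "protein_coding"),
  row.names   = rownames(exprMatrix),
  stringsAsFactors = FALSE
)

eset <- ExpressionSet(
  assayData   = exprMatrix,
  phenoData   = AnnotatedDataFrame(data = sampleTable),
  featureData = AnnotatedDataFrame(data = featureTable)
)

# fData() returns the feature table; it stays in sync when subsetting.
fData(eset)
sub <- eset[2, ]
fData(sub)$gene_symbol
```

Keeping the annotation in featureData means it travels with the object through subsetting and saving, without needing a separate annotation package.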
Hi Michael,

If you really want to make an annotation package where Ensembl IDs are the main IDs for everything, then you are going to have to first make a mapping of the Ensembl IDs to Entrez Gene IDs. This information is available for a lot of species already, so it can probably be found in the organism package that matches the critter you are working on (org.Hs.eg.db for human). Then you could use that mapping to make a custom annotation package where the Ensembl IDs are basically presented as if they were the "probes". The mappings in that case should be OK.

However, I think it's worth noting that unless you have a more complete Ensembl-to-Entrez ID mapping from another source, this all just represents a reprocessing of the existing data that can already be found in the mapping of the appropriate organism package.

Marc
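The mapping step Marc describes could be sketched as follows. This is a hedged illustration rather than his method: it assumes human data, that the AnnotationDbi and org.Hs.eg.db packages are installed, and that the organism package exposes an "ENSEMBLTRANS" key type; the function name is a hypothetical helper.

```r
# Hypothetical helper: map Ensembl transcript IDs to Entrez Gene IDs and
# symbols using the human organism package.
map_transcripts_to_entrez <- function(transcript_ids) {
  stopifnot(requireNamespace("AnnotationDbi", quietly = TRUE),
            requireNamespace("org.Hs.eg.db", quietly = TRUE))
  db <- org.Hs.eg.db::org.Hs.eg.db

  # select() stops with an error if none of the keys are known, so restrict
  # the query to IDs that are actually present in the package.
  known <- intersect(transcript_ids,
                     AnnotationDbi::keys(db, keytype = "ENSEMBLTRANS"))
  if (length(known) == 0) {
    return(data.frame(ENSEMBLTRANS = character(),
                      ENTREZID     = character(),
                      SYMBOL       = character(),
                      stringsAsFactors = FALSE))
  }
  AnnotationDbi::select(db,
                        keys    = known,
                        keytype = "ENSEMBLTRANS",
                        columns = c("ENTREZID", "SYMBOL"))
}

# Example call, using the object from the original post:
# mapping <- map_transcripts_to_entrez(featureNames(rnaseq_exprs))
```

Transcripts missing from the organism package are silently dropped here, which is one concrete form of the incompleteness Marc warns about.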