Finding TSS locations
Tom Bartlett ▴ 60
@tom-bartlett-5059
Last seen 10.2 years ago
Hi,

I'm trying to find a way to get the locations of the TSS (transcriptional start site) for genes; I need this for analysing Illumina 450K methylation data. I've tried the GenomicFeatures package, and have successfully downloaded and loaded it, but the relevant command, data(geneHuman), doesn't seem to work and produces the following message:

Warning message:
In data(geneHuman) : data set ‘geneHuman’ not found

I'm currently using R 2.15 on Windows Vista (I also have access to a Unix-type machine).

Thanks in advance for your help,
Tom Bartlett
@moiz-bootwalla-5215
Last seen 9.7 years ago
United States
Hi Thomas,

The following code snippet gets the TSS for all transcripts in the UCSC refGene table using GenomicFeatures:

    refgene <- makeTranscriptDbFromUCSC(genome="hg19", tablename="refGene")
    transcripts <- transcripts(refgene, columns=c("tx_id", "tx_name"))
    tss <- resize(transcripts, width=1, fix='start')

You should probably save the TranscriptDb object using saveFeatures() so that you do not have to recreate it the next time you need it. Refer to the GenomicFeatures vignette on how to do that.

Hope this helps.
Moiz
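For readers unfamiliar with resize(): on a GRanges object, fix='start' anchors at the strand-aware start, so for a minus-strand transcript the TSS is the end coordinate of the range, not its start. The sketch below reproduces that logic in base R on a made-up data frame. The table, its column names, and all coordinates are hypothetical and stand in for real annotation; it does not call GenomicFeatures.

```r
# Toy transcript table; all names and coordinates are hypothetical.
tx <- data.frame(
  tx_name = c("txA", "txB", "txC"),
  chrom   = c("chr1", "chr1", "chr2"),
  start   = c(1000L, 5000L, 200L),
  end     = c(2000L, 7500L, 900L),
  strand  = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Strand-aware TSS: the start coordinate on "+", the end coordinate on "-".
tss_pos <- ifelse(tx$strand == "-", tx$end, tx$start)
names(tss_pos) <- tx$tx_name
print(tss_pos)
```

Taking start() of every range regardless of strand would place the TSS of txB at its 3' end, which is the mistake the strand check avoids.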
Hi Moiz,

Thanks again for your help with this, and sorry if this is a stupid question(!), but how do I convert the tss S4 object into, say, a named vector or a 2-column matrix that links, e.g., Entrez ID to the TSS location of the corresponding gene?

Best wishes,
Tom
Hi Tom,

Continuing with the previous code snippet, here's how you can create a named vector of TSS locations:

    tss.vector <- start(tss)
    names(tss.vector) <- values(tss)$tx_id

That should do it. Note that tx_id is GenomicFeatures' internal transcript ID rather than an Entrez ID. To name the vector by gene instead, request the gene_id column in the transcripts() call (for the UCSC refGene table this should be the Entrez Gene ID) and build the names from that column.

If you plan on working more with GRanges objects you should spend some time working through the GenomicRanges vignette. It will prove to be a great investment.

Regards,
Moiz
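Because a gene can have several transcripts, a vector named by gene ID needs some rule for choosing one TSS per gene. The base-R sketch below shows one possible convention, keeping the most upstream TSS per gene, and also builds the two-column table asked about above. The gene IDs, positions, and the helper pick_upstream() are hypothetical and only illustrate the bookkeeping; nothing here comes from a real annotation or from the GenomicFeatures API.

```r
# Toy per-transcript TSS table; gene IDs and positions are hypothetical.
tss_tab <- data.frame(
  gene_id = c("101", "101", "202", "202", "303"),
  strand  = c("+", "+", "-", "-", "+"),
  tss     = c(1000L, 1200L, 7500L, 7100L, 200L),
  stringsAsFactors = FALSE
)

# Most upstream TSS for one gene: smallest position on "+", largest on "-".
# (Hypothetical helper; assumes all transcripts of a gene share one strand.)
pick_upstream <- function(pos, strand) {
  if (strand[1] == "-") max(pos) else min(pos)
}

# Named vector: names are gene IDs, values are TSS positions.
groups <- split(tss_tab, tss_tab$gene_id)
gene_tss <- vapply(groups, function(g) pick_upstream(g$tss, g$strand),
                   integer(1))

# The same information as a two-column table.
gene_tss_table <- data.frame(gene_id = names(gene_tss),
                             tss = unname(gene_tss),
                             stringsAsFactors = FALSE)
print(gene_tss)
```

Other conventions (for example, keeping every TSS and allowing repeated gene IDs) may suit methylation analyses better; the choice depends on how probes are to be assigned to promoters.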