Entering edit mode
Elizabeth Purdom
▴
210
@elizabeth-purdom-2486
Last seen 3.0 years ago
USA/ Berkeley/UC Berkeley
Hello,
I would like to get the range of genes from a TranscriptDb object.
Ideally, I would like to do:
txdb <- loadFeatures(system.file("extdata",
"UCSC_knownGene_sample.sqlite",
package="GenomicFeatures"))
genes(txdb)
And have this return the entire genomic range of the gene (much like
transcripts() returns the genomic range of the transcript, including
the
introns). However, this does not exist. So I have made a little
function
to do this:
genes<-function(txdb){
txByGene<-transcriptsBy(txdb,"gene")
geneStarts<-sapply(start(txByGene),min)
geneEnds<-sapply(end(txByGene),max)
geneChr<-sapply(seqnames(txByGene),unique)
geneStrand<-sapply(strand(txByGene),unique)
GRanges(seqnames=geneChr,strand=geneStrand,ranges=IRanges(geneStarts,g
eneEnds),gene_name=names(txByGene))
}
However, this has run me into all kinds of problems because the same
gene name can be used for transcripts on different chromosomes, namely
the X and Y chromosome (which makes me wonder about the safety of the
assumption that gene names are unique -- does this count as unique?).
Before I go further down this route, is there not a simple way to get
this information?
Thanks,
Elizabeth
--
Elizabeth Purdom
Assistant Professor
Department of Statistics
UC, Berkeley
Office: 433 Evans Hall
(510) 642-6154 (office)
(510) 642-7892 (fax)
Mailing: 367 Evans Hall Berkeley, CA 94720-3860
epurdom at stat.berkeley.edu