R Package for retrieving sequence features
1
0
Entering edit mode
Wuming Gong ▴ 40
@wuming-gong-1261
Last seen 10.2 years ago
Dear list, I want to retrieve the sequence features, such as start/end position of UTR and CDS, according to given genbank accession numbers. I found that several R functions have the ability to retrieve the sequences alone from genbank, such as getSEQ() from annotate package, read.GenBank() from apt package (not in Bioconductor) and seqNCBI() from GeneR package, but none of them could retrieve the information on sequence features. Is there any R package that can retrieve the sequence features (just like get_SeqFeatures() function in BioPerl)? Thanks, Wuming
annotate GeneR annotate GeneR • 2.5k views
ADD COMMENT
0
Entering edit mode
Charles Berry ▴ 290
@charles-berry-5754
Last seen 5.7 years ago
United States
On Thu, 23 Mar 2006, Wuming Gong wrote: > Dear list, > > I want to retrieve the sequence features, such as start/end position > of UTR and CDS, according to given genbank accession numbers. I found > that several R functions have the ability to retrieve the sequences > alone from genbank, such as getSEQ() from annotate package, > read.GenBank() from apt package (not in Bioconductor) and seqNCBI() > from GeneR package, but none of them could retrieve the information on > sequence features. > > Is there any R package that can retrieve the sequence features (just > like get_SeqFeatures() function in BioPerl)? I do not know of such a capability in package, but it is not hard to 'roll-your-own' using the file http://hgdownload.cse.ucsc.edu/goldenPath/hg17/database/genscan.txt.gz Like this: genePred.fmt <- list(name = "a", chrom = "a", strand = "a", txStart = 1, txEnd = 1, cdsStart = 1, cdsEnd = 1, exonCount = 1, exonStarts = "a", exonEnds = "a") genPred.dat <- scan( gzfile( file.path( my.path,"genscan.txt.gz" ), what = genePred.fmt) get.features <- function(x, y=genPred.dat) { indx <- match( x, y$name ) sapply( y, "[", indx ) } > get.features( c( "NT_077402.1", "NT_077402.4") ) name chrom strand txStart txEnd cdsStart cdsEnd exonCount [1,] "NT_077402.1" "chr1" "+" "2052" "4012" "2052" "4012" "3" [2,] "NT_077402.4" "chr1" "+" "121020" "124696" "121020" "124696" "5" exonStarts [1,] "2052,2475,3913," [2,] "121020,121450,122181,122997,124620," exonEnds [1,] "2090,2584,4012," [2,] "121200,121708,122244,123179,124696," > HTH, Chuck > > Thanks, > > Wuming > > > > > [ Part 3.9: "Included Message" ] > Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717
ADD COMMENT

Login before adding your answer.

Traffic: 565 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6