BSgenomes and protein sequences
Dear list,

If I need to access all human transcripts I can use BSgenomes, but what do I need to access all human proteins (amino acid sequences, including hypothetical proteins)? What would be the best way to do this?

Do I have to translate all transcripts from BSgenome.Hsapiens.UCSC.hg19, or is there a better way?

Thank you very much for your help!

Dr. Boris Zybaylov
Instructor
Department of Biochemistry and Molecular Biology
University of Arkansas Medical Sciences
Little Rock, AR
BSgenome • 898 views
Hi Boris,

You can accomplish this by extracting the coding regions from the BSgenome and then translating the sequences. A similar example is on the extractTranscriptsFromGenome() man page; see ?extractTranscriptsFromGenome.

(1) Create your own TranscriptDb object with one of

    makeTranscriptDb()
    makeTranscriptDbFromUCSC()
    makeTranscriptDbFromBiomart()

or load an existing txdb:

    library(GenomicFeatures)
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

(2) Create a GRangesList of coding regions grouped by transcript. Setting use.names=TRUE labels the list elements with transcript names instead of the internal transcript ids.

    cdsbytx <- cdsBy(txdb, "tx", use.names=TRUE)

Extract the corresponding sequences from a BSgenome. Be sure to use a BSgenome that is compatible with the TranscriptDb (i.e., both hg19):

    library(BSgenome.Hsapiens.UCSC.hg19)
    cds_seqs <- extractTranscriptsFromGenome(Hsapiens, cdsbytx)

Sanity check:

    stopifnot(identical(unname(sapply(width(cdsbytx), sum)), width(cds_seqs)))

(3) Translate. All BSgenome objects store the "+" strand only. The extractTranscriptsFromGenome() function is strand-aware and takes care of reverse complementing sequences on the "-" strand, so the sequences it returns can be passed directly to translate(). Notice that you see the stop codon at the end of (almost) all sequences.

    aa <- translate(cds_seqs)

Valerie
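The strand handling described in step (3) can be tried on a toy example without loading a whole genome. This is only a minimal sketch, assuming the Biostrings package is installed; the 12-base sequence and the names cds_plus, on_plus_strand, cds_toy, aa_toy and aa_nostop are made up for illustration and do not come from hg19 or from the answer above.

```r
library(Biostrings)

# Made-up coding sequence (not from hg19), read 5'->3' on its own strand:
# ATG AAA TGG TAA
cds_plus <- DNAString("ATGAAATGGTAA")

# If this CDS lay on the "-" strand, a "+"-strand-only genome would hold
# its reverse complement at that location.
on_plus_strand <- reverseComplement(cds_plus)

# Strand-aware extraction amounts to reverse complementing such sequences
# back before translation; "+" strand sequences are used as they are.
cds_toy <- DNAStringSet(c(
  tx_plus  = as.character(cds_plus),
  tx_minus = as.character(reverseComplement(on_plus_strand))
))

# translate() reports a stop codon as "*"
aa_toy <- translate(cds_toy)

# Optionally drop a trailing "*" to keep only the amino acids
aa_nostop <- AAStringSet(sub("\\*$", "", as.character(aa_toy)))
```

Both toy transcripts translate to the same peptide ending in "*", which is the stop codon mentioned above. Note also that in later GenomicFeatures releases, as far as I know, extractTranscriptsFromGenome() was superseded by extractTranscriptSeqs() and TranscriptDb was renamed TxDb, so check which names your installed version provides.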
