BSgenomes and protein sequences
@zybaylov-boris-l-5212
Last seen 10.1 years ago
Dear list,

If I need to access all human transcripts, I can use BSgenomes. But what do I need in order to access all human proteins (amino acid sequences, including hypothetical proteins)? What would be the best way to do this?

Do I have to translate all transcripts from BSgenome.Hsapiens.UCSC.hg19, or is there a better way?

Thank you very much for your help!

Dr. Boris Zybaylov
Instructor
Department of Biochemistry and Molecular Biology
University of Arkansas Medical Sciences
Little Rock, AR
BSgenome • 898 views
@valerie-obenchain-4275
Last seen 2.7 years ago
United States
Hi Boris,

You can accomplish this by extracting the coding regions from the BSgenome and then translating the sequences. A similar example is on the extractTranscriptsFromGenome() man page; see ?extractTranscriptsFromGenome.

(1) Create your own TranscriptDb object with one of makeTranscriptDb(), makeTranscriptDbFromUCSC(), or makeTranscriptDbFromBiomart(), or load an existing txdb:

    library(GenomicFeatures)
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

(2) Create a GRangesList of coding regions grouped by transcript. Setting use.names=TRUE labels the list elements with transcript names instead of the internal transcript ids.

    cdsbytx <- cdsBy(txdb, "tx", use.names=TRUE)

Extract the corresponding sequences from a BSgenome. Be sure to use a BSgenome that is compatible with the TranscriptDb (i.e., both hg19):

    library(BSgenome.Hsapiens.UCSC.hg19)
    cds_seqs <- extractTranscriptsFromGenome(Hsapiens, cdsbytx)

Sanity check (the summed widths of the coding regions of each transcript should equal the width of its extracted sequence):

    stopifnot(identical(unname(sapply(width(cdsbytx), sum)), width(cds_seqs)))

(3) Translate. All BSgenome objects store the "+" strand only. The extractTranscriptsFromGenome() function is strand-aware and takes care of reverse complementing sequences on the "-" strand, so the sequences it returns can be passed directly to the translate() function. Notice that you see the stop codon at the end of (almost) all sequences.

    aa <- translate(cds_seqs)

Valerie

On 05/01/2012 11:40 AM, Zybaylov, Boris L wrote:
> Dear list,
>
> If I need to access all human transcripts I can use BSgenomes -
> but what do I need to access all human proteins (amino acid sequences, including hypothetical proteins); what would be the best way to do this?
>
> Do I have to translate all transcripts from BSgenome.Hsapiens.UCSC.hg19, or is there a better way?
>
> Thank you very much for your help!
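As a follow-up to step (3), here is a minimal sketch of one way to tidy the translated sequences and save them as FASTA. It is not part of the original answer. The helper name cds_to_proteins() and the toy sequences are hypothetical, chosen for illustration; in practice the cds_seqs object from the steps above would be passed in instead. Only Biostrings functions are used (translate(), width(), AAStringSet(), writeXStringSet()). Note also that more recent GenomicFeatures releases provide extractTranscriptSeqs() as the replacement for extractTranscriptsFromGenome(), so the extraction step may need that name on a current installation.

```r
library(Biostrings)

# Hypothetical helper (not from any package): translate CDS sequences and
# drop the trailing stop symbol "*" where one is present.
cds_to_proteins <- function(cds_seqs) {
  # A CDS whose width is not a multiple of 3 is incomplete; skip it.
  complete <- width(cds_seqs) %% 3L == 0L
  aa <- translate(cds_seqs[complete])
  # Strip at most one terminal "*" per sequence, then restore the names.
  prot <- AAStringSet(sub("\\*$", "", as.character(aa)))
  names(prot) <- names(aa)
  prot
}

# Toy input standing in for cds_seqs; the transcript names are made up.
toy_cds <- DNAStringSet(c(txA = "ATGGCCTAA",     # Met-Ala-stop
                          txB = "ATGAAATGGTGA",  # Met-Lys-Trp-stop
                          txC = "ATGGC"))        # truncated CDS, skipped
proteins <- cds_to_proteins(toy_cds)
print(proteins)

# Write the proteins as FASTA; the output path is arbitrary.
out_file <- file.path(tempdir(), "proteins.fasta")
writeXStringSet(proteins, out_file)
```

Filtering on width before translating is a design choice: it sets aside transcripts with an incomplete coding region rather than letting translate() silently ignore the leftover bases. Whether to drop or keep such transcripts depends on what the protein set is needed for.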