BSgenomes and protein sequences


Zybaylov, Boris L ▴ 30

@zybaylov-boris-l-5212

Last seen 10.1 years ago

Dear list,

If I need to access all human transcripts, I can use BSgenomes. But what do I need to access all human proteins (amino acid sequences, including hypothetical proteins)? What would be the best way to do this?

Do I have to translate all transcripts from BSgenome.Hsapiens.UCSC.hg19, or is there a better way?

Thank you very much for your help!

Dr. Boris Zybaylov
Instructor
Department of Biochemistry and Molecular Biology
University of Arkansas Medical Sciences
Little Rock, AR

BSgenome • 898 views

updated 12.4 years ago by Valerie Obenchain ★ 6.8k • written 12.4 years ago by Zybaylov, Boris L ▴ 30


Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.7 years ago

United States

Hi Boris,

You can accomplish this by extracting the coding regions from the BSgenome and then translating the sequences. A similar example is on the extractTranscriptsFromGenome() man page; see ?extractTranscriptsFromGenome.

(1) Create your own TranscriptDb object with one of

    makeTranscriptDb()
    makeTranscriptDbFromUCSC()
    makeTranscriptDbFromBiomart()

or load an existing txdb:

    library(GenomicFeatures)
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

(2) Create a GRangesList of coding regions grouped by transcript. Setting use.names=TRUE labels the list elements with transcript names instead of the internal transcript ids.

    cdsbytx <- cdsBy(txdb, "tx", use.names=TRUE)

Extract the corresponding sequences from a BSgenome. Be sure to use a BSgenome that is compatible with the TranscriptDb (i.e., both hg19):

    library(BSgenome.Hsapiens.UCSC.hg19)
    cds_seqs <- extractTranscriptsFromGenome(Hsapiens, cdsbytx)

Sanity check:

    stopifnot(identical(unname(sapply(width(cdsbytx), sum)), width(cds_seqs)))

(3) Translate. All BSgenome objects store the "+" strand only. The extractTranscriptsFromGenome() function is strand-aware and takes care of reverse complementing sequences on the "-" strand, so the sequences returned can be passed directly to the translate() function. Notice that you see the stop codon at the end of (almost) all sequences.

    aa <- translate(cds_seqs)

Valerie

On 05/01/2012 11:40 AM, Zybaylov, Boris L wrote:
> Dear list,
>
> If I need to access all human transcripts I can use BSgenomes -
> but what do I need to access all human proteins (amino acid sequences, including hypothetical proteins); what would be the best way to do this?
>
> Do I have to translate all transcripts from BSgenome.Hsapiens.UCSC.hg19, or is there a better way?
>
> Thank you very much for your help!
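The strand handling described in step (3) of the answer can be illustrated on a toy sequence without downloading a genome. The sketch below is not part of the original answer: it assumes only that the Biostrings package is installed, and the 9-base coding sequence is a made-up example chosen so the result is easy to check by hand.

```r
library(Biostrings)

# Hypothetical toy CDS (not a real gene): ATG GCC TAA = Met, Ala, stop.
cds <- DNAString("ATGGCCTAA")

# If this CDS lay on the "-" strand, a genome that stores only the "+"
# strand would hold its reverse complement at that location.
stored_on_plus <- reverseComplement(cds)

# Translating the stored "+" strand bases directly gives the wrong protein.
wrong_aa <- translate(stored_on_plus)

# Strand-aware extraction amounts to reverse complementing "-" strand
# features before translation, which recovers the coding sequence.
recovered <- reverseComplement(stored_on_plus)
aa <- translate(recovered)

as.character(wrong_aa)  # "LGH"
as.character(aa)        # "MA*", where "*" marks the stop codon
```

The trailing "*" is the stop codon mentioned in the answer. Note also that in later GenomicFeatures releases extractTranscriptsFromGenome() was superseded by extractTranscriptSeqs(), and TranscriptDb objects were renamed TxDb, so the function names in the answer may need updating on a current Bioconductor installation.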

written 12.4 years ago by Valerie Obenchain ★ 6.8k
