Positional Details with Features through UniProt.ws Ultimately to display as tracks in ggbio
@anne-deslattes-mays-5977
Last seen 10.2 years ago
United States
Dear all,

    biocLite("UniProt.ws")
    library(UniProt.ws)

    select(UniProt.ws, keys = "P02794",
           columns = c("DOMAINS", "FEATURES"),
           keytype = "UNIPROTKB")

    Getting extra data for P02794 NA NA etc
      UNIPROTKB                         DOMAINS
    1    P02794 Ferritin-like diiron domain (1)
      FEATURES
    1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator methionine (1); Metal binding (6); Modified residue (4); Sequence conflict (1); Turn (2)

What I want are the positional details for each of these features, which are visible on the UniProt web page. FTH1 is 183 amino acids in length. There are 6 metal binding sites, each at a specific position. This information exists, since the web site can return the positional details. I would like to retrieve them so that I can manipulate them together with new evidence.

Ultimately I wish to display them with tracks from ggbio.

    pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
        param = ScanBamParam(which = genesymbol["FTH1"], what = c("seq")),
        use.names = TRUE)

    FTH1.ga <- geom_alignment(data = txdb, which = genesymbol["FTH1"])

So here I have sample information which I have aligned to the reference genome. I retrieve that information from a BAM file.

    # create the GAlignments objects for each isoform
    FTH1.isoform.1 <- pb.53A.pos.ga[c(7)]
    FTH1.isoform.2 <- pb.53A.pos.ga[c(15)]
    FTH1.isoform.3 <- pb.53A.pos.ga[c(13)]
    FTH1.isoform.4 <- pb.53A.pos.ga[c(8)]
    FTH1.isoform.5 <- pb.53A.pos.ga[c(2)]
    FTH1.isoform.6 <- pb.53A.pos.ga[c(1)]

    p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
    p2 <- autoplot(FTH1.isoform.2, fill = "blue", color = "blue")
    p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
    p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
    p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
    p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")

    tracks(FTH1 = p1.FTH1,
           "Iso 1" = p1,
           "Iso 2" = p2,
           "Iso 3" = p3,
           "Iso 4" = p4,
           "Iso 5" = p5,
           "Iso 6" = p6)

I can then autoplot each of the separate isoforms. What I want to do, however, is annotate the isoforms so that each shows the coding region at the full height of the bar and the non-coding regions at a reduced height.

Additionally, I want to color the graphic with the details for the protein, such as the metal binding sites, domains, etc., so that I can computationally generate an informative picture which explains what is lost or gained in the separate isoforms.

Thoughts?

Anne

    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods
    [8] base

    other attached packages:
     [1] UniProt.ws_2.4.2
     [2] RCurl_1.95-4.3
     [3] bitops_1.0-6
     [4] RSQLite_0.11.4
     [5] DBI_0.2-7
     [6] biomaRt_2.20.0
     [7] BiocInstaller_1.14.2
     [8] GenomicAlignments_1.0.5
     [9] BSgenome_1.32.0
    [10] Rsamtools_1.16.1
    [11] Biostrings_2.32.1
    [12] XVector_0.4.0
    [13] ggbio_1.12.8
    [14] ggplot2_1.0.0
    [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
    [16] GenomicFeatures_1.16.2
    [17] AnnotationDbi_1.26.0
    [18] Biobase_2.24.0
    [19] GenomicRanges_1.16.4
    [20] GenomeInfoDb_1.0.2
    [21] IRanges_1.22.10
    [22] BiocGenerics_0.10.0

    loaded via a namespace (and not attached):
     [1] BatchJobs_1.3       BBmisc_1.7       BiocParallel_0.6.1
     [4] biovizBase_1.12.1   brew_1.0-6       checkmate_1.2
     [7] cluster_1.15.2      codetools_0.2-8  colorspace_1.2-4
    [10] dichromat_2.0-0     digest_0.6.4     fail_1.2
    [13] foreach_1.4.2       Formula_1.1-2    grid_3.1.0
    [16] gridExtra_0.9.1     gtable_0.1.2     Hmisc_3.14-4
    [19] iterators_1.0.7     labeling_0.2     lattice_0.20-29
    [22] latticeExtra_0.6-26 MASS_7.3-33      munsell_0.4.2
    [25] plyr_1.8.1          proto_0.3-10     RColorBrewer_1.0-5
    [28] Rcpp_0.11.2         reshape2_1.4     rtracklayer_1.24.2
    [31] scales_0.2.4        sendmailR_1.1-2  splines_3.1.0
    [34] stats4_3.1.0        stringr_0.6.2    survival_2.37-7
    [37] tcltk_3.1.0         tools_3.1.0      VariantAnnotation_1.10.5
    [40] XML_3.98-1.1        zlibbioc_1.10.0
annotate ggbio • 1.6k views
Tengfei Yin ▴ 490
@tengfei-yin-6162
Last seen 10.2 years ago
Hey Anne,

So sorry for the late reply. Ideally, biovizBase would have some kind of mapper function to help map protein space to genomic space, so you wouldn't have to do it yourself. Until I have that, a hack would be to massage your protein domain data into a GRanges object in genomic coordinates, with the domain function as a column, and then create a separate track that plots the object as rectangles and uses a color legend to indicate domain function.

I will try to develop a more general approach for doing this. If you want, please send me an example RData file or example data, so we can work on that together.

PS: so that I don't miss your request, feel free to use the GitHub issues page: https://github.com/tengfei/ggbio/issues

Cheers,
Tengfei

On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu> wrote:
> Dear all,
> [...]

--
Tengfei Yin, PhD
Product Manager
Seven Bridges Genomics
sbgenomics.com
One Broadway FL 7
Cambridge, MA 02142
(617) 866-0446
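The hack described above (protein features expressed in genomic coordinates, with the feature type as a column) could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not biovizBase or ggbio functionality: `protein_to_genome()` is a hypothetical helper, the CDS exon coordinates and the feature positions are made-up toy values (not the real FTH1 annotation), and "chrToy" is a placeholder sequence name. A feature that spans more than one exon is returned as a single range that also covers the intron.

```r
# Hypothetical helper: map 1-based protein residue ranges to genomic ranges.
# cds_start / cds_end: genomic start/end of each CDS exon (1-based, inclusive).
# strand: "+" or "-". Returns a data.frame with columns 'start' and 'end'.
protein_to_genome <- function(aa_start, aa_end = aa_start,
                              cds_start, cds_end, strand = "+") {
  stopifnot(length(cds_start) == length(cds_end),
            strand %in% c("+", "-"),
            all(aa_start >= 1), all(aa_end >= aa_start))
  # Put the exons in transcription order (reverse genomic order on "-").
  ord <- order(cds_start, decreasing = (strand == "-"))
  cds_start <- cds_start[ord]
  cds_end <- cds_end[ord]
  # offset[k] = number of CDS bases that precede exon k.
  offset <- c(0, cumsum(cds_end - cds_start + 1))
  cds_length <- offset[length(offset)]
  # Convert a 1-based position within the CDS to a genomic coordinate.
  cds_to_genome <- function(i) {
    stopifnot(all(i >= 1 & i <= cds_length))
    k <- findInterval(i - 1, offset)   # exon that contains CDS base i
    within <- i - offset[k]            # 1-based position inside that exon
    if (strand == "+") cds_start[k] + within - 1 else cds_end[k] - within + 1
  }
  first <- cds_to_genome(3 * aa_start - 2)  # first base of the first codon
  last <- cds_to_genome(3 * aa_end)         # last base of the last codon
  data.frame(start = pmin(first, last), end = pmax(first, last))
}

# Toy input: two CDS exons on the minus strand and three made-up features.
toy_cds_start <- c(101, 201)
toy_cds_end <- c(109, 209)
features <- data.frame(
  feature = c("Metal binding", "Metal binding", "Domain"),
  aa_start = c(2, 5, 1),
  aa_end = c(2, 5, 6)
)
features <- cbind(features,
                  protein_to_genome(features$aa_start, features$aa_end,
                                    toy_cds_start, toy_cds_end, strand = "-"))

# Optionally wrap the result in a GRanges object, if Bioconductor is installed.
if (requireNamespace("GenomicRanges", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  feature_gr <- GenomicRanges::GRanges(
    seqnames = "chrToy",  # placeholder sequence name
    ranges = IRanges::IRanges(start = features$start, end = features$end),
    strand = "-",
    feature = features$feature
  )
}
```

With real data, the CDS exon coordinates would come from the transcript annotation (for example the TxDb object already used in the question), and the residue positions from the UniProt feature table. The resulting GRanges could then presumably be drawn with something like `autoplot(feature_gr, aes(fill = feature))` and added as one more entry in `tracks()`, although that plotting step is untested here.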