rtracklayer/GenomicRanges: How can I group GRanges by metadata attributes?
Guest User ★ 13k
Last seen 10.1 years ago
After reading in a GTF file with rtracklayer::import(), I have a GRanges object. How can I select the ranges whose gene_id (or transcript_id) matches a particular value?

> library(rtracklayer)
> gtf <- import(system.file("tests", "gtf.gff", package="rtracklayer"), asRangedData=F)

I can see the metadata with 'mcols(gtf)', and I can even see the gene_id with 'mcols(gtf)$group',

> mcols(gtf)$group
[1] gene_id "ENSMUSG00000033501.1"; transcript_id "ENSMUST00000040592.1"; exon_id "ENSMUSE00000310143.1";
[omitted]
3 Levels: gene_id "ENSMUSG00000033501.1"; transcript_id "ENSMUST00000040592.1"; exon_id "ENSMUSE00000310143.1"; ...

but I don't see any way to use the name/value pairs in the group column.

Do I just have to parse the values in the group column myself?

Thanks in advance for any help.

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.20.4 GenomicRanges_1.12.5 IRanges_1.18.4 BiocGenerics_0.6.0 BiocInstaller_1.10.4

loaded via a namespace (and not attached):
[1] Biostrings_2.28.0 bitops_1.0-6 BSgenome_1.28.0 RCurl_1.95-4.1 Rsamtools_1.12.4 stats4_3.0.2
[7] tools_3.0.2 XML_3.95-0.2 zlibbioc_1.6.0

-- Sent via the guest posting facility at bioconductor.org.
Last seen 2.8 years ago
United States
There are anywhere from 3 to infinity versions of GFF. When rtracklayer cannot detect the version (from either the filename extension or the ##gff-version directive), it assumes GFF version 1. To tell it otherwise, use the "version" argument. In this case, you want version="2", which is why the attributes currently arrive unparsed in a single "group" column.

But what you have is actually a special subformat of GFF2, called GTF. rtracklayer would have detected this from a "gtf" extension (your file ends in ".gff"), or you could pass format="gtf". It doesn't do anything special with GTF, though.

Maybe what you really want is GenomicFeatures::makeTranscriptDbFromGFF, which uses rtracklayer to make a TranscriptDb from GFF data. That's generally more appropriate than a GRanges for representing transcript structures.

Michael

On Thu, Oct 31, 2013 at 10:51 PM, chris warth [guest] <guest@bioconductor.org> wrote:
> After reading in a GTF file with rtracklayer::import(), I have a GRanges
> object. How can I select the ranges whose gene_id (or transcript_id) match
> a particular value?
> [...]
> Do I just have to parse the values group column myself?
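To make the advice above concrete, here is a minimal sketch, not a definitive recipe. It assumes that importing with format="gtf" yields separate gene_id and transcript_id metadata columns (this can depend on the rtracklayer version) and that the test file from the question is still shipped with the package. gtf_attr is a hypothetical base-R helper, not part of rtracklayer, for the case where only the raw "group" strings are available.

```r
# Hypothetical helper (not part of rtracklayer): extract the value of one key
# from GTF-style attribute strings such as
#   gene_id "X"; transcript_id "Y";
gtf_attr <- function(x, key) {
  x <- as.character(x)  # the 'group' column in the question is a factor
  pattern <- paste0("(^|;)[[:space:]]*", key, " \"([^\"]*)\"")
  hits <- regmatches(x, regexec(pattern, x))
  # Each successful match has 3 elements: full match, separator, value.
  vapply(hits,
         function(h) if (length(h) == 3L) h[3L] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

group <- paste('gene_id "ENSMUSG00000033501.1";',
               'transcript_id "ENSMUST00000040592.1";',
               'exon_id "ENSMUSE00000310143.1";')
gene <- gtf_attr(group, "gene_id")
tx   <- gtf_attr(group, "transcript_id")

# Preferred route: let rtracklayer parse the attributes on import.
# Guarded so the sketch still runs where the package or file is missing.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  library(rtracklayer)
  path <- system.file("tests", "gtf.gff", package = "rtracklayer")
  if (nzchar(path)) {
    gtf <- import(path, format = "gtf")

    # Assumption: the import produced 'gene_id' and 'transcript_id' columns.
    ids <- mcols(gtf)$gene_id
    one_gene <- gtf[!is.na(ids) & ids == "ENSMUSG00000033501.1"]

    # Group ranges by transcript; the result is a GRangesList.
    by_tx <- split(gtf, mcols(gtf)$transcript_id)
  }
}
```

If the parsed columns are present, subsetting and split() cover both selecting by gene_id and grouping by transcript_id; the string-parsing helper is only a fallback.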
