Subset GRanges object into new object
1
0
Entering edit mode
jhanks1981 • 0
@jhanks1981-14273
Last seen 7.2 years ago

I have already read a few pages such as this one subset GRanges

 

library(GenomicRanges)
library(GenomicFeatures)
#make the list of chromosomes to filter by

#make the database and transcripts
txdb <- makeTxDbFromGFF("Me/inputs/gencode.v19.chr_patch_hapl_scaff.annotation.gtf", format="gtf")
tscripts <- transcriptsBy(txdb, by="gene")

 

> #test the subsetting.. seems to work
> tscripts@unlistData@seqnames@values[tscripts@unlistData@seqnames@values == "chr1"]
   [1] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1
  [34] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1
  [67] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1
[100] chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1 chr1

But it doesnt work as expected..

> tscripts[tscripts@unlistData@seqnames@values == "chr1"]
GRangesList object of length 5272:
$ENSG00000000419.8
GRanges object with 7 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id           tx_name
         <Rle>            <IRanges>  <Rle> | <integer>       <character>
  [1]    chr20 [49551404, 49575087]      - |    182563 ENST00000371588.5
  [2]    chr20 [49551404, 49575087]      - |    182564 ENST00000466152.1
  [3]    chr20 [49551404, 49575092]      - |    182565 ENST00000371582.4
  [4]    chr20 [49551433, 49562398]      - |    182566 ENST00000494752.1
  [5]    chr20 [49551482, 49575058]      - |    182567 ENST00000371584.4
  [6]    chr20 [49551490, 49575069]      - |    182568 ENST00000371583.5
  [7]    chr20 [49552685, 49575069]      - |    182569 ENST00000413082.1

 

I also tried this, but got empty elements

> tscripts[seqnames(tscripts) == "chr1"]
GRangesList object of length 63568:
$ENSG00000000003.10
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     tx_id     tx_name
      <Rle> <IRanges>  <Rle> | <integer> <character>

$ENSG00000000005.5
GRanges object with 0 ranges and 2 metadata columns:
     seqnames ranges strand | tx_id tx_name

$ENSG00000000419.8
GRanges object with 0 ranges and 2 metadata columns:
     seqnames ranges strand | tx_id tx_name

 

 

What am I doing wrong here?

 

R genomicranges subsetting • 6.8k views
ADD COMMENT
0
Entering edit mode

I am trying to compare two GTF files.

First, I want to subset it so that I'm only including Chr1-22+X+Y from each GTF file.

Second, I want to see what features are not common to both objects

Third, I want to take each feature and compare the start and end locations between each file.

Finally, I want to make a list of every feature that does not exactly match and find out if the length is the same, or if the start/end is just shifted.

But as you can see, I'm stuck on the subsetting part, which isn't working as I expected it to.

ADD REPLY
1
Entering edit mode

It sounds like you might just want to get the transcripts as a GRanges directly, i.e., call transcripts() instead of transcriptsBy().

ADD REPLY
2
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

First, please avoid accessing the slots of the object, as they are an internal implementation detail. Code accessing the slots will be cryptic and fragile.

Note that this object is not a GRanges but a GRangesList. You might want to simplify to a GRanges using something like:

tscripts_gr <- stack(tscripts, "gene_id")

It depends on what you want to do downstream. Some more details would help.

ADD COMMENT
0
Entering edit mode

That did it!

tscripts_gr <- stack(tscripts, "gene_id")
tscripts_gr[tscripts_gr@seqnames == "chr1"]

GRanges object with 17531 ranges and 3 metadata columns:
          seqnames                 ranges strand   |           gene_id     tx_id           tx_name
             <Rle>              <IRanges>  <Rle>   |             <Rle> <integer>       <character>
      [1]     chr1 [169818772, 169863093]      -   | ENSG00000000457.9     15488 ENST00000367771.6
      [2]     chr1 [169821804, 169863076]      -   | ENSG00000000457.9     15489 ENST00000367772.4
      [3]     chr1 [169822215, 169858029]      -   | ENSG00000000457.9     15490 ENST00000367770.1
      [4]     chr1 [169823652, 169863408]      -   | ENSG00000000457.9     15491 ENST00000423670.1
      [5]     chr1 [169828260, 169863093]      -   | ENSG00000000457.9     15492 ENST00000470238.1
      ...      ...                    ...    ... ...               ...       ...               ...
  [17527]     chr1 [   997588,    998668]      -   | ENSG00000273443.1      8981 ENST00000442292.2
  [17528]     chr1 [201964824, 201965480]      -   | ENSG00000273478.1     16109 ENST00000608886.1
  [17529]     chr1 [151300425, 151300905]      +   | ENSG00000273481.1      5344 ENST00000609583.1
  [17530]     chr1 [113060421, 113061063]      -   | ENSG00000273483.1     13245 ENST00000608357.1
  [17531]     chr1 [ 92654794,  92656264]      +   | ENSG00000273487.1      3920 ENST00000609675.1
  -------
ADD REPLY

Login before adding your answer.

Traffic: 304 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6