Question

How to find the amino acid codons corresponding to a subset of a range of genomic positions

0

Entering edit mode

madsheilskov ▴ 10

@madsheilskov-9317

Last seen 4.6 years ago

Denmark

Recently I asked how one - if given some genomic positions as a GRanges object - could find the amino acid codons that corresponded to these positions (if in a coding exon): C: How to find the amino acid codon corresponding to a translated genomic position . Got a nice solution from the site.

I now find myself in a similar but somewhat extended (complicated ?) situation. I have a GRanges with ~1000 > width > 330 bases. Parts of these ranges overlap with coding exons and I wish to know what are the corresponding amino acid sequences in these overlaps. The catch is that part of each range - since they are genomic - may also overlap with intron or untranslated parts of exons before they 'continue' into some translated part. tried to fiddle around the solution I got in the answer to the above question/link, but couldnt figure it out. How can I find the amino acids "overlapping" these ranges ?

genomicfeatures genomicranges cds overlap • 2.7k views

ADD COMMENT • link updated 9.4 years ago by Michael Lawrence ★ 11k • written 9.4 years ago by madsheilskov ▴ 10

score 1 · Answer 1 · 2015-12-06

1

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.4 years ago

United States

A first step would probably be to match the ranges to overlapping transcripts (cds) with findOverlaps(). Then, subscript with the hits to line up the ranges with the transcripts and find the portion of each range within the matched transcript with pintersect(). Use pmapToTranscript() to map coordinates, then proceed as before.

ADD COMMENT • link 9.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

I get the idea, but is stucked after pintersect(). This returns a GRangesList object with the coordinates of the overlaps with transcritps exons and includes a boolean hit mcols. How do I subset this object's TRUE values in order to get the genomic positions (of cds transcripts) with overlap to my regions. I gues this is my next step. The object is like:

GRangesList object of length 249:
$62638 
GRanges object with 7 ranges and 4 metadata columns:
      seqnames             ranges strand |    cds_id    cds_name exon_rank       hit
         <Rle>          <IRanges>  <Rle> | <integer> <character> <integer> <logical>
  [1]    chr17 [7578371, 7578370]      - |    183302        <NA>         1     FALSE
  [2]    chr17 [7578177, 7578176]      - |    183301        <NA>         2     FALSE
  [3]    chr17 [7577499, 7577498]      - |    183298        <NA>         3     FALSE
  [4]    chr17 [7577019, 7577018]      - |    183297        <NA>         4     FALSE
  [5]    chr17 [7576853, 7576852]      - |    183296        <NA>         5     FALSE
  [6]    chr17 [7573927, 7574033]      - |    183293        <NA>         6      TRUE
  [7]    chr17 [7572927, 7572926]      - |    183292        <NA>         7     FALSE

ADD REPLY • link 9.4 years ago madsheilskov ▴ 10

0

Entering edit mode

Pass drop.nohit.ranges=TRUE to pintersect().

ADD REPLY • link 9.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Smart, and unlist() afterwards gets you a nice GRanges object. Anyway, do you extract/access individual elements of a GRangeList object; would it be something along lapply perhaps ?

ADD REPLY • link 9.4 years ago madsheilskov ▴ 10

0

Entering edit mode

I don't think you want to unlist, at least not until after the pmapToTranscript() call. While lapply() does of course work (GRangesList is essentially a list), it is typically not the most convenient, nor the most performant, way to things.

ADD REPLY • link 9.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

This will not work as pmapToTranscripts() can not take two GrangesLists, namely the one from my pintersect() and the other being cds transcripts from cdsBy().

Perhaps I should not use cdsBy() here (that would intuitively make sense though) ?

ADD REPLY • link 9.4 years ago madsheilskov ▴ 10

0

Entering edit mode

Yet, maybe I should not have used unlist() as I now more ranges than the initial 249 list elements since some of my regions ranges overlaps multiple exons. Hence I loose the connection to my query (the ranges) and cannot use pmapToTranscript().

So in the end, I seem to be stucked anyway ?

I use pmapToTranscripts() with the intersect coordinates and the GrangesList of cds transcripts from cdsBy() which does not work as both are List objects.

ADD REPLY • link 9.4 years ago madsheilskov ▴ 10

1

Entering edit mode

Basic strategy is

grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])

gr <- unlist(reduce(relist(grl_tx, grl)))

But I will add a method to pmapToTranscripts that does that.

ADD REPLY • link 9.4 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Do you mean (following the ideas above):

grl <- pintersect(tr_hits, tx_hits, drop.nohit.ranges=TRUE )

Here tr_hits and tx_hits are output from findOverlaps() as suggested. Then,

grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])

does not work since cds[togroup(grl)] only subsets the first few transcripts of cds (togroup(grl) gives a vector 1:length(grl) ) and cds is the collection of hg19 cds transcripts. I must have misunderstood something here.

ADD REPLY • link 9.4 years ago madsheilskov ▴ 10

0

Entering edit mode

I was just writing the rough algorithm. In that example, you want to replace cds with tx_hits.

ADD REPLY • link 9.4 years ago Michael Lawrence ★ 11k