Recently I asked how one - if given some genomic positions as a GRanges object - could find the amino acid codons that corresponded to these positions (if in a coding exon): C: How to find the amino acid codon corresponding to a translated genomic position . Got a nice solution from the site.
I now find myself in a similar but somewhat extended (complicated ?) situation. I have a GRanges with ~1000 > width > 330 bases. Parts of these ranges overlap with coding exons and I wish to know what are the corresponding amino acid sequences in these overlaps. The catch is that part of each range - since they are genomic - may also overlap with intron or untranslated parts of exons before they 'continue' into some translated part. tried to fiddle around the solution I got in the answer to the above question/link, but couldnt figure it out. How can I find the amino acids "overlapping" these ranges ?
I get the idea, but is stucked after
pintersect()
. This returns a GRangesList object with the coordinates of the overlaps with transcritps exons and includes a boolean hit mcols. How do I subset this object's TRUE values in order to get the genomic positions (of cds transcripts) with overlap to my regions. I gues this is my next step. The object is like:Pass
drop.nohit.ranges=TRUE
topintersect()
.Smart, and
unlist()
afterwards gets you a nice GRanges object. Anyway, do you extract/access individual elements of a GRangeList object; would it be something alonglapply
perhaps ?I don't think you want to unlist, at least not until after the
pmapToTranscript()
call. Whilelapply()
does of course work (GRangesList is essentially a list), it is typically not the most convenient, nor the most performant, way to things.This will not work as
pmapToTranscripts()
can not take two GrangesLists, namely the one from mypintersect()
and the other being cds transcripts fromcdsBy().
Perhaps I should not use
cdsBy()
here (that would intuitively make sense though) ?Yet, maybe I should not have used
unlist()
as I now more ranges than the initial 249 list elements since some of my regions ranges overlaps multiple exons. Hence I loose the connection to my query (the ranges) and cannot usepmapToTranscript()
.So in the end, I seem to be stucked anyway ?
I use
pmapToTranscripts()
with the intersect coordinates and the GrangesList of cds transcripts fromcdsBy()
which does not work as both are List objects.Basic strategy is
grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])
gr <- unlist(reduce(relist(grl_tx, grl)))
But I will add a method to pmapToTranscripts that does that.
Do you mean (following the ideas above):
grl <- pintersect(tr_hits, tx_hits, drop.nohit.ranges=TRUE )
Here tr_hits and tx_hits are output from
findOverlaps()
as suggested. Then,grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])
does not work since
cds[togroup(grl)]
only subsets the first few transcripts of cds (togroup(grl)
gives a vector 1:length(grl) ) and cds is the collection of hg19 cds transcripts. I must have misunderstood something here.I was just writing the rough algorithm. In that example, you want to replace
cds
withtx_hits
.