I think your data are sufficiently complex that you should have just a
table with the read ID, range and transcript ID.
Easily formed with:
DataFrame(read = togroup(grl)[queryHits(m)], tx = subjectHits(m),
range =
ranges(m))
Michael
On Thu, Oct 3, 2013 at 7:59 PM, Tim Johnstone
<timothy.johnstone@yale.edu>wrote:
> Unfortunately it's not that easy, my analysis won't work with the
reads
> being broken into multiple sub-ranges as I need to preserve their
length
> and make sure I'm not double-counting. It's determining which reads
were
> split and recombining them after the mapping that I'm not sure
about. I
> understand the concept behind relist(), but I'm not actually sure
how to
> combine that with the results of map() to recover the full gapped
reads.
>
> Thanks for taking the time to help
>
> Best,
> Tim
>
>
> On Thu, Oct 3, 2013 at 7:22 PM, Michael Lawrence <
> lawrence.michael@gene.com> wrote:
>
>>
>>
>>
>> On Thu, Oct 3, 2013 at 4:15 PM, Tim Johnstone
<timothy.johnstone@yale.edu>> > wrote:
>>
>>> Hi Michael,
>>>
>>> What should I do with the RangesMapping object in order to
reconstruct
>>> the reads that were split when coercing to GRanges? Thus far I
cannot get
>>> relist to work.
>>>
>>> My end goal is to have the following IRangesList (where each
IRanges
>>> holds all the reads that align to a transcript, and each range is
a read in
>>> transcript coordinates).
>>>
>>>
>> So you are OK with a read being broken into multiple sub-ranges?
The
>> RangesMapping will give you the queryHits(mapping) [the indices of
the read
>> ranges], the subjectHits(mapping) [the transcript indices] and the
>> ranges(mapping) [coordinates].
>>
>> So just do something like split(ranges(m), subjectHits(m)) to get a
>> RangesList like the one you want.
>>
>>
>>
>>> > refSeqReads_TxCoord
>>> IRangesList of length 15759
>>> $NM_001089558
>>> IRanges of length 13
>>> start end width
>>> [1] 380 400 21
>>> [2] 380 400 21
>>> [3] 380 400 21
>>> [4] 380 400 21
>>> [5] 2277 2305 29
>>> ... ... ... ...
>>> [9] 2278 2306 29
>>> [10] 2278 2306 29
>>> [11] 2278 2306 29
>>> [12] 2279 2303 25
>>> [13] 2278 2306 29
>>>
>>> $NM_131819
>>> IRanges of length 0
>>>
>>> $NM_173228
>>> IRanges of length 4
>>> start end width
>>> [1] 126 147 22
>>> [2] 126 147 22
>>> [3] 128 148 21
>>> [4] 128 148 21
>>>
>>> ...
>>> <15756 more elements>
>>>
>>> Thanks,
>>> Tim
>>>
>>>
>>> On Thu, Oct 3, 2013 at 3:42 PM, Michael Lawrence <
>>> lawrence.michael@gene.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> You want to coerce your reads to a GRangesList, i.e., grglist().
Then
>>>> the gaps are preserved. The issue then is that the map method
only accepts
>>>> a GRanges for the mapping. It wouldn't be too hard to subvert
the existing
>>>> functionality for GRangesList. Just need to unlist the GRanges,
map() it,
>>>> and use relist() to get things back into the original form, if it
matters.
>>>> There's opportunity here for a map,GRangesList method.
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Thu, Oct 3, 2013 at 12:34 PM, Tim Johnstone <
>>>> timothy.johnstone@yale.edu> wrote:
>>>>
>>>>> I think I may be going about this wrong, as my reads should not
>>>>> actually align to introns. My goal is to map a set of reads
>>>>> (GappedAlignments) to transcript coordinates (contained in a
TranscriptDB)
>>>>>
>>>>> I have been coercing the GappedAlignments object to a GRanges,
and
>>>>> coercing the TranscriptDB to a GRangesList (using exonsBy) in
order to run
>>>>> map(GRanges, GRangesList). During that coercion, the
GappedAlignments loses
>>>>> gap information, meaning that the reads would be interpreted as
overlapping
>>>>> introns (whereas they are actually gapped and span exon
boundaries as
>>>>> expected).
>>>>>
>>>>> I would just map the start positions of the reads as you
suggest, but
>>>>> I need to preserve the read length (qwidth) information.
>>>>>
>>>>> How should I go about performing this mapping?
>>>>>
>>>>> Thanks again, and sorry for the confusion
>>>>> Tim
>>>>>
>>>>> On Wed, Oct 2, 2013 at 4:00 PM, Michael Lawrence <
>>>>> lawrence.michael@gene.com> wrote:
>>>>>
>>>>>> To me, it seems most sensible then to consider treating the end
>>>>>> points as the ranges (width 1). I'm not sure it makes sense to
map reads
>>>>>> that are apparently aligning to introns. You can map the end
points
>>>>>> individually and then combine them back together fairly easily.
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 2, 2013 at 11:26 AM, Tim Johnstone <
>>>>>> timothy.johnstone@yale.edu> wrote:
>>>>>>
>>>>>>> I am mapping ribosome protected fragments, which may start
inside
>>>>>>> one exon and end in the following one. I'd like to convert
their genomic
>>>>>>> coordinates into transcript-based coordinates.
>>>>>>>
>>>>>>> When I plot the positions of the reads that result from the
map
>>>>>>> function, I see a lack of reads around each splice junction,
so I'm
>>>>>>> guessing those reads are being discarded as they're not fully
contained in
>>>>>>> a given exon. Does findOverlaps consider a read 'within' if it
starts in
>>>>>>> one exon and finishes in another? It seems like there's
something obvious
>>>>>>> I'm missing.
>>>>>>>
>>>>>>> I guess [type="start"] is actually what I'm looking for, as
>>>>>>> otherwise reads might be double-counted(?), but I still see no
option for
>>>>>>> supplying that to 'map'.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Tim
>>>>>>>
>>>>>>> P.S. I was previously using a function I wrote in R, which
would run
>>>>>>> findOverlaps[type=any], then take all reads that overlapped a
given
>>>>>>> transcript and convert their positions to transcript
coordinates using a
>>>>>>> 'refLoc2transcriptLoc' function I wrote. This worked for small
sets,
>>>>>>> however, it's proving far too memory- and time-inefficient for
the larger
>>>>>>> transcript sets I am now working with, so map seemed to be a
good solution.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 2, 2013 at 1:55 PM, Michael Lawrence <
>>>>>>> lawrence.michael@gene.com> wrote:
>>>>>>>
>>>>>>>> What do you want to happen in that case? Those reads would be
>>>>>>>> outside the annotated transcript.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2013 at 10:48 AM, Tim Johnstone <
>>>>>>>> timothy.johnstone@yale.edu> wrote:
>>>>>>>>
>>>>>>>>> Thanks for your response!
>>>>>>>>>
>>>>>>>>> From what I can see, map calls 'findOverlaps' using
>>>>>>>>> [type="within"]. Is it possible to supply an argument to
'map' such that
>>>>>>>>> it will call findOverlaps with [type="any"]? I am running
>>>>>>>>> map(GappedAlignments, GRangesList), where the GRangesList is
a list of
>>>>>>>>> exons, and I need to include reads that overlap the exons at
any point, but
>>>>>>>>> it currently looks like I'm not capturing the reads at
intron-exon
>>>>>>>>> boundaries as they don't fully overlap.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tim J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 17, 2013 at 1:06 PM, Michael Lawrence <
>>>>>>>>> lawrence.michael@gene.com> wrote:
>>>>>>>>>
>>>>>>>>>> There is a method on the map generic in GenomicRanges that
will
>>>>>>>>>> do this given a GRanges of genomic/global coordinates, and
a GRangesList
>>>>>>>>>> defining transcripts or any other type of compound region.
The returned
>>>>>>>>>> RangesMapping object tells you the hits (which overlapped
which) and for
>>>>>>>>>> each hit the local coordinates.
>>>>>>>>>>
>>>>>>>>>> Michael
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 17, 2013 at 9:33 AM, Tim Johnstone <
>>>>>>>>>> timothy.johnstone@yale.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> To add to my previous email, I just found the
refLocsToLocalLocs
>>>>>>>>>>> function in the VariantAnnotation package you co-authored,
but need to
>>>>>>>>>>> convert single positions that align anywhere in the
transcript, rather than
>>>>>>>>>>> ranges that align to just the coding region. It looks like
it may be
>>>>>>>>>>> possible to get this behavior by supplying exonsByTx as
both the second and
>>>>>>>>>>> the third arguments, but let me know if there is an easier
way.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Tim Johnstone
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 17, 2013 at 12:19 PM, Tim Johnstone <
>>>>>>>>>>> timothy.johnstone@yale.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Michael,
>>>>>>>>>>>>
>>>>>>>>>>>> I am working with the GenomicRanges bioconductor package,
and
>>>>>>>>>>>> I've been trying to find a way to convert a set of
locations in genomic
>>>>>>>>>>>> coordinates to transcript coordinates. I found an old
mailing
>>>>>>>>>>>> list conversation<https: stat.ethz.ch="" pipermail="" biocondu="" ctor="" attachments="" 20110316="" 01b4dc42="" attachment.pl="">in which you
indicated you were planning to implement that function
>>>>>>>>>>>> (refLocs2TranscriptLocs or globalToLocal) in
GenomicRanges/GenomicFeatures,
>>>>>>>>>>>> but in looking through the source code I can't find it.
Was this function
>>>>>>>>>>>> ever implemented, and if so, would it be possible for you
to send it to me?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Tim Johnstone
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Timothy G. Johnstone <http: timothyjohnstone.com=""/>
>>>>>>>>>>>> Yale University School of Medicine
>>>>>>>>>>>> New Haven, CT 06510
>>>>>>>>>>>> 503-318-0962
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Timothy G. Johnstone
>>>>>>>>>>> Yale University School of Medicine
>>>>>>>>>>> New Haven, CT 06510
>>>>>>>>>>> 503-318-0962
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Timothy G. Johnstone
>>>>>>>>> Yale University School of Medicine
>>>>>>>>> New Haven, CT 06510
>>>>>>>>> 503-318-0962
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Timothy G. Johnstone
>>>>>>> Yale University School of Medicine
>>>>>>> New Haven, CT 06510
>>>>>>> 503-318-0962
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Timothy G. Johnstone
>>>>> Yale University School of Medicine
>>>>> New Haven, CT 06510
>>>>> 503-318-0962
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Timothy G. Johnstone
>>> Yale University School of Medicine
>>> New Haven, CT 06510
>>> 503-318-0962
>>>
>>
>>
>
>
> --
> Timothy G. Johnstone
> Yale University School of Medicine
> New Haven, CT 06510
> 503-318-0962
>
[[alternative HTML version deleted]]