I forgot to copy everyone on my last email. For completeness: I think the
previous error is likely due to memory. After reading more about the memory
limits in ?"Memory-limits", I am wondering whether any tweaks would let me
create a big adjacency matrix.
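
A rough sketch of the sizes involved, in base R. It assumes that
subjectLength(good_hits) is also 900,000 (only queryLength was reported) and
that matrix(0, ...) stores 8-byte doubles. The variable names are illustrative
only.

```r
# Assumption: both dimensions are 900,000; only queryLength was reported.
n <- 900000

# Number of cells in a dense n x n matrix (computed as doubles, so no
# integer overflow).
n_elements <- n * n

# The classic per-vector element limit in R is 2^31 - 1.
max_elements <- .Machine$integer.max
exceeds_limit <- n_elements > max_elements

# A numeric (double) matrix needs 8 bytes per cell.
dense_bytes <- n_elements * 8

cat(sprintf("cells: %.3g  over limit: %s  dense size: %.3g bytes\n",
            n_elements, exceeds_limit, dense_bytes))
```

Under these assumptions the dense matrix would need about 8.1e11 cells, far
more than 2^31 - 1, so the failure looks like an element-count limit rather
than something a memory setting could lift.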
-A
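
One possible way to avoid the dense matrix altogether is to label connected
components directly from the list of hit pairs with a union-find structure.
The sketch below is base R only; connected_components() is a hypothetical
helper, not part of IRanges or the graph package, and the toy edge list is
made up for illustration. A plain R loop like this may be slow for millions
of pairs.

```r
# Hypothetical helper: label connected components of n points from an
# edge list (from[k], to[k]) using union-find with path compression.
connected_components <- function(n, from, to) {
  from <- as.integer(from)
  to <- as.integer(to)
  parent <- seq_len(n)              # every point starts as its own root

  find_root <- function(i) {
    root <- i
    while (parent[root] != root) root <- parent[root]
    # Path compression: point every node on the path straight at the root.
    while (parent[i] != root) {
      nxt <- parent[i]
      parent[i] <<- root
      i <- nxt
    }
    root
  }

  for (k in seq_along(from)) {
    ra <- find_root(from[k])
    rb <- find_root(to[k])
    # Attach the larger root to the smaller one, so each component ends up
    # labelled by its smallest member index.
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  vapply(seq_len(n), find_root, integer(1))
}

# Toy edge list (made up): 1-2, 2-3 and 5-6 are linked; 4 is isolated.
labels <- connected_components(6, from = c(1, 2, 5), to = c(2, 3, 6))
print(labels)
```

With a Hits object, the pairs would presumably come from
queryHits(good_hits) and subjectHits(good_hits), with n set to
queryLength(good_hits); memory use then grows with the number of hits rather
than with the square of the number of points.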
On Thu, Apr 26, 2012 at 2:24 PM, Abhishek Pratap <apratap@lbl.gov> wrote:
> Hi Michael
>
> I finally got through this and got the expected results on the test data set.
>
> On the full data it runs well up to the intersect step, but when I try to
> create an adjacency matrix in order to build a graph, the matrix() call
> fails with an error:
>
> Error in matrix(0, queryLength(good_hits), subjectLength(good_hits)) :
>   too many elements specified
>
> queryLength(good_hits) = 900,000
>
> I am wondering if there is an efficient way to construct a graph to find
> the connected components for a large set of points.
>
>
>
> Thanks!
> -Abhi
>
>
>
> On Wed, Apr 25, 2012 at 4:14 PM, Michael Lawrence <
> lawrence.michael@gene.com> wrote:
>
>> Making an adjacency matrix from a Hits object would be something like:
>>
>> am <- matrix(0, queryLength(hits), subjectLength(hits))
>> am[as.matrix(hits)] <- 1
>>
>> Michael
>>
>>
>> On Wed, Apr 25, 2012 at 3:40 PM, Abhishek Pratap <apratap@lbl.gov> wrote:
>>
>>> Thanks Steve. Legit solution and it works for me too. Based on my
>>> partial understanding of the methods, I have no idea how this will scale
>>> to the million points I have in the actual data (maybe it will), but I
>>> will let you know.
>>>
>>> @Michael: I updated my installation and I am now able to run the
>>> intersect step on the findOverlaps() output from start and end.
>>> I guess I now need to convert the common hits to a graph object and call
>>> connComp on it. Is there any way to convert the hits matrix to an
>>> adjacency matrix to create a graph, or is there another slick way to find
>>> the connected points?
>>>
>>> ir <- IRanges(c(10,10,11,9,10,11),
>>>               width=c(190,190,190,190,180,180))
>>> start <- flank(ir,1,both=TRUE)
>>> end <- flank(ir,1,start=FALSE,both=TRUE)
>>> start_overlaps <- findOverlaps(start)
>>> end_overlaps <- findOverlaps(end)
>>> good_hits <- intersect(start_overlaps,end_overlaps)
>>>
>>>
>>> Thanks!
>>> -Abhi
>>>
>>>
>>> On Wed, Apr 25, 2012 at 3:08 PM, Steve Lianoglou <
>>> mailinglist.honeypot@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Apr 25, 2012 at 5:21 PM, Abhishek Pratap <apratap@lbl.gov> wrote:
>>>> > Hi Michael
>>>> >
>>>> > SessionInfo copied below. My versions could be one release older than
>>>> > the current one. I am still wondering how I can get this information
>>>> > in a format that can be digested by connectedComp or something
>>>> > similar. I think we are close to a solution.
>>>>
>>>> Step 1: Upgrade R ;-)
>>>>
>>>> It's not necessary for the approach I'm going to suggest, but it'll
>>>> probably make it easier for Michael to help you w/ his solution, which
>>>> is probably going to be more robust than the
>>>> duct-tape-and-elmer's-glue snippet I'm going to try:
>>>>
>>>> R> library(GenomicRanges)
>>>> R> ir <- IRanges(c(10,10,11,9,10,11),
>>>>                  width=c(190,190,190,190,180,180))
>>>> R> starts <- reduce(resize(ir, width=1, fix='start'), min.gapwidth=2)
>>>> R> ends <- reduce(resize(ir, width=1, fix='end'), min.gapwidth=2)
>>>> R> sc <- countOverlaps(ir, starts)
>>>> R> ec <- countOverlaps(ir, ends)
>>>>
>>>> ... and ... good morning:
>>>>
>>>> R> split(ir, (paste(sc,ec,sep=":")))
>>>> CompressedIRangesList of length 2
>>>> $`1:1`
>>>> IRanges of length 2
>>>>     start end width
>>>> [1]    10 189   180
>>>> [2]    11 190   180
>>>>
>>>> $`1:2`
>>>> IRanges of length 4
>>>>     start end width
>>>> [1]    10 199   190
>>>> [2]    10 199   190
>>>> [3]    11 200   190
>>>> [4]     9 198   190
>>>>
>>>> HTH,
>>>> -steve
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>> | Memorial Sloan-Kettering Cancer Center
>>>> | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>
>>>
>>>
>>
>