GRanges nearest problem
@arnemuellernovartiscom-2205
Switzerland
Hello,

I've come across a problem in GRanges nearest(): if the subject of the nearest() call contains strand information (+/-) and the query does not (*), the method takes a long time to run and raises warnings. mm9.pro.gr and mm9.2ktiles.gr are both GRanges objects.

> strand(mm9.pro.gr) = "-"
> strand(mm9.2ktiles.gr) = "*"
> system.time(nn <- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
   user  system elapsed
 27.150   0.002  27.416
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In start(ranges(x1Split[[st]])) - end(subSplit2) :
  longer object length is not a multiple of shorter object length
2: In start(ranges(x1Split[[st]])) - end(subSplit2) :
  longer object length is not a multiple of shorter object length
3: In start(ranges(x1Split[[st]])) - end(subSplit2) :
  longer object length is not a multiple of shorter object length
4: In start(ranges(x1Split[[st]])) - end(subSplit2) :
  longer object length is not a multiple of shorter object length

I think that if a range in the query, the subject, or both is non-stranded (*), the method should look for the nearest neighbor ignoring the strand (at least that's my suggestion ;-).

If I set the strand of the subject to "*", the method runs fine:

> strand(mm9.pro.gr) = "*"
> system.time(nn <- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
   user  system elapsed
  0.264   0.000   0.264

If the query is stranded (+/-) and the subject isn't, the method runs fine too (though it takes longer than when both query and subject are non-stranded, but I guess this is to be expected):

> system.time(nn <- nearest(mm9.pro.gr[1:5000], mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
   user  system elapsed
  3.084   0.000   3.125

Another odd behavior: if the query contains sequence names that are not present in the subject, an error is raised; the other way around works fine. Wouldn't it make sense to set the vector elements for sequences found only in the query to NA?

Kind regards,
Arne
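The strand rule and the NA behavior proposed above can be made concrete with a small plain-R sketch. It assumes ranges are given as parallel vectors (sequence name, start, end, strand) rather than GRanges objects; `nearest_sketch()` is a hypothetical helper written for illustration, not the GenomicRanges implementation, and its simple gap-based distance and first-match tie-breaking are assumptions. A query range on "*" may match a subject range on any strand, a stranded query only matches "*" or the same strand, and a query on a sequence absent from the subject gets NA instead of an error.

```r
# Hypothetical helper (not part of GenomicRanges): sketch of the proposed
# nearest() semantics using plain vectors instead of GRanges objects.
nearest_sketch <- function(q_seq, q_start, q_end, q_strand,
                           s_seq, s_start, s_end, s_strand) {
  vapply(seq_along(q_start), function(i) {
    # Candidates: same sequence and compatible strand ("*" matches anything).
    ok <- s_seq == q_seq[i] &
      (q_strand[i] == "*" | s_strand == "*" | s_strand == q_strand[i])
    # Sequence (or strand) absent from the subject: return NA, not an error.
    if (!any(ok)) return(NA_integer_)
    # Simple gap between closed intervals; 0 when they overlap.
    gap <- pmax(0L, s_start - q_end[i], q_start[i] - s_end)
    gap[!ok] <- NA
    which.min(gap)  # index of the first closest compatible subject range
  }, integer(1))
}

# Toy data (made up for illustration).
q <- list(seq = c("chr1", "chr1", "chrUn"), start = c(100L, 500L, 10L),
          end = c(150L, 550L, 20L), strand = c("*", "+", "*"))
s <- list(seq = c("chr1", "chr1", "chr1"), start = c(160L, 400L, 600L),
          end = c(170L, 450L, 700L), strand = c("-", "-", "+"))

res <- nearest_sketch(q$seq, q$start, q$end, q$strand,
                      s$seq, s$start, s$end, s$strand)
print(res)  # [1]  1  3 NA
```

Query 2 lies 50 bp from both subject 2 and subject 3, but only subject 3 is on a compatible strand, so it is returned; the "chrUn" query yields NA rather than stopping the whole call.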
@valerie-obenchain-4275
United States
Hi Arne,

Thanks for pointing out these bugs. I'll post again here when they have been fixed.

Valerie

(In reply to Arne Mueller's message of 04/13/11 05:29.)
Hi Arne,

Thank you for pointing out the error. I have checked in some changes to fix this issue.

Nishant

(In reply to Valerie Obenchain's message of 04/14/2011 06:21 AM.)