Please allow integers in `numeric` format in `seqlengths`
1
0
Entering edit mode
@charles-plessy-7857
Last seen 14 months ago
Japan

Hello,

In a function for plotting pairwise genome alignments, I am concatenating all sequence levels of some GRanges objects. This can create seqlengths that are longer than what R's integer format can support. Unfortunately, my attempts to use the numeric format instead did not work as seqlengths() coerces its input to integer format as in the example below.

> gr <- GRanges("chr1:1-2")
> seqlengths(gr) <- 3e9
Warning message:
In .normarg_seqlengths(value, seqnames(x)) :
  NAs introduced by coercion to integer range

And I could not cheat the system as a validation function will block me from putting other things than integers in the S4 slot.

> gr@seqinfo@seqlengths <- 3e9
Error in (function (cl, name, valueClass)  : 
  assignment of an object of class “numeric” is not valid for @‘seqlengths’ in an object of class “Seqinfo”; is(value, "integer") is not TRUE

Would it be possible to allow integer numeric values as seqlengths, for instance after validating them with the is.wholenumber workaround suggested in help(integer)?

is.wholenumber <- 
    function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol
GenomeInfoDb • 1.3k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 1 hour ago
Seattle, WA, United States

Hi Charles,

That change would be easy but is not desirable. The implications of allowing seqlengths greater than 2^31 are deep. Once you allow this, there's no reason to not allow genomic ranges that are also longer than 2^31 nucleotides. For example people would expect to be able to store the "chr1:1-3e9" range in their GRanges object. Unfortunately this is not possible at the moment because the start/end/width of a GRanges/IRanges object must be integer vectors. Changing this would require revisiting/adapting a lot of C code. It would also mean that things like Biostrings::getSeq() would now have the ability to return a DNAStringSet object where some sequences are longer than 2^31. And this in turn would require revisiting/adapting most of Biostrings C code as well as supporting long Vector derivatives in general, which would be a huge endeavor!

Do you really need to store the seqlengths of the concatenated sequences? If not, you could you just set them to NAs.

If you really need to cheat the system, use slot(gr@seqinfo, "seqlengths", check=FALSE) <- 3e9. All bets are off when you do things like this e.g. depending on what you're going to do with this invalid object, you might get all kinds of errors, even crashes. So use at your own risks!

Cheers,

H.

ADD COMMENT
0
Entering edit mode

Thanks Hervé for the detailed answer. Since the purpose is only to produce a plot, I will transform the GRanges to a transient data structure instead of modifying it.

ADD REPLY

Login before adding your answer.

Traffic: 855 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6