apply over the 'rows' of a GRanges object (and/or: is there a fiveprime() function for GRanges objects?)
@philip-lijnzaad-2499

Dear all,

This issue has come up before, but I couldn't find a satisfactory answer. I need a general function that gives me the 5'-coordinate(s) of a GRanges object. (BTW, this seems like a very obvious thing to have as a built-in function, like start(), end(), strand() and width() ... or did I miss something?)

So I have to iterate over the 'rows' of the GRanges, but the only thing I could come up with is this:

fiveprime <- function(gr) sapply(seq_along(gr), function(i) if (as.vector(strand(gr[i])) == "+") start(gr[i]) else end(gr[i]))

This works, but looks clumsy (and I don't know if it performs well). Am I doing something wrong and/or reinventing wheels? Regards,

Philip

PS: this is R 3.3.0, GenomicRanges_1.24.2

 

@michael-lawrence-3846

In my experience, it's been most useful to have the range representing the 5' end. We can do that with resize(gr, 1). Then, you can just use start() to get the integer position. I'm not really aware of a more direct way.

Also, btw, you could easily vectorize your solution using ifelse().
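For illustration, here is a rough sketch of both suggestions side by side. The toy ranges and the helper names fiveprime_ifelse() and fiveprime_resize() are made up for this example and are not part of GenomicRanges. Note that this sketch treats "*" like "+", to match what resize() does, whereas the function in the question returns end() for anything that is not "+".

```r
library(GenomicRanges)

# Toy ranges, made up for illustration: one per strand value.
gr <- GRanges(seqnames = "x",
              ranges = IRanges(start = c(10, 30, 50), end = c(20, 40, 60)),
              strand = c("+", "-", "*"))

# Vectorized with ifelse(): end() for minus-strand ranges, start() otherwise.
# (Hypothetical helper name.)
fiveprime_ifelse <- function(gr) {
  ifelse(as.character(strand(gr)) == "-", end(gr), start(gr))
}

# resize() to width 1 keeps the 5' end fixed; start() then gives the position.
# (Hypothetical helper name.)
fiveprime_resize <- function(gr) {
  start(resize(gr, width = 1))
}

fiveprime_ifelse(gr)
fiveprime_resize(gr)
```

Both calls should give the same positions for these ranges (10, 40 and 50), without looping over the individual elements.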

 


Thanks for your quick answer

> We can do that with resize(gr, 1)

That is cool and very concise, but nearly everywhere in the documentation "start" means: leftmost coordinate, not 5'-coordinate of the feature at hand. E.g.

start(GRanges(IRanges(start=10, end=20), strand='-', seqnames="x"))

yields 10, not 20. So to me it is unexpected that

resize(gr, width=1, fix='start')

fixes the 5'-coordinate rather than the leftmost coordinate. The documentation is not clear, or at least I could not find it. It looks like this behaviour occurs for resize(), flank() and promoters(), but maybe others too?
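To make the difference concrete, a small sketch on a single minus-strand range (the object name minus is just for this example). It only checks resize() and flank(); promoters() is left out here.

```r
library(GenomicRanges)

# One minus-strand range, same coordinates as in the example above.
minus <- GRanges(seqnames = "x",
                 ranges = IRanges(start = 10, end = 20),
                 strand = "-")

# The accessor ignores strand: leftmost coordinate.
leftmost <- start(minus)

# resize() with fix = "start" keeps the 5' end, which is end() on the minus strand.
five_prime <- start(resize(minus, width = 1, fix = "start"))

# flank() with the default start = TRUE is strand-aware as well:
# the flank lies to the right of a minus-strand range.
upstream <- flank(minus, width = 5)

leftmost
five_prime
c(start(upstream), end(upstream))
```

Here leftmost is 10, five_prime is 20, and the upstream flank covers 21 to 25, so "start" refers to two different things depending on the function.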

IMHO it really would be much clearer if there were also two additional methods fiveprime() and threeprime(), and if 'start' and 'end' always mean the same everywhere ...

Regards,

P

PS: minor nitpick, I think that promoters() should only be available for objects that have a strand, i.e. should not work on IRanges objects


Regarding the documentation, this is the documentation for resize for a GenomicRanges object:

resize returns an object of the same type and length as x containing intervals that have been resized to width width based on the strand(x) values. Elements where strand(x) == "+" or strand(x) == "*" are anchored at start(x) and elements where strand(x) == "-" are anchored at the end(x). The use.names argument determines whether or not to keep the names on the ranges
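A quick way to check that anchoring rule for all three strand values is a sketch like the following. The object name gr3 and the toy coordinates are assumptions made for the example.

```r
library(GenomicRanges)

# Same interval on each of the three strand values.
gr3 <- GRanges(seqnames = "x",
               ranges = IRanges(start = rep(10, 3), end = rep(20, 3)),
               strand = c("+", "-", "*"))

# fix = "start": "+" and "*" are anchored at start(x), "-" at end(x).
anchored_start <- start(resize(gr3, width = 1, fix = "start"))

# fix = "end": the mirror image, i.e. the 3' end of each range.
anchored_end <- start(resize(gr3, width = 1, fix = "end"))

anchored_start
anchored_end
```

With these ranges, anchored_start comes out as 10, 20, 10 and anchored_end as 20, 10, 20, which matches the quoted paragraph.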

Which seems pretty clear to me.

Pretty sure I agree with you that it doesn't make much sense for promoters to be defined on an IRanges object (i.e. a range without a strand), but I'm also open to the idea that the authors might have thought of a use case that I haven't considered given the 30 seconds' worth of thought I just put into it ...

 


You are right, I missed that, silly.

I maintain though that it is wrong to have "start" sometimes mean "leftmost coordinate regardless of strand", and sometimes "5'-coordinate", even for objects that have a strand ... it confuses the hell out of biologists who occasionally use these data types.  Would have been much clearer to use separate sets of names for these operations. It's too late to change that now, but it really could do with clearer documentation.
