Dear all,
This issue has come up before, but I couldn't find a satisfactory answer. I need a general function that gives me the 5'-coordinate(s) of a GRanges object. (BTW, this seems like a very obvious thing to have as a built-in function, like start(), end(), strand() and width() ... or did I miss something?)
So I have to iterate over the 'rows' of the GRanges, but the only thing I could come up with is this:
fiveprime <- function(gr) sapply(seq_along(gr), function(i) if (as.vector(strand(gr[i])) == "+") start(gr[i]) else end(gr[i]))
This works, but looks clumsy (and I don't know if it performs well). Am I doing something wrong and/or reinventing wheels? Regards,
Philip
PS: this is R 3.3.0 , GenomicRanges_1.24.2
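For comparison, the per-row loop above can probably be avoided, since start(), end() and strand() are already vectorised over the whole object. A minimal sketch, assuming GenomicRanges is installed; fiveprime_vec is a hypothetical name, and treating "*" like "+" is an assumption that differs from the loop above (which uses end() for anything that is not "+"):

```r
library(GenomicRanges)

# Hypothetical helper: 5'-coordinate for every range at once, no per-row loop.
# Assumption: ranges on "-" use end(); "+" and "*" both use start().
fiveprime_vec <- function(gr) {
  ifelse(as.character(strand(gr)) == "-", end(gr), start(gr))
}

gr <- GRanges(seqnames = "x",
              ranges = IRanges(start = c(10, 30), end = c(20, 40)),
              strand = c("+", "-"))

fiveprime_vec(gr)  # 10 40
```

This returns plain integer coordinates rather than a GRanges object.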
Thanks for your quick answer
> We can do that with resize(gr, 1)
That is cool and very concise, but nearly everywhere in the documentation "start" means the leftmost coordinate, not the 5'-coordinate of the feature at hand. E.g.
start(GRanges(IRanges(start=10, end=20), strand='-', seqnames="x"))
yields 10, not 20. So to me it is unexpected that
resize(gr, width=1, fix='start')
fixes the 5'-coordinate rather than the leftmost coordinate. The documentation is not clear, or at least I could not find it. It looks like this behaviour occurs for resize(), flank() and promoters(), but maybe others too?
IMHO it really would be much clearer if there were also two additional methods, fiveprime() and threeprime(), and if 'start' and 'end' always meant the same thing everywhere ...
Regards,
P
PS: minor nitpick, I think that promoters() should only be available for objects that have a strand, i.e. it should not work on IRanges objects.
Regarding the documentation, this is the documentation for resize for a GenomicRanges object:
> resize returns an object of the same type and length as x containing intervals that have been resized to width width based on the strand(x) values. Elements where strand(x) == "+" or strand(x) == "*" are anchored at start(x) and elements where strand(x) == "-" are anchored at the end(x). The use.names argument determines whether or not to keep the names on the ranges
Which seems pretty clear to me.
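For anyone who wants to see that anchoring in practice, here is a small sketch, assuming GenomicRanges is installed. The three toy ranges and the variable names are made up for illustration; they share the same coordinates and differ only in strand:

```r
library(GenomicRanges)

# Three identical intervals, one per strand value.
gr <- GRanges(seqnames = "x",
              ranges = IRanges(start = c(10, 10, 10), end = c(20, 20, 20)),
              strand = c("+", "-", "*"))

start(gr)        # 10 10 10: leftmost coordinate, regardless of strand

# resize() anchors "+" and "*" at start(x) and "-" at end(x).
five <- resize(gr, width = 1, fix = "start")
start(five)      # 10 20 10
```

So start() reports the leftmost coordinate, while fix = "start" in resize() is interpreted relative to the strand.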
Pretty sure I agree with you that it doesn't make much sense for promoters to be defined on an IRanges object (i.e. a range without a strand), but I'm also open to the idea that the authors might have thought of a use case that I haven't considered given the 30 seconds' worth of thought I just put into it ...
You are right, I missed that, silly.
I maintain though that it is wrong to have "start" sometimes mean "leftmost coordinate regardless of strand", and sometimes "5'-coordinate", even for objects that have a strand ... it confuses the hell out of biologists who occasionally use these data types. Would have been much clearer to use separate sets of names for these operations. It's too late to change that now, but it really could do with clearer documentation.
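In the meantime, separately named helpers are easy enough to define locally on top of resize(). A rough sketch, assuming GenomicRanges is installed; fiveprime() and threeprime() here are hypothetical user-defined wrappers, not functions provided by the package:

```r
library(GenomicRanges)

# Hypothetical wrappers (not part of GenomicRanges): strand-aware end points.
# Both rely on resize() interpreting fix = "start"/"end" relative to the strand.
fiveprime  <- function(gr) start(resize(gr, width = 1, fix = "start"))
threeprime <- function(gr) start(resize(gr, width = 1, fix = "end"))

gr <- GRanges(seqnames = "x",
              ranges = IRanges(start = c(10, 30), end = c(20, 40)),
              strand = c("+", "-"))

fiveprime(gr)   # 10 40
threeprime(gr)  # 20 30
```

Unstranded ("*") ranges are handled like "+" here, following the resize() documentation quoted earlier in the thread.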