IRanges: RangedData.values() deletes rownames
@michael-dondrup-3849
Last seen 10.2 years ago
Hi,

I remember having posted something like this earlier.

Calling values on a RangedData object deletes the range names if the DataFrame doesn't have names. Is that intentional?

> rd = RangedData(ranges=IRanges(start=1:2, width=1, names=c("A","B")), space=1)
> rownames(rd)
[1] "A" "B"
> values(rd) = DataFrame(somedata=1:2)
> rownames(rd)
NULL

I can work around this by setting them again:

values(rd) = DataFrame(somedata=1:2, row.names=c("A","B"))

but that's still a glitch...

Michael

R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rtracklayer_1.10.0 RCurl_1.4-3 bitops_1.0-4.1 IRanges_1.8.0
@michael-lawrence-3846
Last seen 3.0 years ago
United States
On Wed, Nov 3, 2010 at 9:06 AM, Michael Dondrup <michael.dondrup@uni.no> wrote:

> calling values on a RangedData object deletes the ranges names, if the
> DataFrame doesn't have names, is that intentional?
>
> > rd = RangedData(ranges=IRanges(start=1:2, width=1, names=c("A","B")), space=1)
> > rownames(rd)
> [1] "A" "B"
> > values(rd) = DataFrame(somedata=1:2)
> > rownames(rd)
> NULL

Definitely a bug, because it yields an invalid object, where the ranges have names but not the data. The question is how to rectify the names. You're expecting that if the DataFrame has NULL names, it should take the names of the RangedData. That makes sense to me. If the rownames on the DataFrame were not NULL and different from the RangedData, what should happen? I'm thinking that should throw an error; that's how ranges<- has always behaved.

Anyway, I checked in a fix to the devel version. Thanks for reporting this.

Michael
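For readers following along, the rule proposed above (inherit the names when the replacement has none, raise an error when they differ) can be sketched in base R. This is only an illustration of the rule as described in the reply: `reconcile_rownames` is a hypothetical helper, not an IRanges function, and the actual fix in the devel version may be implemented differently.

```r
# Hypothetical helper, not part of IRanges: decide which row names a
# replacement table should carry, given the names already on the ranges.
reconcile_rownames <- function(range_names, value_names) {
  if (is.null(value_names)) {
    # The replacement has no names: inherit them from the ranges.
    return(range_names)
  }
  if (!identical(as.character(value_names), as.character(range_names))) {
    # Names are present but disagree: refuse instead of guessing.
    stop("row names of the replacement differ from the names of the ranges")
  }
  value_names
}

inherited <- reconcile_rownames(c("A", "B"), NULL)
accepted  <- reconcile_rownames(c("A", "B"), c("A", "B"))
rejected  <- tryCatch(
  reconcile_rownames(c("A", "B"), c("B", "A")),
  error = function(e) conditionMessage(e)
)
```

Here `inherited` and `accepted` both hold the names "A" and "B", while `rejected` holds the error message, because the same names in a different order count as a mismatch under this rule.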
Hi,

I have done a little more testing on the values()<- function and I can only warn against using it, at least when there are multiple spaces in the RangedData object. You will have no means of getting the names back in the right order!

Even if it didn't delete the range names, it would shuffle the data, and I don't see a use case where this function could be used in a sensible way. The problem is the order in which the data is kept. For example, I read some data into a data.frame, made a RangedData from it, and then assigned more value columns found in the data.frame. But the data in the RangedData object are no longer in the original order; they are in the order of the RangedData, sorted by spaces.

So how are you supposed to get the order right in the DataFrame before using values()<-? It's not possible...

Therefore I suggest that either the function is removed, or that a matching on the row.names of the DataFrame is made (with an error thrown if there is a difference or there are no row.names), or that the data is added in the order of the ranges of the RangedData (which should preserve the original order). If this matching is not made, I see no way of guessing the right order for the DataFrame.

Some example code to illustrate what I mean:

> rd = RangedData(ranges=IRanges(start=1:10, width=1, names=letters[1:10]), space=sample(1:2, 10, replace=TRUE))
> rn = rownames(rd) # save the names in the right order
> values(rd) = DataFrame(data=letters[1:10], row.names=letters[1:10])
> rd # it's broken, but you don't see it
> row.names(rd) = rn # now everything is broken, but at least you can see it:
> rd
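The ordering problem described in this post can be reproduced without IRanges. The sketch below uses a plain data.frame as a stand-in and assumes, as the thread describes, that the container groups rows by space while keeping the input order within each space. The column names (`name`, `space`, `score`) are made up for the illustration.

```r
# Plain data.frame stand-in for the input table; 'space' plays the role of
# the chromosome. None of this uses the IRanges API.
input <- data.frame(
  name  = letters[1:6],
  space = c(2L, 1L, 2L, 1L, 1L, 2L),
  score = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Grouping by space reorders the rows (order() keeps ties in input order).
by_space <- input[order(input$space), c("name", "space")]

# Positional assignment pairs the i-th stored row with the i-th input row.
positional <- input$score

# Name-based assignment looks each stored name up in the input table.
matched <- input$score[match(by_space$name, input$name)]

comparison <- data.frame(by_space, positional, matched, row.names = NULL)
```

In `comparison`, the `positional` column attaches the score of row "a" to row "b", whereas `matched` keeps each score with its own name. That is the case for matching on row names rather than on position.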
On Thu, Nov 4, 2010 at 7:58 AM, Michael Dondrup <michael.dondrup@uni.no> wrote:

> I have done a little more testing on the values()<- function and I can only
> warn from using it, at least when there are multiple spaces in the
> RangedData object. You will have no means of getting the names back in the
> right order!
>
> Even if it wouldn't delete the ranges names, it will shuffle the data, and
> I don't see a use-case where this function could be used in a sensible way.
> The problem is the order in which the data is kept, e.g. i read in some
> data into a data.frame, then made a RangedData from it. Then assigning more
> value columns found in the data.frame. But the data in the Ranged data
> object are no longer in the original order but in the order of the
> RangedData sorted by spaces.

I don't see how the function is useless. You simply have to sort the data by chromosome. Or you might generate a DataFrame from the ranges in your RangedData, which will then be in the correct order. A lot of the time, data is already sorted by chromosome. The sorting performed by RangedData is a well-known weakness, but in practice I encounter it rarely.

> So how are you supposed to get the order right in the DataFrame before
> using values()<-?
> - it's not possible....

Why is it not possible? All RangedData does is sort by chromosome. There are definitely specific scenarios where the chromosome is unknown for the data. But in general, I don't see the problem.

> Therefore I opt for that either the function is removed, or
> that a matching on the row.names of DataFrame is made, and if there is a
> difference or no row.names an error is thrown, or that the
> data is added in the order of the ranges of the rangedData (that should be
> preserving the original order).
> If this matching is not made, I see no way of guessing the right order for
> the DataFrame.

I think I would leave it to the user to do this sort of matching.

> Some example code to illustrate what I mean:
>
> > rd = RangedData(ranges= IRanges(start=1:10, width=1, names=letters[1:10]), space=sample(1:2, 10, re=T))
> > rn = rownames(rd) # save the names in the right order
> > values(rd) = DataFrame(data=letters[1:10], row.names=letters[1:10])
> > rd # it's broken, but you dont see it

In the devel version, this should throw an error.

> > row.names(rd) = rn # now, everything is broken, but at least you can see it:
> > rd
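The "sort the data by chromosome" suggestion can be sketched in base R as follows. This is a stand-in using plain data.frames, not the IRanges API, and it assumes the container only groups rows by space and leaves the within-space order untouched, as stated in the reply. The column names are hypothetical.

```r
# Plain data.frame stand-in for the input; 'space' plays the role of the
# chromosome. None of this uses the IRanges API.
input <- data.frame(
  name  = c("r1", "r2", "r3", "r4"),
  space = c(2L, 1L, 2L, 1L),
  score = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Sort once, up front, by space. order() keeps ties in input order.
sorted_input <- input[order(input$space), ]

# Assumed container behaviour: rows are grouped by space. Applying that
# grouping to already-sorted input changes nothing.
stored <- sorted_input[order(sorted_input$space), ]

# TRUE: positional assignment from sorted_input now lines up with 'stored'.
aligned <- identical(stored$name, sorted_input$name)
```

Once the input is sorted this way, any further columns taken from `sorted_input` are in the same order as the stored rows, so assigning them by position is safe under the stated assumption.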