[Bioc-sig-seq] as.data.frame on GRanges object with DNAStringSet in values
@herve-pages-1542
Seattle, WA, United States
Hi Michael,

On 11-09-29 02:17 PM, Michael Lawrence wrote:
> I saw that all coercions to atomic vectors from AtomicList are now
> deprecated. You had proposed deprecating as.vector(), because it should
> not unlist, and I agreed. Really as.vector() should return an ordinary R
> list. However, as.character(), as.numeric(), etc, in base R will unlist.

They don't seem to do that:

  > as.integer(list(a=1:3, b=4:-2))
  Error: (list) object cannot be coerced to type 'integer'
  > as.character(list(a=1:3, b=4:-2))
  [1] "1:3"                      "c(4, 3, 2, 1, 0, -1, -2)"

So they either refuse to do the coercion or they do it in a strange
way. Note that in the latter case they honor the strong expectation
that the output of the as.<atomic_type> coercion functions must have
the same length as the input (with positions of the elements being
preserved). unlist() would not honor this.

H.

> I'd like to keep consistency with base R. Do we really need to deprecate
> those, as well?
>
> Michael
>
> 2011/6/15 Michael Lawrence <michafla at gene.com>
>
> 2011/6/15 Hervé Pagès <hpages at fhcrc.org>
>
> On 11-06-15 03:38 PM, Michael Lawrence wrote:
>
> 2011/6/15 Hervé Pagès <hpages at fhcrc.org>
>
> Hi Michael, Janet,
>
> I just added an "as.vector" method for XStringSet objects to
> Biostrings 2.21.6:
>
>   > library(Biostrings)
>   > x <- DNAStringSet(c("aaatg", "gt"))
>   > as.vector(x)
>   [1] "AAATG" "GT"
>
> But that doesn't solve Janet's problem:
>
>   > df <- DataFrame(id=c("ID1", "ID2"), seqs=x)
>   > df
>   DataFrame with 2 rows and 2 columns
>              id           seqs
>     <character> <DNAStringSet>
>   1         ID1          AAATG
>   2         ID2             GT
>   > as.data.frame(df)
>   Error in as.data.frame.default(y, optional = TRUE, ...) :
>     cannot coerce class 'structure("DNAStringSet", package = "Biostrings")' into a data.frame
>
> Michael?
>
> Well, sorry for that. I just added a coercion from Vector to data.frame
> through as.vector, so this works.
>
> Thanks!
>
> But someone might add a coercion from List to data.frame that would
> treat the elements as columns. Would this make sense?
>
> Hard to tell. Maybe sometimes it would make sense, but sometimes it
> definitely does not (e.g. DNAStringSet).
>
> AtomicList to data.frame does something even stranger: it creates a
> two-column data frame with the unlisted values and names/indices rep'd
> out as a factor. Actually, that's kind of cool, since usually one does
> not have a list with equal element lengths, but it's somewhat
> unintuitive. But why does it apply only to AtomicList?
>
> Glad you bring this on the table.
>
> For the record, "as.vector" also unrolls an AtomicList:
>
>   > as.vector(IntegerList(1:4, 0:-2))
>   [1]  1  2  3  4  0 -1 -2
>
> IMO, we should not do things like that. Because:
>
> 1) The same can be achieved with unlist():
>
>      > unlist(IntegerList(1:4, 0:-2))
>      [1]  1  2  3  4  0 -1 -2
>
> 2) It's totally unintuitive to use as.vector for unlisting
>    a list (as.vector on a standard list does not do that).
>
> 3) There is a strong expectation that as.vector() will preserve
>    the length of its input.
>
> So I propose to deprecate those "as.vector" and "as.data.frame"
> methods for AtomicList objects.
>
> Sounds good to me. In fact, the stack method on List is almost
> identical to as.data.frame on AtomicList (and the stack method
> actually makes sense). You could make as.vector return an ordinary
> list, since list is a vector.
>
> H.
>
> Anyway, given the special correspondence between an XStringSet and a
> character vector, we could always add an as.data.frame method for
> XStringSet, just to make sure stuff behaves as expected.
>
> Thanks,
> H.
>
>   > sessionInfo()
>   R version 2.14.0 Under development (unstable) (2011-05-30 r56024)
>   Platform: x86_64-unknown-linux-gnu (64-bit)
>
>   locale:
>    [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>    [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
>   attached base packages:
>   [1] stats     graphics  grDevices utils     datasets  methods   base
>
>   other attached packages:
>   [1] Biostrings_2.21.6 IRanges_1.11.10
>
> On 11-06-15 12:49 PM, Janet Young wrote:
>
> yes - as.character seems a good choice, I think
>
> thanks,
>
> Janet
>
> On Jun 15, 2011, at 12:46 PM, Michael Lawrence wrote:
>
> So you would expect that the DNAStringSet is converted to a
> character vector? DNAStringSet (technically XStringSet) then
> just needs an as.vector method that delegates to as.character.
>
> Michael
>
> On Wed, Jun 15, 2011 at 12:37 PM, Janet Young <jayoung at fhcrc.org> wrote:
>
> Hi there,
>
> I'm trying to use as.data.frame on a GRanges object. On regular
> GRanges objects it works fine but I have some objects that contain a
> DNAStringSet in the values column, which isn't built in to the
> as.data.frame method. Is it possible to add the ability to coerce the
> DNAStringSet too, please?
>
> Here's some code that demonstrates the issue:
>
>   ################
>   library(GenomicRanges)
>   library(Biostrings)
>
>   gr1 <- GRanges(seqnames=rep("chr1",3),
>                  ranges=IRanges(start=c(1,101,201), width=50),
>                  strand=c("+","-","+"),
>                  genenames=c("seq1","seq2","seq3"))
>
>   as.data.frame(gr1)
>   # works
>
>   gr2 <- gr1
>   values(gr2)[,"myseqs"] <- DNAStringSet(c("AACGTG", "ACGGTGGTGTT", "GAGGCTG"))
>
>   as.data.frame(gr2)
>   # Error in as.data.frame.default(y, optional = TRUE, ...) :
>   #   cannot coerce class 'structure("DNAStringSet", package = "Biostrings")' into a data.frame
>   ################
>
> and here's sessionInfo() output:
>
>   R version 2.13.0 (2011-04-13)
>   Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
>   locale:
>   [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
>   attached base packages:
>   [1] stats     graphics  grDevices utils     datasets  methods   base
>
>   other attached packages:
>   [1] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>
>   ################
>
> You might wonder why I'm storing sequences in the GRanges values - in
> my real data they're sequencing reads that have mapped back to that
> region, but I'm still curious to maintain the sequence itself (for the
> moment) because it's not always identical to the underlying genomic
> sequence of that region (investigating mapping issues).
>
> (and my desire to use as.data.frame relates to a suggestion from Herve
> to let me work around some issues with the identical function)
>
> thanks,
>
> Janet
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
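The length argument made in this post can be checked with a minimal base-R sketch. It assumes only a plain named list standing in for an AtomicList (no IRanges objects are involved, and the variable names are illustrative). It compares as.character(), as.integer(), unlist() and stack() on the same input:

```r
# Plain base-R list standing in for an AtomicList (illustrative only).
x <- list(a = 1:3, b = 4:-2)

# as.character() deparses each element, so the result has one string per
# list element. The exact strings depend on the R version, so only the
# length is inspected here.
chr <- as.character(x)
length(chr) == length(x)

# as.integer() refuses, because the elements are not all of length one.
int_result <- tryCatch(as.integer(x), error = function(e) e)
inherits(int_result, "error")

# unlist() flattens, so the length becomes the total number of values.
flat <- unlist(x, use.names = FALSE)
length(flat) == sum(lengths(x))

# stack() gives the long form: one row per value, with the list names
# repeated as a factor in the second column.
long <- stack(x)
```

Here stack() is the base-R function for ordinary lists; whether the stack method on List objects returns exactly the same layout is not something this sketch verifies.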
Cancer Biostrings GLAD • 1.7k views
@michael-lawrence-3846
United States
2011/10/7 Hervé Pagès <hpages@fhcrc.org>

> Hi Michael,
>
> On 11-09-29 02:17 PM, Michael Lawrence wrote:
>
>> I saw that all coercions to atomic vectors from AtomicList are now
>> deprecated. You had proposed deprecating as.vector(), because it should
>> not unlist, and I agreed. Really as.vector() should return an ordinary R
>> list. However, as.character(), as.numeric(), etc, in base R will unlist.
>
> They don't seem to do that:
>
>   > as.integer(list(a=1:3, b=4:-2))
>   Error: (list) object cannot be coerced to type 'integer'
>
>   > as.character(list(a=1:3, b=4:-2))
>   [1] "1:3"                      "c(4, 3, 2, 1, 0, -1, -2)"
>
> So they either refuse to do the coercion or they do it in a strange
> way. Note that in the latter case they honor the strong expectation
> that the output of the as.<atomic_type> coercion functions must have
> the same length as the input (with positions of the elements being
> preserved). unlist() would not honor this.

I see. I had tried as.character(list("foo", "bar")), which makes sense in
that case. I see now what you mean about it not making sense in general.

> H.
>
>> I'd like to keep consistency with base R. Do we really need to deprecate
>> those, as well?
>>
>> Michael
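To make the two cases concrete, here is a small base-R sketch with made-up example values (nothing here comes from a Bioconductor package). It contrasts a list whose elements all have length one, where as.character() looks like a clean coercion, with a list that has a longer element, where each element is deparsed into a single string instead:

```r
# Every element has length one: the coercion returns the elements themselves.
simple <- as.character(list("foo", "bar"))

# One element has length > 1: it is deparsed into a single string of R code,
# so the result still has two elements but no longer holds the raw values.
nested <- list(c("foo", "bar"), "baz")
deparsed <- as.character(nested)

# unlist() recovers the values instead, at the cost of changing the length.
values <- unlist(nested)
```

The first case is the one that makes as.character() on a list look like unlisting; the second shows that it is really an element-by-element conversion that keeps the input length.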