second GenomicRanges request - enable query-self comparison (and so ignoreSelf and ignoreRedundant) for GRangesList objects?
1
0
Entering edit mode
Janet Young ▴ 740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…
Hi again, This is somewhat related to another request I sent earlier this afternoon. Is it possible to implement query-self comparisons (i.e. where I specify query but not subject) for GRangesList objects? The motivation is that I'd like to use the ignoreSelf and ignoreRedundant options in a GRangesList comparison. I mentioned in my other request that I'm looking through a set of genes to find pairs that overlap on opposite strands. I'm now using findOverlaps on a GRangesList object instead of a GRanges object - that's because I want to only look at cases where parts of the final spliced transcript overlap, not cases where a large intron-containing gene has another smaller gene nested in an intron. When I used findOverlaps on the entire genes at once as GRanges objects, my results included pairs of that nested type, but if I use the blocks function to get just the exons as a GRangesList object, that lets me successfully ignore the nested gene case, which is great. However, with GRangesList I can't use the query-self comparison and therefore can't access those useful ignoreSelf and ignoreRedundant options. I know I can workaround that too with some effort, but it'd be great to have it as part of the underlying code. Again, I've included some code below that should show what I'm trying to do. all the best, Janet library(rtracklayer) #### get some drosophila genes as a test case: mySession <- browserSession() genome(mySession) <- "dm3" genes <- ucscTableQuery (mySession, track="flyBaseGene", table="flyBaseGene") genes <- track(genes) #### reduce to a smaller example that contains a gene pair of the type I'm talking about (CG33797-RA is nested inside CG11152-RA) genes <- genes[148:152] #### remove strand info, as a workaround to not being able to specify ignore.strand for a query-self findOverlaps call strand(genes) <- "*" #### using findOverlaps on the genes themselves shows me the nested pair (query=3, subject=4) findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) #### so I'll use blocks to extract only the exonic portions of the genes as a GRangesList: genes_GRL <- blocks(genes) #### and use findOverlaps on that GRangesList object, first by specifying it as both query and subject in the comparison - this gives me more or less what I want (i.e. it does NOT show the nested pair 3-4), except that there's a bunch of filtering to do later. findOverlaps( genes_GRL, genes_GRL) #### ideally, to help me filter the hits I'd like to be able to use ignoreSelf and ignoreRedundant, but I can only use those if it's a query-self comparison (i.e. only works if no subject is specified) findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE) # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) #### and it looks like findOverlaps is not implemented for the query- self case for GRangesList objects findOverlaps( genes_GRL) # Error in match.arg(type) : 'arg' must be of length 1 sessionInfo() R version 3.1.0 Patched (2014-05-26 r65771) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 [10] fail_1.2 foreach_1.4.2 GenomicAlignments_1.1.13 [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 [25] zlibbioc_1.11.1
• 1.5k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
I think I might have already added this a couple days ago to devel... tough to keep it all straight in my head On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung@fhcrc.org> wrote: > Hi again, > > This is somewhat related to another request I sent earlier this afternoon. > Is it possible to implement query-self comparisons (i.e. where I specify > query but not subject) for GRangesList objects? The motivation is that > I'd like to use the ignoreSelf and ignoreRedundant options in a GRangesList > comparison. > > I mentioned in my other request that I'm looking through a set of genes to > find pairs that overlap on opposite strands. I'm now using findOverlaps on > a GRangesList object instead of a GRanges object - that's because I want to > only look at cases where parts of the final spliced transcript overlap, not > cases where a large intron-containing gene has another smaller gene nested > in an intron. When I used findOverlaps on the entire genes at once as > GRanges objects, my results included pairs of that nested type, but if I > use the blocks function to get just the exons as a GRangesList object, that > lets me successfully ignore the nested gene case, which is great. However, > with GRangesList I can't use the query-self comparison and therefore can't > access those useful ignoreSelf and ignoreRedundant options. I know I can > workaround that too with some effort, but it'd be great to have it as part > of the underlying code. > > Again, I've included some code below that should show what I'm trying to > do. > > all the best, > > Janet > > > library(rtracklayer) > > #### get some drosophila genes as a test case: > mySession <- browserSession() > genome(mySession) <- "dm3" > genes <- ucscTableQuery (mySession, track="flyBaseGene", > table="flyBaseGene") > genes <- track(genes) > > #### reduce to a smaller example that contains a gene pair of the type I'm > talking about (CG33797-RA is nested inside CG11152-RA) > genes <- genes[148:152] > > #### remove strand info, as a workaround to not being able to specify > ignore.strand for a query-self findOverlaps call > strand(genes) <- "*" > > #### using findOverlaps on the genes themselves shows me the nested pair > (query=3, subject=4) > findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) > > #### so I'll use blocks to extract only the exonic portions of the genes > as a GRangesList: > genes_GRL <- blocks(genes) > > #### and use findOverlaps on that GRangesList object, first by specifying > it as both query and subject in the comparison - this gives me more or less > what I want (i.e. it does NOT show the nested pair 3-4), except that > there's a bunch of filtering to do later. > findOverlaps( genes_GRL, genes_GRL) > > #### ideally, to help me filter the hits I'd like to be able to use > ignoreSelf and ignoreRedundant, but I can only use those if it's a > query-self comparison (i.e. only works if no subject is specified) > findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE) > # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : > # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) > > #### and it looks like findOverlaps is not implemented for the query-self > case for GRangesList objects > findOverlaps( genes_GRL) > # Error in match.arg(type) : 'arg' must be of length 1 > > sessionInfo() > > R version 3.1.0 Patched (2014-05-26 r65771) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 > [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 > [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 > [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [10] fail_1.2 foreach_1.4.2 > GenomicAlignments_1.1.13 > [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 > [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 > [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 > [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 > [25] zlibbioc_1.11.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
I'm glad that's on your radar already - nice. I updated all devel packages this morning - it's not there yet. Here's my new sessionInfo(): > sessionInfo() R version 3.1.0 Patched (2014-05-26 r65771) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 GenomeInfoDb_1.1.7 [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 loaded via a namespace (and not attached): [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 [10] fail_1.2 foreach_1.4.2 GenomicAlignments_1.1.14 [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 [16] RCurl_1.95-4.1 Rsamtools_1.17.25 RSQLite_0.11.4 [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 [25] zlibbioc_1.11.1 On Jun 10, 2014, at 3:31 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: > I think I might have already added this a couple days ago to devel... tough to keep it all straight in my head > > > On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung@fhcrc.org> wrote: > Hi again, > > This is somewhat related to another request I sent earlier this afternoon. Is it possible to implement query-self comparisons (i.e. where I specify query but not subject) for GRangesList objects? The motivation is that I'd like to use the ignoreSelf and ignoreRedundant options in a GRangesList comparison. > > I mentioned in my other request that I'm looking through a set of genes to find pairs that overlap on opposite strands. I'm now using findOverlaps on a GRangesList object instead of a GRanges object - that's because I want to only look at cases where parts of the final spliced transcript overlap, not cases where a large intron-containing gene has another smaller gene nested in an intron. When I used findOverlaps on the entire genes at once as GRanges objects, my results included pairs of that nested type, but if I use the blocks function to get just the exons as a GRangesList object, that lets me successfully ignore the nested gene case, which is great. However, with GRangesList I can't use the query-self comparison and therefore can't access those useful ignoreSelf and ignoreRedundant options. I know I can workaround that too with some effort, but it'd be great to have it as part of the underlying code. > > Again, I've included some code below that should show what I'm trying to do. > > all the best, > > Janet > > > library(rtracklayer) > > #### get some drosophila genes as a test case: > mySession <- browserSession() > genome(mySession) <- "dm3" > genes <- ucscTableQuery (mySession, track="flyBaseGene", table="flyBaseGene") > genes <- track(genes) > > #### reduce to a smaller example that contains a gene pair of the type I'm talking about (CG33797-RA is nested inside CG11152-RA) > genes <- genes[148:152] > > #### remove strand info, as a workaround to not being able to specify ignore.strand for a query-self findOverlaps call > strand(genes) <- "*" > > #### using findOverlaps on the genes themselves shows me the nested pair (query=3, subject=4) > findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) > > #### so I'll use blocks to extract only the exonic portions of the genes as a GRangesList: > genes_GRL <- blocks(genes) > > #### and use findOverlaps on that GRangesList object, first by specifying it as both query and subject in the comparison - this gives me more or less what I want (i.e. it does NOT show the nested pair 3-4), except that there's a bunch of filtering to do later. > findOverlaps( genes_GRL, genes_GRL) > > #### ideally, to help me filter the hits I'd like to be able to use ignoreSelf and ignoreRedundant, but I can only use those if it's a query-self comparison (i.e. only works if no subject is specified) > findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE) > # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : > # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) > > #### and it looks like findOverlaps is not implemented for the query-self case for GRangesList objects > findOverlaps( genes_GRL) > # Error in match.arg(type) : 'arg' must be of length 1 > > sessionInfo() > > R version 3.1.0 Patched (2014-05-26 r65771) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 > [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 > [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 > [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > [10] fail_1.2 foreach_1.4.2 GenomicAlignments_1.1.13 > [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 > [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 > [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 > [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 > [25] zlibbioc_1.11.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Turns out I added something else. In this case, the method does exist, but it's just broken. You might have better luck if you specify the type argument to findOverlaps, like type="any". On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung@fhcrc.org> wrote: > I'm glad that's on your radar already - nice. I updated all devel > packages this morning - it's not there yet. Here's my new sessionInfo(): > > > sessionInfo() > R version 3.1.0 Patched (2014-05-26 r65771) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 GenomeInfoDb_1.1.7 > [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 > > [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 > > [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 > > [10] fail_1.2 foreach_1.4.2 > GenomicAlignments_1.1.14 > [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 > > [16] RCurl_1.95-4.1 Rsamtools_1.17.25 RSQLite_0.11.4 > > [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 > > [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 > > [25] zlibbioc_1.11.1 > > On Jun 10, 2014, at 3:31 PM, Michael Lawrence <lawrence.michael@gene.com> > wrote: > > I think I might have already added this a couple days ago to devel... > tough to keep it all straight in my head > > > On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung@fhcrc.org> wrote: > >> Hi again, >> >> This is somewhat related to another request I sent earlier this >> afternoon. Is it possible to implement query-self comparisons (i.e. where I >> specify query but not subject) for GRangesList objects? The motivation is >> that I'd like to use the ignoreSelf and ignoreRedundant options in a >> GRangesList comparison. >> >> I mentioned in my other request that I'm looking through a set of genes >> to find pairs that overlap on opposite strands. I'm now using findOverlaps >> on a GRangesList object instead of a GRanges object - that's because I want >> to only look at cases where parts of the final spliced transcript overlap, >> not cases where a large intron-containing gene has another smaller gene >> nested in an intron. When I used findOverlaps on the entire genes at once >> as GRanges objects, my results included pairs of that nested type, but if I >> use the blocks function to get just the exons as a GRangesList object, that >> lets me successfully ignore the nested gene case, which is great. However, >> with GRangesList I can't use the query-self comparison and therefore can't >> access those useful ignoreSelf and ignoreRedundant options. I know I can >> workaround that too with some effort, but it'd be great to have it as part >> of the underlying code. >> >> Again, I've included some code below that should show what I'm trying to >> do. >> >> all the best, >> >> Janet >> >> >> library(rtracklayer) >> >> #### get some drosophila genes as a test case: >> mySession <- browserSession() >> genome(mySession) <- "dm3" >> genes <- ucscTableQuery (mySession, track="flyBaseGene", >> table="flyBaseGene") >> genes <- track(genes) >> >> #### reduce to a smaller example that contains a gene pair of the type >> I'm talking about (CG33797-RA is nested inside CG11152-RA) >> genes <- genes[148:152] >> >> #### remove strand info, as a workaround to not being able to specify >> ignore.strand for a query-self findOverlaps call >> strand(genes) <- "*" >> >> #### using findOverlaps on the genes themselves shows me the nested pair >> (query=3, subject=4) >> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) >> >> #### so I'll use blocks to extract only the exonic portions of the genes >> as a GRangesList: >> genes_GRL <- blocks(genes) >> >> #### and use findOverlaps on that GRangesList object, first by specifying >> it as both query and subject in the comparison - this gives me more or less >> what I want (i.e. it does NOT show the nested pair 3-4), except that >> there's a bunch of filtering to do later. >> findOverlaps( genes_GRL, genes_GRL) >> >> #### ideally, to help me filter the hits I'd like to be able to use >> ignoreSelf and ignoreRedundant, but I can only use those if it's a >> query-self comparison (i.e. only works if no subject is specified) >> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE) >> # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : >> # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) >> >> #### and it looks like findOverlaps is not implemented for the query-self >> case for GRangesList objects >> findOverlaps( genes_GRL) >> # Error in match.arg(type) : 'arg' must be of length 1 >> >> sessionInfo() >> >> R version 3.1.0 Patched (2014-05-26 r65771) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 >> [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 >> >> loaded via a namespace (and not attached): >> [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 >> [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 >> [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >> [10] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.1.13 >> [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 >> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 >> [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 >> [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 >> [25] zlibbioc_1.11.1 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Janet, Michael, Janet, were you able to get what you needed by specifying 'type'? Michael, which method are you referring to when you say it exists but is broken? The only method I see that allows a missing 'subject' is query="Vector". Maybe we need to add query="GRangesList" subject="missing"? Val On 06/11/2014 11:15 AM, Michael Lawrence wrote: > Turns out I added something else. In this case, the method does exist, but > it's just broken. You might have better luck if you specify the type > argument to findOverlaps, like type="any". > > > On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at="" fhcrc.org=""> wrote: > >> I'm glad that's on your radar already - nice. I updated all devel >> packages this morning - it's not there yet. Here's my new sessionInfo(): >> >>> sessionInfo() >> R version 3.1.0 Patched (2014-05-26 r65771) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 GenomeInfoDb_1.1.7 >> [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 >> >> loaded via a namespace (and not attached): >> [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 >> >> [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 >> >> [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >> >> [10] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.1.14 >> [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 >> >> [16] RCurl_1.95-4.1 Rsamtools_1.17.25 RSQLite_0.11.4 >> >> [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 >> >> [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 >> >> [25] zlibbioc_1.11.1 >> >> On Jun 10, 2014, at 3:31 PM, Michael Lawrence <lawrence.michael at="" gene.com=""> >> wrote: >> >> I think I might have already added this a couple days ago to devel... >> tough to keep it all straight in my head >> >> >> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: >> >>> Hi again, >>> >>> This is somewhat related to another request I sent earlier this >>> afternoon. Is it possible to implement query-self comparisons (i.e. where I >>> specify query but not subject) for GRangesList objects? The motivation is >>> that I'd like to use the ignoreSelf and ignoreRedundant options in a >>> GRangesList comparison. >>> >>> I mentioned in my other request that I'm looking through a set of genes >>> to find pairs that overlap on opposite strands. I'm now using findOverlaps >>> on a GRangesList object instead of a GRanges object - that's because I want >>> to only look at cases where parts of the final spliced transcript overlap, >>> not cases where a large intron-containing gene has another smaller gene >>> nested in an intron. When I used findOverlaps on the entire genes at once >>> as GRanges objects, my results included pairs of that nested type, but if I >>> use the blocks function to get just the exons as a GRangesList object, that >>> lets me successfully ignore the nested gene case, which is great. However, >>> with GRangesList I can't use the query-self comparison and therefore can't >>> access those useful ignoreSelf and ignoreRedundant options. I know I can >>> workaround that too with some effort, but it'd be great to have it as part >>> of the underlying code. >>> >>> Again, I've included some code below that should show what I'm trying to >>> do. >>> >>> all the best, >>> >>> Janet >>> >>> >>> library(rtracklayer) >>> >>> #### get some drosophila genes as a test case: >>> mySession <- browserSession() >>> genome(mySession) <- "dm3" >>> genes <- ucscTableQuery (mySession, track="flyBaseGene", >>> table="flyBaseGene") >>> genes <- track(genes) >>> >>> #### reduce to a smaller example that contains a gene pair of the type >>> I'm talking about (CG33797-RA is nested inside CG11152-RA) >>> genes <- genes[148:152] >>> >>> #### remove strand info, as a workaround to not being able to specify >>> ignore.strand for a query-self findOverlaps call >>> strand(genes) <- "*" >>> >>> #### using findOverlaps on the genes themselves shows me the nested pair >>> (query=3, subject=4) >>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) >>> >>> #### so I'll use blocks to extract only the exonic portions of the genes >>> as a GRangesList: >>> genes_GRL <- blocks(genes) >>> >>> #### and use findOverlaps on that GRangesList object, first by specifying >>> it as both query and subject in the comparison - this gives me more or less >>> what I want (i.e. it does NOT show the nested pair 3-4), except that >>> there's a bunch of filtering to do later. >>> findOverlaps( genes_GRL, genes_GRL) >>> >>> #### ideally, to help me filter the hits I'd like to be able to use >>> ignoreSelf and ignoreRedundant, but I can only use those if it's a >>> query-self comparison (i.e. only works if no subject is specified) >>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, ignoreRedundant=TRUE) >>> # Error in .local(query, subject, maxgap, minoverlap, type, select, ...) : >>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) >>> >>> #### and it looks like findOverlaps is not implemented for the query-self >>> case for GRangesList objects >>> findOverlaps( genes_GRL) >>> # Error in match.arg(type) : 'arg' must be of length 1 >>> >>> sessionInfo() >>> >>> R version 3.1.0 Patched (2014-05-26 r65771) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 >>> [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 >>> >>> loaded via a namespace (and not attached): >>> [1] BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.7.2 >>> [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 >>> [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >>> [10] fail_1.2 foreach_1.4.2 >>> GenomicAlignments_1.1.13 >>> [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 >>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 >>> [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 >>> [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 >>> [25] zlibbioc_1.11.1 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
A GRangesList is a Vector, so that method should work. It just neglected to resolve the type argument with match.arg() before forwarding. That's now fixed. Michael On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha@fhcrc.org> wrote: > Hi Janet, Michael, > > Janet, were you able to get what you needed by specifying 'type'? > > Michael, which method are you referring to when you say it exists but is > broken? The only method I see that allows a missing 'subject' is > query="Vector". Maybe we need to add query="GRangesList" subject="missing"? > > > Val > > > > On 06/11/2014 11:15 AM, Michael Lawrence wrote: > >> Turns out I added something else. In this case, the method does exist, but >> it's just broken. You might have better luck if you specify the type >> argument to findOverlaps, like type="any". >> >> >> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung@fhcrc.org> wrote: >> >> I'm glad that's on your radar already - nice. I updated all devel >>> packages this morning - it's not there yet. Here's my new sessionInfo(): >>> >>> sessionInfo() >>>> >>> R version 3.1.0 Patched (2014-05-26 r65771) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 GenomeInfoDb_1.1.7 >>> [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 >>> >>> loaded via a namespace (and not attached): >>> [1] BatchJobs_1.2 BBmisc_1.6 >>> BiocParallel_0.7.2 >>> >>> [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 >>> >>> [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >>> >>> [10] fail_1.2 foreach_1.4.2 >>> GenomicAlignments_1.1.14 >>> [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 >>> >>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25 RSQLite_0.11.4 >>> >>> [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 >>> >>> [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 >>> >>> [25] zlibbioc_1.11.1 >>> >>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence <lawrence.michael@gene.com>>> > >>> wrote: >>> >>> I think I might have already added this a couple days ago to devel... >>> tough to keep it all straight in my head >>> >>> >>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young <jayoung@fhcrc.org> wrote: >>> >>> Hi again, >>>> >>>> This is somewhat related to another request I sent earlier this >>>> afternoon. Is it possible to implement query-self comparisons (i.e. >>>> where I >>>> specify query but not subject) for GRangesList objects? The >>>> motivation is >>>> that I'd like to use the ignoreSelf and ignoreRedundant options in a >>>> GRangesList comparison. >>>> >>>> I mentioned in my other request that I'm looking through a set of genes >>>> to find pairs that overlap on opposite strands. I'm now using >>>> findOverlaps >>>> on a GRangesList object instead of a GRanges object - that's because I >>>> want >>>> to only look at cases where parts of the final spliced transcript >>>> overlap, >>>> not cases where a large intron-containing gene has another smaller gene >>>> nested in an intron. When I used findOverlaps on the entire genes at >>>> once >>>> as GRanges objects, my results included pairs of that nested type, but >>>> if I >>>> use the blocks function to get just the exons as a GRangesList object, >>>> that >>>> lets me successfully ignore the nested gene case, which is great. >>>> However, >>>> with GRangesList I can't use the query-self comparison and therefore >>>> can't >>>> access those useful ignoreSelf and ignoreRedundant options. I know I >>>> can >>>> workaround that too with some effort, but it'd be great to have it as >>>> part >>>> of the underlying code. >>>> >>>> Again, I've included some code below that should show what I'm trying to >>>> do. >>>> >>>> all the best, >>>> >>>> Janet >>>> >>>> >>>> library(rtracklayer) >>>> >>>> #### get some drosophila genes as a test case: >>>> mySession <- browserSession() >>>> genome(mySession) <- "dm3" >>>> genes <- ucscTableQuery (mySession, track="flyBaseGene", >>>> table="flyBaseGene") >>>> genes <- track(genes) >>>> >>>> #### reduce to a smaller example that contains a gene pair of the type >>>> I'm talking about (CG33797-RA is nested inside CG11152-RA) >>>> genes <- genes[148:152] >>>> >>>> #### remove strand info, as a workaround to not being able to specify >>>> ignore.strand for a query-self findOverlaps call >>>> strand(genes) <- "*" >>>> >>>> #### using findOverlaps on the genes themselves shows me the nested pair >>>> (query=3, subject=4) >>>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE) >>>> >>>> #### so I'll use blocks to extract only the exonic portions of the genes >>>> as a GRangesList: >>>> genes_GRL <- blocks(genes) >>>> >>>> #### and use findOverlaps on that GRangesList object, first by >>>> specifying >>>> it as both query and subject in the comparison - this gives me more or >>>> less >>>> what I want (i.e. it does NOT show the nested pair 3-4), except that >>>> there's a bunch of filtering to do later. >>>> findOverlaps( genes_GRL, genes_GRL) >>>> >>>> #### ideally, to help me filter the hits I'd like to be able to use >>>> ignoreSelf and ignoreRedundant, but I can only use those if it's a >>>> query-self comparison (i.e. only works if no subject is specified) >>>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, >>>> ignoreRedundant=TRUE) >>>> # Error in .local(query, subject, maxgap, minoverlap, type, select, >>>> ...) : >>>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE) >>>> >>>> #### and it looks like findOverlaps is not implemented for the >>>> query-self >>>> case for GRangesList objects >>>> findOverlaps( genes_GRL) >>>> # Error in match.arg(type) : 'arg' must be of length 1 >>>> >>>> sessionInfo() >>>> >>>> R version 3.1.0 Patched (2014-05-26 r65771) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6 >>>> [4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] BatchJobs_1.2 BBmisc_1.6 >>>> BiocParallel_0.7.2 >>>> [4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6 >>>> [7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4 >>>> [10] fail_1.2 foreach_1.4.2 >>>> GenomicAlignments_1.1.13 >>>> [13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2 >>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4 >>>> [19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 >>>> [22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6 >>>> [25] zlibbioc_1.11.1 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>>> >>> >>> >>> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> >> > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Great. Thanks. Val On 06/14/2014 10:27 PM, Michael Lawrence wrote: > A GRangesList is a Vector, so that method should work. It just neglected > to resolve the type argument with match.arg() before forwarding. That's > now fixed. > > Michael > > > On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">> wrote: > > Hi Janet, Michael, > > Janet, were you able to get what you needed by specifying 'type'? > > Michael, which method are you referring to when you say it exists > but is broken? The only method I see that allows a missing 'subject' > is query="Vector". Maybe we need to add query="GRangesList" > subject="missing"? > > > Val > > > > On 06/11/2014 11:15 AM, Michael Lawrence wrote: > > Turns out I added something else. In this case, the method does > exist, but > it's just broken. You might have better luck if you specify the type > argument to findOverlaps, like type="any". > > > On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at="" fhcrc.org=""> <mailto:jayoung at="" fhcrc.org="">> wrote: > > I'm glad that's on your radar already - nice. I updated all > devel > packages this morning - it's not there yet. Here's my new > sessionInfo(): > > sessionInfo() > > R version 3.1.0 Patched (2014-05-26 r65771) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils > datasets methods > [8] base > > other attached packages: > [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 > GenomeInfoDb_1.1.7 > [4] IRanges_1.99.15 S4Vectors_0.0.8 > BiocGenerics_0.11.2 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 > BiocParallel_0.7.2 > > [4] Biostrings_2.33.10 bitops_1.0-6 > brew_1.0-6 > > [7] codetools_0.2-8 DBI_0.2-7 > digest_0.6.4 > > [10] fail_1.2 foreach_1.4.2 > GenomicAlignments_1.1.14 > [13] iterators_1.0.7 plyr_1.8.1 > Rcpp_0.11.2 > > [16] RCurl_1.95-4.1 Rsamtools_1.17.25 > RSQLite_0.11.4 > > [19] sendmailR_1.1-2 stats4_3.1.0 > stringr_0.6.2 > > [22] tools_3.1.0 XML_3.98-1.1 > XVector_0.5.6 > > [25] zlibbioc_1.11.1 > > On Jun 10, 2014, at 3:31 PM, Michael Lawrence > <lawrence.michael at="" gene.com="" <mailto:lawrence.michael="" at="" gene.com="">> > wrote: > > I think I might have already added this a couple days ago to > devel... > tough to keep it all straight in my head > > > On Tue, Jun 10, 2014 at 3:14 PM, Janet Young > <jayoung at="" fhcrc.org="" <mailto:jayoung="" at="" fhcrc.org="">> wrote: > > Hi again, > > This is somewhat related to another request I sent > earlier this > afternoon. Is it possible to implement query-self > comparisons (i.e. where I > specify query but not subject) for GRangesList objects? > The motivation is > that I'd like to use the ignoreSelf and ignoreRedundant > options in a > GRangesList comparison. > > I mentioned in my other request that I'm looking through > a set of genes > to find pairs that overlap on opposite strands. I'm now > using findOverlaps > on a GRangesList object instead of a GRanges object - > that's because I want > to only look at cases where parts of the final spliced > transcript overlap, > not cases where a large intron-containing gene has > another smaller gene > nested in an intron. When I used findOverlaps on the > entire genes at once > as GRanges objects, my results included pairs of that > nested type, but if I > use the blocks function to get just the exons as a > GRangesList object, that > lets me successfully ignore the nested gene case, which > is great. However, > with GRangesList I can't use the query-self comparison > and therefore can't > access those useful ignoreSelf and ignoreRedundant > options. I know I can > workaround that too with some effort, but it'd be great > to have it as part > of the underlying code. > > Again, I've included some code below that should show > what I'm trying to > do. > > all the best, > > Janet > > > library(rtracklayer) > > #### get some drosophila genes as a test case: > mySession <- browserSession() > genome(mySession) <- "dm3" > genes <- ucscTableQuery (mySession, track="flyBaseGene", > table="flyBaseGene") > genes <- track(genes) > > #### reduce to a smaller example that contains a gene > pair of the type > I'm talking about (CG33797-RA is nested inside CG11152-RA) > genes <- genes[148:152] > > #### remove strand info, as a workaround to not being > able to specify > ignore.strand for a query-self findOverlaps call > strand(genes) <- "*" > > #### using findOverlaps on the genes themselves shows me > the nested pair > (query=3, subject=4) > findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant > = TRUE) > > #### so I'll use blocks to extract only the exonic > portions of the genes > as a GRangesList: > genes_GRL <- blocks(genes) > > #### and use findOverlaps on that GRangesList object, > first by specifying > it as both query and subject in the comparison - this > gives me more or less > what I want (i.e. it does NOT show the nested pair 3-4), > except that > there's a bunch of filtering to do later. > findOverlaps( genes_GRL, genes_GRL) > > #### ideally, to help me filter the hits I'd like to be > able to use > ignoreSelf and ignoreRedundant, but I can only use those > if it's a > query-self comparison (i.e. only works if no subject is > specified) > findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, > ignoreRedundant=TRUE) > # Error in .local(query, subject, maxgap, minoverlap, > type, select, ...) : > # unused arguments (ignoreSelf = TRUE, ignoreRedundant > = TRUE) > > #### and it looks like findOverlaps is not implemented > for the query-self > case for GRangesList objects > findOverlaps( genes_GRL) > # Error in match.arg(type) : 'arg' must be of length 1 > > sessionInfo() > > R version 3.1.0 Patched (2014-05-26 r65771) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils > datasets methods > [8] base > > other attached packages: > [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 > GenomeInfoDb_1.1.6 > [4] IRanges_1.99.15 S4Vectors_0.0.8 > BiocGenerics_0.11.2 > > loaded via a namespace (and not attached): > [1] BatchJobs_1.2 BBmisc_1.6 > BiocParallel_0.7.2 > [4] Biostrings_2.33.10 bitops_1.0-6 > brew_1.0-6 > [7] codetools_0.2-8 DBI_0.2-7 > digest_0.6.4 > [10] fail_1.2 foreach_1.4.2 > GenomicAlignments_1.1.13 > [13] iterators_1.0.7 plyr_1.8.1 > Rcpp_0.11.2 > [16] RCurl_1.95-4.1 Rsamtools_1.17.23 > RSQLite_0.11.4 > [19] sendmailR_1.1-2 stats4_3.1.0 > stringr_0.6.2 > [22] tools_3.1.0 XML_3.98-1.1 > XVector_0.5.6 > [25] zlibbioc_1.11.1 > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > > > > [[alternative HTML version deleted]] > > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > >
ADD REPLY
0
Entering edit mode
Thanks very much all - yes, I can now get what I want by adding type="any": findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE) I guess in the long term that'll be unnecessary? It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods. Janet On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > Great. Thanks. > > Val > > On 06/14/2014 10:27 PM, Michael Lawrence wrote: >> A GRangesList is a Vector, so that method should work. It just neglected >> to resolve the type argument with match.arg() before forwarding. That's >> now fixed. >> >> Michael >> >> >> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at="" fhcrc.org="">> <mailto:vobencha at="" fhcrc.org="">> wrote: >> >> Hi Janet, Michael, >> >> Janet, were you able to get what you needed by specifying 'type'? >> >> Michael, which method are you referring to when you say it exists >> but is broken? The only method I see that allows a missing 'subject' >> is query="Vector". Maybe we need to add query="GRangesList" >> subject="missing"? >> >> >> Val >> >> >> >> On 06/11/2014 11:15 AM, Michael Lawrence wrote: >> >> Turns out I added something else. In this case, the method does >> exist, but >> it's just broken. You might have better luck if you specify the type >> argument to findOverlaps, like type="any". >> >> >> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at="" fhcrc.org="">> <mailto:jayoung at="" fhcrc.org="">> wrote: >> >> I'm glad that's on your radar already - nice. I updated all >> devel >> packages this morning - it's not there yet. Here's my new >> sessionInfo(): >> >> sessionInfo() >> >> R version 3.1.0 Patched (2014-05-26 r65771) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils >> datasets methods >> [8] base >> >> other attached packages: >> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 >> GenomeInfoDb_1.1.7 >> [4] IRanges_1.99.15 S4Vectors_0.0.8 >> BiocGenerics_0.11.2 >> >> loaded via a namespace (and not attached): >> [1] BatchJobs_1.2 BBmisc_1.6 >> BiocParallel_0.7.2 >> >> [4] Biostrings_2.33.10 bitops_1.0-6 >> brew_1.0-6 >> >> [7] codetools_0.2-8 DBI_0.2-7 >> digest_0.6.4 >> >> [10] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.1.14 >> [13] iterators_1.0.7 plyr_1.8.1 >> Rcpp_0.11.2 >> >> [16] RCurl_1.95-4.1 Rsamtools_1.17.25 >> RSQLite_0.11.4 >> >> [19] sendmailR_1.1-2 stats4_3.1.0 >> stringr_0.6.2 >> >> [22] tools_3.1.0 XML_3.98-1.1 >> XVector_0.5.6 >> >> [25] zlibbioc_1.11.1 >> >> On Jun 10, 2014, at 3:31 PM, Michael Lawrence >> <lawrence.michael at="" gene.com="" <mailto:lawrence.michael="" at="" gene.com="">> >> wrote: >> >> I think I might have already added this a couple days ago to >> devel... >> tough to keep it all straight in my head >> >> >> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young >> <jayoung at="" fhcrc.org="" <mailto:jayoung="" at="" fhcrc.org="">> wrote: >> >> Hi again, >> >> This is somewhat related to another request I sent >> earlier this >> afternoon. Is it possible to implement query-self >> comparisons (i.e. where I >> specify query but not subject) for GRangesList objects? >> The motivation is >> that I'd like to use the ignoreSelf and ignoreRedundant >> options in a >> GRangesList comparison. >> >> I mentioned in my other request that I'm looking through >> a set of genes >> to find pairs that overlap on opposite strands. I'm now >> using findOverlaps >> on a GRangesList object instead of a GRanges object - >> that's because I want >> to only look at cases where parts of the final spliced >> transcript overlap, >> not cases where a large intron-containing gene has >> another smaller gene >> nested in an intron. When I used findOverlaps on the >> entire genes at once >> as GRanges objects, my results included pairs of that >> nested type, but if I >> use the blocks function to get just the exons as a >> GRangesList object, that >> lets me successfully ignore the nested gene case, which >> is great. However, >> with GRangesList I can't use the query-self comparison >> and therefore can't >> access those useful ignoreSelf and ignoreRedundant >> options. I know I can >> workaround that too with some effort, but it'd be great >> to have it as part >> of the underlying code. >> >> Again, I've included some code below that should show >> what I'm trying to >> do. >> >> all the best, >> >> Janet >> >> >> library(rtracklayer) >> >> #### get some drosophila genes as a test case: >> mySession <- browserSession() >> genome(mySession) <- "dm3" >> genes <- ucscTableQuery (mySession, track="flyBaseGene", >> table="flyBaseGene") >> genes <- track(genes) >> >> #### reduce to a smaller example that contains a gene >> pair of the type >> I'm talking about (CG33797-RA is nested inside CG11152-RA) >> genes <- genes[148:152] >> >> #### remove strand info, as a workaround to not being >> able to specify >> ignore.strand for a query-self findOverlaps call >> strand(genes) <- "*" >> >> #### using findOverlaps on the genes themselves shows me >> the nested pair >> (query=3, subject=4) >> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant >> = TRUE) >> >> #### so I'll use blocks to extract only the exonic >> portions of the genes >> as a GRangesList: >> genes_GRL <- blocks(genes) >> >> #### and use findOverlaps on that GRangesList object, >> first by specifying >> it as both query and subject in the comparison - this >> gives me more or less >> what I want (i.e. it does NOT show the nested pair 3-4), >> except that >> there's a bunch of filtering to do later. >> findOverlaps( genes_GRL, genes_GRL) >> >> #### ideally, to help me filter the hits I'd like to be >> able to use >> ignoreSelf and ignoreRedundant, but I can only use those >> if it's a >> query-self comparison (i.e. only works if no subject is >> specified) >> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, >> ignoreRedundant=TRUE) >> # Error in .local(query, subject, maxgap, minoverlap, >> type, select, ...) : >> # unused arguments (ignoreSelf = TRUE, ignoreRedundant >> = TRUE) >> >> #### and it looks like findOverlaps is not implemented >> for the query-self >> case for GRangesList objects >> findOverlaps( genes_GRL) >> # Error in match.arg(type) : 'arg' must be of length 1 >> >> sessionInfo() >> >> R version 3.1.0 Patched (2014-05-26 r65771) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils >> datasets methods >> [8] base >> >> other attached packages: >> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 >> GenomeInfoDb_1.1.6 >> [4] IRanges_1.99.15 S4Vectors_0.0.8 >> BiocGenerics_0.11.2 >> >> loaded via a namespace (and not attached): >> [1] BatchJobs_1.2 BBmisc_1.6 >> BiocParallel_0.7.2 >> [4] Biostrings_2.33.10 bitops_1.0-6 >> brew_1.0-6 >> [7] codetools_0.2-8 DBI_0.2-7 >> digest_0.6.4 >> [10] fail_1.2 foreach_1.4.2 >> GenomicAlignments_1.1.13 >> [13] iterators_1.0.7 plyr_1.8.1 >> Rcpp_0.11.2 >> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 >> RSQLite_0.11.4 >> [19] sendmailR_1.1-2 stats4_3.1.0 >> stringr_0.6.2 >> [22] tools_3.1.0 XML_3.98-1.1 >> XVector_0.5.6 >> [25] zlibbioc_1.11.1 >> >> _________________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/__listinfo/bioconductor >> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> _________________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> https://stat.ethz.ch/mailman/__listinfo/bioconductor >> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >> >> >> >
ADD REPLY
0
Entering edit mode
I think Michael forgot to bump the version after his fix so the package didn't propagate. It's bumped now and 1.99.16 should be available w/ biocLite() by noon tomorrow or in svn immediately. findOverlaps() now works on a GRangesList w/ no subject and the ignore* args as necessary. The default for 'type' is "any" for all findOverlaps() methods. example(GRangesList) ## for 'grl' >> findOverlaps(grl) > Hits of length 3 > queryLength: 3 > subjectLength: 3 > queryHits subjectHits > <integer> <integer> > 1 1 1 > 2 2 2 > 3 3 3 Valerie On 06/17/2014 11:54 AM, Janet Young wrote: > Thanks very much all - yes, I can now get what I want by adding type="any": > findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE) > I guess in the long term that'll be unnecessary? It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods. > > Janet > > > > > On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > >> Great. Thanks. >> >> Val >> >> On 06/14/2014 10:27 PM, Michael Lawrence wrote: >>> A GRangesList is a Vector, so that method should work. It just neglected >>> to resolve the type argument with match.arg() before forwarding. That's >>> now fixed. >>> >>> Michael >>> >>> >>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at="" fhcrc.org="">>> <mailto:vobencha at="" fhcrc.org="">> wrote: >>> >>> Hi Janet, Michael, >>> >>> Janet, were you able to get what you needed by specifying 'type'? >>> >>> Michael, which method are you referring to when you say it exists >>> but is broken? The only method I see that allows a missing 'subject' >>> is query="Vector". Maybe we need to add query="GRangesList" >>> subject="missing"? >>> >>> >>> Val >>> >>> >>> >>> On 06/11/2014 11:15 AM, Michael Lawrence wrote: >>> >>> Turns out I added something else. In this case, the method does >>> exist, but >>> it's just broken. You might have better luck if you specify the type >>> argument to findOverlaps, like type="any". >>> >>> >>> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at="" fhcrc.org="">>> <mailto:jayoung at="" fhcrc.org="">> wrote: >>> >>> I'm glad that's on your radar already - nice. I updated all >>> devel >>> packages this morning - it's not there yet. Here's my new >>> sessionInfo(): >>> >>> sessionInfo() >>> >>> R version 3.1.0 Patched (2014-05-26 r65771) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils >>> datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 >>> GenomeInfoDb_1.1.7 >>> [4] IRanges_1.99.15 S4Vectors_0.0.8 >>> BiocGenerics_0.11.2 >>> >>> loaded via a namespace (and not attached): >>> [1] BatchJobs_1.2 BBmisc_1.6 >>> BiocParallel_0.7.2 >>> >>> [4] Biostrings_2.33.10 bitops_1.0-6 >>> brew_1.0-6 >>> >>> [7] codetools_0.2-8 DBI_0.2-7 >>> digest_0.6.4 >>> >>> [10] fail_1.2 foreach_1.4.2 >>> GenomicAlignments_1.1.14 >>> [13] iterators_1.0.7 plyr_1.8.1 >>> Rcpp_0.11.2 >>> >>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25 >>> RSQLite_0.11.4 >>> >>> [19] sendmailR_1.1-2 stats4_3.1.0 >>> stringr_0.6.2 >>> >>> [22] tools_3.1.0 XML_3.98-1.1 >>> XVector_0.5.6 >>> >>> [25] zlibbioc_1.11.1 >>> >>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence >>> <lawrence.michael at="" gene.com="" <mailto:lawrence.michael="" at="" gene.com="">> >>> wrote: >>> >>> I think I might have already added this a couple days ago to >>> devel... >>> tough to keep it all straight in my head >>> >>> >>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young >>> <jayoung at="" fhcrc.org="" <mailto:jayoung="" at="" fhcrc.org="">> wrote: >>> >>> Hi again, >>> >>> This is somewhat related to another request I sent >>> earlier this >>> afternoon. Is it possible to implement query-self >>> comparisons (i.e. where I >>> specify query but not subject) for GRangesList objects? >>> The motivation is >>> that I'd like to use the ignoreSelf and ignoreRedundant >>> options in a >>> GRangesList comparison. >>> >>> I mentioned in my other request that I'm looking through >>> a set of genes >>> to find pairs that overlap on opposite strands. I'm now >>> using findOverlaps >>> on a GRangesList object instead of a GRanges object - >>> that's because I want >>> to only look at cases where parts of the final spliced >>> transcript overlap, >>> not cases where a large intron-containing gene has >>> another smaller gene >>> nested in an intron. When I used findOverlaps on the >>> entire genes at once >>> as GRanges objects, my results included pairs of that >>> nested type, but if I >>> use the blocks function to get just the exons as a >>> GRangesList object, that >>> lets me successfully ignore the nested gene case, which >>> is great. However, >>> with GRangesList I can't use the query-self comparison >>> and therefore can't >>> access those useful ignoreSelf and ignoreRedundant >>> options. I know I can >>> workaround that too with some effort, but it'd be great >>> to have it as part >>> of the underlying code. >>> >>> Again, I've included some code below that should show >>> what I'm trying to >>> do. >>> >>> all the best, >>> >>> Janet >>> >>> >>> library(rtracklayer) >>> >>> #### get some drosophila genes as a test case: >>> mySession <- browserSession() >>> genome(mySession) <- "dm3" >>> genes <- ucscTableQuery (mySession, track="flyBaseGene", >>> table="flyBaseGene") >>> genes <- track(genes) >>> >>> #### reduce to a smaller example that contains a gene >>> pair of the type >>> I'm talking about (CG33797-RA is nested inside CG11152-RA) >>> genes <- genes[148:152] >>> >>> #### remove strand info, as a workaround to not being >>> able to specify >>> ignore.strand for a query-self findOverlaps call >>> strand(genes) <- "*" >>> >>> #### using findOverlaps on the genes themselves shows me >>> the nested pair >>> (query=3, subject=4) >>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant >>> = TRUE) >>> >>> #### so I'll use blocks to extract only the exonic >>> portions of the genes >>> as a GRangesList: >>> genes_GRL <- blocks(genes) >>> >>> #### and use findOverlaps on that GRangesList object, >>> first by specifying >>> it as both query and subject in the comparison - this >>> gives me more or less >>> what I want (i.e. it does NOT show the nested pair 3-4), >>> except that >>> there's a bunch of filtering to do later. >>> findOverlaps( genes_GRL, genes_GRL) >>> >>> #### ideally, to help me filter the hits I'd like to be >>> able to use >>> ignoreSelf and ignoreRedundant, but I can only use those >>> if it's a >>> query-self comparison (i.e. only works if no subject is >>> specified) >>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, >>> ignoreRedundant=TRUE) >>> # Error in .local(query, subject, maxgap, minoverlap, >>> type, select, ...) : >>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant >>> = TRUE) >>> >>> #### and it looks like findOverlaps is not implemented >>> for the query-self >>> case for GRangesList objects >>> findOverlaps( genes_GRL) >>> # Error in match.arg(type) : 'arg' must be of length 1 >>> >>> sessionInfo() >>> >>> R version 3.1.0 Patched (2014-05-26 r65771) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils >>> datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 >>> GenomeInfoDb_1.1.6 >>> [4] IRanges_1.99.15 S4Vectors_0.0.8 >>> BiocGenerics_0.11.2 >>> >>> loaded via a namespace (and not attached): >>> [1] BatchJobs_1.2 BBmisc_1.6 >>> BiocParallel_0.7.2 >>> [4] Biostrings_2.33.10 bitops_1.0-6 >>> brew_1.0-6 >>> [7] codetools_0.2-8 DBI_0.2-7 >>> digest_0.6.4 >>> [10] fail_1.2 foreach_1.4.2 >>> GenomicAlignments_1.1.13 >>> [13] iterators_1.0.7 plyr_1.8.1 >>> Rcpp_0.11.2 >>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 >>> RSQLite_0.11.4 >>> [19] sendmailR_1.1-2 stats4_3.1.0 >>> stringr_0.6.2 >>> [22] tools_3.1.0 XML_3.98-1.1 >>> XVector_0.5.6 >>> [25] zlibbioc_1.11.1 >>> >>> _________________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/__listinfo/bioconductor >>> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >>> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> >>> _________________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/__listinfo/bioconductor >>> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >>> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >>> >>> >> >
ADD REPLY
0
Entering edit mode
thanks very much - got it now, and it's working without the type="any". Nice! j On Jun 18, 2014, at 1:50 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > I think Michael forgot to bump the version after his fix so the package didn't propagate. It's bumped now and 1.99.16 should be available w/ biocLite() by noon tomorrow or in svn immediately. > > findOverlaps() now works on a GRangesList w/ no subject and the ignore* args as necessary. The default for 'type' is "any" for all findOverlaps() methods. > > example(GRangesList) ## for 'grl' >>> findOverlaps(grl) >> Hits of length 3 >> queryLength: 3 >> subjectLength: 3 >> queryHits subjectHits >> <integer> <integer> >> 1 1 1 >> 2 2 2 >> 3 3 3 > > > Valerie > > On 06/17/2014 11:54 AM, Janet Young wrote: >> Thanks very much all - yes, I can now get what I want by adding type="any": >> findOverlaps( genes_GRL, type="any", ignoreSelf=TRUE, ignoreRedundant=TRUE) >> I guess in the long term that'll be unnecessary? > It seems to make sense to have a default of type="any", to be consistent with all the other findOverlaps methods. >> >> Janet >> >> >> >> >> On Jun 15, 2014, at 12:35 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: >> >>> Great. Thanks. >>> >>> Val >>> >>> On 06/14/2014 10:27 PM, Michael Lawrence wrote: >>>> A GRangesList is a Vector, so that method should work. It just neglected >>>> to resolve the type argument with match.arg() before forwarding. That's >>>> now fixed. >>>> >>>> Michael >>>> >>>> >>>> On Sat, Jun 14, 2014 at 6:27 PM, Valerie Obenchain <vobencha at="" fhcrc.org="">>>> <mailto:vobencha at="" fhcrc.org="">> wrote: >>>> >>>> Hi Janet, Michael, >>>> >>>> Janet, were you able to get what you needed by specifying 'type'? >>>> >>>> Michael, which method are you referring to when you say it exists >>>> but is broken? The only method I see that allows a missing 'subject' >>>> is query="Vector". Maybe we need to add query="GRangesList" >>>> subject="missing"? >>>> >>>> >>>> Val >>>> >>>> >>>> >>>> On 06/11/2014 11:15 AM, Michael Lawrence wrote: >>>> >>>> Turns out I added something else. In this case, the method does >>>> exist, but >>>> it's just broken. You might have better luck if you specify the type >>>> argument to findOverlaps, like type="any". >>>> >>>> >>>> On Wed, Jun 11, 2014 at 10:52 AM, Janet Young <jayoung at="" fhcrc.org="">>>> <mailto:jayoung at="" fhcrc.org="">> wrote: >>>> >>>> I'm glad that's on your radar already - nice. I updated all >>>> devel >>>> packages this morning - it's not there yet. Here's my new >>>> sessionInfo(): >>>> >>>> sessionInfo() >>>> >>>> R version 3.1.0 Patched (2014-05-26 r65771) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils >>>> datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.18 >>>> GenomeInfoDb_1.1.7 >>>> [4] IRanges_1.99.15 S4Vectors_0.0.8 >>>> BiocGenerics_0.11.2 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] BatchJobs_1.2 BBmisc_1.6 >>>> BiocParallel_0.7.2 >>>> >>>> [4] Biostrings_2.33.10 bitops_1.0-6 >>>> brew_1.0-6 >>>> >>>> [7] codetools_0.2-8 DBI_0.2-7 >>>> digest_0.6.4 >>>> >>>> [10] fail_1.2 foreach_1.4.2 >>>> GenomicAlignments_1.1.14 >>>> [13] iterators_1.0.7 plyr_1.8.1 >>>> Rcpp_0.11.2 >>>> >>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.25 >>>> RSQLite_0.11.4 >>>> >>>> [19] sendmailR_1.1-2 stats4_3.1.0 >>>> stringr_0.6.2 >>>> >>>> [22] tools_3.1.0 XML_3.98-1.1 >>>> XVector_0.5.6 >>>> >>>> [25] zlibbioc_1.11.1 >>>> >>>> On Jun 10, 2014, at 3:31 PM, Michael Lawrence >>>> <lawrence.michael at="" gene.com="" <mailto:lawrence.michael="" at="" gene.com="">> >>>> wrote: >>>> >>>> I think I might have already added this a couple days ago to >>>> devel... >>>> tough to keep it all straight in my head >>>> >>>> >>>> On Tue, Jun 10, 2014 at 3:14 PM, Janet Young >>>> <jayoung at="" fhcrc.org="" <mailto:jayoung="" at="" fhcrc.org="">> wrote: >>>> >>>> Hi again, >>>> >>>> This is somewhat related to another request I sent >>>> earlier this >>>> afternoon. Is it possible to implement query-self >>>> comparisons (i.e. where I >>>> specify query but not subject) for GRangesList objects? >>>> The motivation is >>>> that I'd like to use the ignoreSelf and ignoreRedundant >>>> options in a >>>> GRangesList comparison. >>>> >>>> I mentioned in my other request that I'm looking through >>>> a set of genes >>>> to find pairs that overlap on opposite strands. I'm now >>>> using findOverlaps >>>> on a GRangesList object instead of a GRanges object - >>>> that's because I want >>>> to only look at cases where parts of the final spliced >>>> transcript overlap, >>>> not cases where a large intron-containing gene has >>>> another smaller gene >>>> nested in an intron. When I used findOverlaps on the >>>> entire genes at once >>>> as GRanges objects, my results included pairs of that >>>> nested type, but if I >>>> use the blocks function to get just the exons as a >>>> GRangesList object, that >>>> lets me successfully ignore the nested gene case, which >>>> is great. However, >>>> with GRangesList I can't use the query-self comparison >>>> and therefore can't >>>> access those useful ignoreSelf and ignoreRedundant >>>> options. I know I can >>>> workaround that too with some effort, but it'd be great >>>> to have it as part >>>> of the underlying code. >>>> >>>> Again, I've included some code below that should show >>>> what I'm trying to >>>> do. >>>> >>>> all the best, >>>> >>>> Janet >>>> >>>> >>>> library(rtracklayer) >>>> >>>> #### get some drosophila genes as a test case: >>>> mySession <- browserSession() >>>> genome(mySession) <- "dm3" >>>> genes <- ucscTableQuery (mySession, track="flyBaseGene", >>>> table="flyBaseGene") >>>> genes <- track(genes) >>>> >>>> #### reduce to a smaller example that contains a gene >>>> pair of the type >>>> I'm talking about (CG33797-RA is nested inside CG11152-RA) >>>> genes <- genes[148:152] >>>> >>>> #### remove strand info, as a workaround to not being >>>> able to specify >>>> ignore.strand for a query-self findOverlaps call >>>> strand(genes) <- "*" >>>> >>>> #### using findOverlaps on the genes themselves shows me >>>> the nested pair >>>> (query=3, subject=4) >>>> findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant >>>> = TRUE) >>>> >>>> #### so I'll use blocks to extract only the exonic >>>> portions of the genes >>>> as a GRangesList: >>>> genes_GRL <- blocks(genes) >>>> >>>> #### and use findOverlaps on that GRangesList object, >>>> first by specifying >>>> it as both query and subject in the comparison - this >>>> gives me more or less >>>> what I want (i.e. it does NOT show the nested pair 3-4), >>>> except that >>>> there's a bunch of filtering to do later. >>>> findOverlaps( genes_GRL, genes_GRL) >>>> >>>> #### ideally, to help me filter the hits I'd like to be >>>> able to use >>>> ignoreSelf and ignoreRedundant, but I can only use those >>>> if it's a >>>> query-self comparison (i.e. only works if no subject is >>>> specified) >>>> findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE, >>>> ignoreRedundant=TRUE) >>>> # Error in .local(query, subject, maxgap, minoverlap, >>>> type, select, ...) : >>>> # unused arguments (ignoreSelf = TRUE, ignoreRedundant >>>> = TRUE) >>>> >>>> #### and it looks like findOverlaps is not implemented >>>> for the query-self >>>> case for GRangesList objects >>>> findOverlaps( genes_GRL) >>>> # Error in match.arg(type) : 'arg' must be of length 1 >>>> >>>> sessionInfo() >>>> >>>> R version 3.1.0 Patched (2014-05-26 r65771) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils >>>> datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] rtracklayer_1.25.11 GenomicRanges_1.17.17 >>>> GenomeInfoDb_1.1.6 >>>> [4] IRanges_1.99.15 S4Vectors_0.0.8 >>>> BiocGenerics_0.11.2 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] BatchJobs_1.2 BBmisc_1.6 >>>> BiocParallel_0.7.2 >>>> [4] Biostrings_2.33.10 bitops_1.0-6 >>>> brew_1.0-6 >>>> [7] codetools_0.2-8 DBI_0.2-7 >>>> digest_0.6.4 >>>> [10] fail_1.2 foreach_1.4.2 >>>> GenomicAlignments_1.1.13 >>>> [13] iterators_1.0.7 plyr_1.8.1 >>>> Rcpp_0.11.2 >>>> [16] RCurl_1.95-4.1 Rsamtools_1.17.23 >>>> RSQLite_0.11.4 >>>> [19] sendmailR_1.1-2 stats4_3.1.0 >>>> stringr_0.6.2 >>>> [22] tools_3.1.0 XML_3.98-1.1 >>>> XVector_0.5.6 >>>> [25] zlibbioc_1.11.1 >>>> >>>> _________________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor >>>> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >>>> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> _________________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/__listinfo/bioconductor >>>> <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.__science.biology.informatics.__conductor >>>> <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>>> >>> >> >
ADD REPLY

Login before adding your answer.

Traffic: 753 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6