Entering edit mode
Janet Young
▴
740
@janet-young-2360
Last seen 5.1 years ago
Fred Hutchinson Cancer Research Center,…
Hi again,
This is somewhat related to another request I sent earlier this
afternoon. Is it possible to implement query-self comparisons (i.e.
where I specify query but not subject) for GRangesList objects? The
motivation is that I'd like to use the ignoreSelf and ignoreRedundant
options in a GRangesList comparison.
I mentioned in my other request that I'm looking through a set of
genes to find pairs that overlap on opposite strands. I'm now using
findOverlaps on a GRangesList object instead of a GRanges object -
that's because I want to only look at cases where parts of the final
spliced transcript overlap, not cases where a large intron-containing
gene has another smaller gene nested in an intron. When I used
findOverlaps on the entire genes at once as GRanges objects, my
results included pairs of that nested type, but if I use the blocks
function to get just the exons as a GRangesList object, that lets me
successfully ignore the nested gene case, which is great. However,
with GRangesList I can't use the query-self comparison and therefore
can't access those useful ignoreSelf and ignoreRedundant options. I
know I can workaround that too with some effort, but it'd be great to
have it as part of the underlying code.
Again, I've included some code below that should show what I'm trying
to do.
all the best,
Janet
library(rtracklayer)
#### get some drosophila genes as a test case:
mySession <- browserSession()
genome(mySession) <- "dm3"
genes <- ucscTableQuery (mySession, track="flyBaseGene",
table="flyBaseGene")
genes <- track(genes)
#### reduce to a smaller example that contains a gene pair of the type
I'm talking about (CG33797-RA is nested inside CG11152-RA)
genes <- genes[148:152]
#### remove strand info, as a workaround to not being able to specify
ignore.strand for a query-self findOverlaps call
strand(genes) <- "*"
#### using findOverlaps on the genes themselves shows me the nested
pair (query=3, subject=4)
findOverlaps( genes, ignoreSelf = TRUE, ignoreRedundant = TRUE)
#### so I'll use blocks to extract only the exonic portions of the
genes as a GRangesList:
genes_GRL <- blocks(genes)
#### and use findOverlaps on that GRangesList object, first by
specifying it as both query and subject in the comparison - this gives
me more or less what I want (i.e. it does NOT show the nested pair
3-4), except that there's a bunch of filtering to do later.
findOverlaps( genes_GRL, genes_GRL)
#### ideally, to help me filter the hits I'd like to be able to use
ignoreSelf and ignoreRedundant, but I can only use those if it's a
query-self comparison (i.e. only works if no subject is specified)
findOverlaps( genes_GRL, genes_GRL, ignoreSelf=TRUE,
ignoreRedundant=TRUE)
# Error in .local(query, subject, maxgap, minoverlap, type, select,
...) :
# unused arguments (ignoreSelf = TRUE, ignoreRedundant = TRUE)
#### and it looks like findOverlaps is not implemented for the query-
self case for GRangesList objects
findOverlaps( genes_GRL)
# Error in match.arg(type) : 'arg' must be of length 1
sessionInfo()
R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods
[8] base
other attached packages:
[1] rtracklayer_1.25.11 GenomicRanges_1.17.17 GenomeInfoDb_1.1.6
[4] IRanges_1.99.15 S4Vectors_0.0.8 BiocGenerics_0.11.2
loaded via a namespace (and not attached):
[1] BatchJobs_1.2 BBmisc_1.6
BiocParallel_0.7.2
[4] Biostrings_2.33.10 bitops_1.0-6 brew_1.0-6
[7] codetools_0.2-8 DBI_0.2-7 digest_0.6.4
[10] fail_1.2 foreach_1.4.2
GenomicAlignments_1.1.13
[13] iterators_1.0.7 plyr_1.8.1 Rcpp_0.11.2
[16] RCurl_1.95-4.1 Rsamtools_1.17.23 RSQLite_0.11.4
[19] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2
[22] tools_3.1.0 XML_3.98-1.1 XVector_0.5.6
[25] zlibbioc_1.11.1