I have a GRanges object called "MmOrganismDbTxs" that has a CharacterList of UCSC-style names as one of the metadata columns, and I have a vector of UCSC-style names that I'm interested in called "desiredUCnames". I would like to subset my GRanges object to pull out a GRanges object that contains only the genes I'm interested in. The line at the bottom of this code will do it, but it seems rather long and awkward. Is there a better way of subsetting?
    library("Mus.musculus")
    MmOrganismDb <- Mus.musculus
    MmOrganismDbTxs <- transcripts(MmOrganismDb)
    > head(MmOrganismDbTxs)
    GRanges object with 6 ranges and 2 metadata columns:
          seqnames             ranges strand |          TXID          TXNAME
             <Rle>          <IRanges>  <Rle> | <IntegerList> <CharacterList>
      [1]     chr1 [4807893, 4842827]      + |             1      uc007afg.1
      [2]     chr1 [4807893, 4846735]      + |             2      uc007afh.1
      [3]     chr1 [4857694, 4897909]      + |             3      uc007afi.2
      [4]     chr1 [4857694, 4897909]      + |             4      uc011wht.1
      [5]     chr1 [4858328, 4897909]      + |             5      uc011whu.1
      [6]     chr1 [5083173, 5099777]      + |             6      uc007afm.1
      -------
      seqinfo: 66 sequences (1 circular) from mm10 genome
    > desiredUCnames
    [1] "uc007pac.1" "uc007puq.3" "uc009tzo.1" "uc009tzp.1" "uc007pur.1"
    desiredTxs <- MmOrganismDbTxs[as.logical(elementMetadata(MmOrganismDbTxs)$TXNAME %in% desiredUCnames)]

Thank you.
Eric
Hi Michael,
That's a big improvement - thanks! One follow up question: In your "any" statement, it looks like you can use "TXNAME" without specifying that "TXNAME" is part of "MmOrganismDbTxs". Why is this?
Thanks again for the help.
Eric
The second argument to subset() is evaluated in the "context" of MmOrganismDbTxs, just as subset() in the base package does with a data.frame. Note that there are some gotchas with lazy evaluation, but it is convenient for interactive/casual use. Passing strict=TRUE to subset() can help guard against some mistakes, but it requires a stricter syntax where "global" symbols like desiredUCnames are escaped, as in:

    desiredTxs <- subset(MmOrganismDbTxs, any(TXNAME %in% .(desiredUCnames)), strict=TRUE)
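To make the scoping concrete, here is a minimal sketch using only base R and a plain data.frame, since base subset() is the behaviour the GRanges method is being compared to. The data frame txs and its values are made up for illustration (a hypothetical stand-in for the metadata columns, not the real Mus.musculus transcripts), and the "gotcha" shown is just one example of how the lookup order can surprise you.

```r
# Toy data frame standing in for the metadata columns (hypothetical values).
txs <- data.frame(
  TXID   = 1:4,
  TXNAME = c("uc007afg.1", "uc007afh.1", "uc007afi.2", "uc011wht.1"),
  stringsAsFactors = FALSE
)
desiredUCnames <- c("uc007afh.1", "uc011wht.1")

# TXNAME is found among the columns of txs. desiredUCnames is not a column,
# so it is looked up in the calling environment instead.
hits <- subset(txs, TXNAME %in% desiredUCnames)

# One gotcha: a column takes precedence over a global variable of the same name.
TXID <- 99L
shadowed <- subset(txs, TXID == 99L)  # compares the column TXID, not the global
```

Here hits keeps the rows whose TXNAME is in desiredUCnames, while shadowed is empty because TXID resolves to the column rather than the global value 99L. That kind of silent name clash is why this style is better suited to interactive use than to code inside functions.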
Thanks very much. I appreciate it.
Eric