I'm struggling to find a fast way to access the metadata associated with a GRangesList. I can explicitly invoke:
lapply (GRangesList, mcols)
but that is very slow. Instead, I'm hoping to do something like
GRangesList$exon_rank
in order to get an IntegerList containing that data. (I then want to perform downstream operations on the list.)
More specifically, I want to extract a list of exon ranks from a TxDb object. The following code works:
refSeqDb = suppressWarnings (makeTranscriptDbFromUCSC (
"hg19",
tablename = "refGene"))
refseq2exons = exonsBy (refSeqDb, by = "tx")
refseq2exons = refseq2exons[, "exon_rank"]
exonRankList = lapply (lapply (refseq2exons, mcols), "[[", 1)
However, the final step--involving multiple calls to lapply--is extremely slow.
Hi Martin, Robert,
Note that using the unlist/relist approach will always work and do the right thing for this kind of situation. It is therefore the recommended idiom. FWIW the unlist/split approach has the following pitfalls:
- It requires that the original list-like object (refseq2exons in your case) has unique names. This won't always be the case, e.g. if you use exonsBy() with use.names=TRUE to obtain refseq2exons.
- unlist() will do some strange name mangling in order to "blend" the inner names with the outer names. Then splitting based on the names will likely give an incorrect result.
- unlist() will mangle the outer names in a silly way. Then again, splitting based on the names will likely give an incorrect result.
- The result (the exonRankList list) will generally not be parallel to the original list-like object, parallel meaning that the 2 objects have the same length and the i-th element in one corresponds to the i-th element in the other. For example, in your case the list elements in exonRankList are in a different order than in refseq2exons.
The unlist/relist approach has none of these problems, that is, it will always produce a result that is parallel to and has the same shape as the original object. It's also slightly more efficient than the unlist/split approach.
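For reference, here is a minimal sketch of the unlist/relist idiom on a toy object. The small GRangesList below is a made-up stand-in for the result of exonsBy() (the names txB/txA and the coordinates are hypothetical), so that the example runs without downloading anything from UCSC. It assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Hypothetical stand-in for refseq2exons: two "transcripts" whose names are
# deliberately not in alphabetical order.
refseq2exons <- GRangesList(
  txB = GRanges("chr1", IRanges(start = c(1, 101), width = 50),
                strand = "+", exon_rank = 1:2),
  txA = GRanges("chr2", IRanges(start = 201, width = 50),
                strand = "-", exon_rank = 1L)
)

# Flatten once; use.names = FALSE skips any name handling on the flat object.
flat <- unlist(refseq2exons, use.names = FALSE)

# Pull the metadata column from the flat GRanges, then give it back the
# shape (and the names) of the original list.
exonRankList <- relist(mcols(flat)$exon_rank, refseq2exons)

exonRankList
```

The result should be an IntegerList with one element per transcript, in the same order as refseq2exons, so it can be used side by side with the original object in downstream operations.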
Cheers,
H.