I have a GRanges object that was generated using some of the really nice info from this page (Mapping genome regions to gene symbols). I'm finding overlaps between my query GRanges and my subject GRanges (Homo.sapiens) and assigning gene symbols to each locus. However, when two genes overlap the same locus, you get something like this:
      seqnames                 ranges strand |     numBC          SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer> <CharacterList>
  [1]    chr12 [122692988, 122693157]      * |       174   DIABLO,VPS33A
  [2]    chr12 [122693161, 122693336]      * |       167   DIABLO,VPS33A
  [3]    chr12 [122694166, 122694413]      * |       133   DIABLO,VPS33A
This was produced with the following script:
grange_test <- makeGRangesFromDataFrame(bc_test, keep.extra.columns=TRUE)
symInCnv_test <- splitColumnByOverlap(hs, grange_test, "SYMBOL")
grange_test$SYMBOL <- symInCnv_test
However, the function
splitColumnByOverlap <-
    function(query, subject, column="ENTREZID", ...)
{
    olaps <- findOverlaps(query, subject, ...)
    f1 <- factor(subjectHits(olaps),
                 levels=seq_len(subjectLength(olaps)))
    splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}
creates a CharacterList for the gene symbols. For a variety of reasons, I actually need each gene on its own row, as shown below.
      seqnames                 ranges strand |     numBC      SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer> <character>
  [1]    chr12 [122692988, 122693157]      * |       174      DIABLO
  [2]    chr12 [122692988, 122693157]      * |       174      VPS33A
  [3]    chr12 [122693161, 122693336]      * |       167      DIABLO
  [4]    chr12 [122693161, 122693336]      * |       167      VPS33A
  [5]    chr12 [122694166, 122694413]      * |       133      DIABLO
  [6]    chr12 [122694166, 122694413]      * |       133      VPS33A
Can anyone think of a way to do this (GenomicRanges, fix splitColumnByOverlap(), tidy, or otherwise)?
I've tried converting my final GRanges to a data.frame and splitting it in a variety of ways, but nothing gets me where I need to be. Any help would be greatly appreciated.
Thanks.
Thanks for the reply, but this does not work.
Gives
Why are you coercing to a data frame first?
expand does not seem to work with GRanges
I've gotten fairly close using
But the resulting "SYMBOL" column has a bunch of leftover characters that I'm having a hard time removing.
Success! Your method worked but you have to use
not
or
Depending on the context, of course.
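For anyone landing here later, here is a minimal sketch of one way to get one row per gene symbol while staying in GRanges. It is not necessarily the method discussed in the replies above. The toy object `gr` and the names `n` and `gr_long` are made up for illustration and simply mirror the three ranges from the question. The idea is to repeat each range once per symbol and then replace the CharacterList column with its flattened contents.

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical toy data mirroring the three ranges shown in the question
gr <- GRanges(
    seqnames = "chr12",
    ranges = IRanges(start = c(122692988, 122693161, 122694166),
                     end   = c(122693157, 122693336, 122694413)),
    numBC = c(174L, 167L, 133L)
)
gr$SYMBOL <- CharacterList(
    c("DIABLO", "VPS33A"),
    c("DIABLO", "VPS33A"),
    c("DIABLO", "VPS33A")
)

# Number of symbols attached to each range
n <- lengths(gr$SYMBOL)

# Repeat each range once per symbol it carries ...
gr_long <- gr[rep(seq_along(gr), n)]

# ... and overwrite the list column with the flattened symbols,
# which are already in the matching order
gr_long$SYMBOL <- unlist(gr$SYMBOL, use.names = FALSE)

gr_long
```

Note that a range with no overlapping gene has an empty list element, so it is repeated zero times and drops out of `gr_long`. If those ranges need to be kept, they would have to be handled separately, for example by filling the empty elements with `NA` before expanding.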