Splitting rows of a GRanges object based on a CharacterList column
@stephenwilliams-15198

I have a GRanges object that was generated using some of the really nice info from this page (Mapping genome regions to gene symbols). I'm finding overlaps between my GRanges and the gene GRanges from Homo.sapiens and assigning gene symbols to each locus. However, when two genes overlap the same locus, I get something like this:

     seqnames                 ranges strand |     numBC    SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer>    <CharacterList>
  [1]    chr12 [122692988, 122693157]      * |       174    DIABLO,VPS33A
  [2]    chr12 [122693161, 122693336]      * |       167    DIABLO,VPS33A
  [3]    chr12 [122694166, 122694413]      * |       133    DIABLO,VPS33A

This output was produced with the following code:

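library(GenomicRanges)

grange_test <- makeGRangesFromDataFrame(bc_test, keep.extra.columns=TRUE)
## hs: gene ranges carrying a SYMBOL metadata column, built as on the linked page
symInCnv_test <- splitColumnByOverlap(hs, grange_test, "SYMBOL")
grange_test$SYMBOL <- symInCnv_test

However, the function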

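splitColumnByOverlap <-
    function(query, subject, column="ENTREZID", ...)
{
    ## each hit pairs a query range with a subject range it overlaps
    olaps <- findOverlaps(query, subject, ...)
    ## one level per subject range, so ranges without hits get an empty element
    f1 <- factor(subjectHits(olaps),
                 levels=seq_len(subjectLength(olaps)))
    ## group the query's `column` values by the subject range they overlap
    splitAsList(mcols(query)[[column]][queryHits(olaps)], f1)
}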

creates a CharacterList of gene symbols for each range. For a variety of reasons, I need each gene on its own row, as shown below:

     seqnames                 ranges strand |     numBC    SYMBOL
         <Rle>              <IRanges>  <Rle> | <integer>    <character>
  [1]    chr12 [122692988, 122693157]      * |       174    DIABLO
  [2]    chr12 [122692988, 122693157]      * |       174    VPS33A
  [3]    chr12 [122693161, 122693336]      * |       167    DIABLO
  [4]    chr12 [122693161, 122693336]      * |       167    VPS33A
  [5]    chr12 [122694166, 122694413]      * |       133    DIABLO
  [6]    chr12 [122694166, 122694413]      * |       133    VPS33A
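For reference, the list-per-range shape comes from the grouping step inside splitColumnByOverlap(): every overlap hit contributes one symbol, and the symbols are then grouped by subject range. Below is a base R analogue, with split() standing in for splitAsList() and plain integer vectors standing in for the Hits object. The toy hit indices and the third symbol are hypothetical:

```r
## Base R analogue of the grouping step in splitColumnByOverlap().
## Toy data (hypothetical): overlap hits as (query index, subject index) pairs.
query_symbol <- c("DIABLO", "VPS33A", "GENE3")  # one symbol per query gene
query_hits   <- c(1L, 2L, 1L, 2L)              # plays the role of queryHits(olaps)
subject_hits <- c(1L, 1L, 2L, 2L)              # plays the role of subjectHits(olaps)
n_subject    <- 3L                             # plays the role of subjectLength(olaps)

## One level per subject range, so subject 3 (no overlaps) still gets an entry.
f1 <- factor(subject_hits, levels = seq_len(n_subject))

## split() groups the symbols of overlapping genes by subject range,
## returning one (possibly empty) character vector per range.
sym_by_subject <- split(query_symbol[query_hits], f1)
print(sym_by_subject)
```

Because the factor keeps one level per subject range, a range with no overlapping gene gets a zero-length element instead of disappearing, so the result lines up one-to-one with the subject ranges. That is why the function returns a list column rather than one row per gene.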

Can anyone think of a way to do this (with GenomicRanges, by fixing splitColumnByOverlap(), with the tidyverse, or otherwise)?

I've tried converting the final GRanges to a data.frame and splitting it in various ways, but nothing gets me where I need to be. Any help would be greatly appreciated.

Thanks.

@michael-lawrence-3846
expand(grange_test, "SYMBOL")
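expand() turns each element of a list-like metadata column into its own row and repeats the rest of the row alongside it. The same reshaping on a plain data.frame with a list column can be sketched in base R as follows. The toy data are copied from the printed ranges in the question; this mimics the idea and is not the S4Vectors implementation:

```r
## Base R sketch of "one row per element of a list column", the reshaping
## that expand() performs on the SYMBOL column (toy data from the question).
ranges_df <- data.frame(seqnames = "chr12",
                        start = c(122692988L, 122693161L, 122694166L),
                        end   = c(122693157L, 122693336L, 122694413L),
                        numBC = c(174L, 167L, 133L))
## Store SYMBOL as a list column: one character vector per row.
ranges_df$SYMBOL <- list(c("DIABLO", "VPS33A"),
                         c("DIABLO", "VPS33A"),
                         c("DIABLO", "VPS33A"))

## Repeat each row once per symbol; rows with zero symbols are dropped,
## similar to expand()'s default of not keeping empty rows.
n_sym <- lengths(ranges_df$SYMBOL)
long <- ranges_df[rep(seq_len(nrow(ranges_df)), n_sym),
                  c("seqnames", "start", "end", "numBC")]

## Flatten the symbols in the same order, giving one symbol per row.
long$SYMBOL <- unlist(ranges_df$SYMBOL, use.names = FALSE)
rownames(long) <- NULL
print(long)
```

On the GRanges itself the same two steps apply: repeat each range lengths(gr$SYMBOL) times and replace SYMBOL with the unlisted symbols (gr is a placeholder name here). expand() wraps both steps into a single call.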

Thanks for the reply, but this does not work:

grange_test <- as.data.frame(grange_test) 
expand(grange_test, "SYMBOL")

This gives:

# A tibble: 1 x 1
  `"SYMBOL"`
  <chr>     
1 SYMBOL    

Why are you coercing to a data frame first?


expand does not seem to work with GRanges:

expand(grange_test, "SYMBOL")
Error in UseMethod("expand_") : 
  no applicable method for 'expand_' applied to an object of class "c('GRanges', 'GenomicRanges', 'GRanges_OR_NULL', 'GRangesOrIRanges', 'Vector', 'GenomicRanges_OR_missing', 'GenomicRanges_OR_GRangesList', 'GenomicRanges_OR_GenomicRangesList', 'Annotated')"

I've gotten fairly close using

library(dplyr)
library(tidyr)

grange_test <-
  as.data.frame(grange_test) %>%
  mutate(SYMBOL = strsplit(as.character(SYMBOL), ",")) %>%
  unnest(SYMBOL)

But the resulting SYMBOL column has leftover characters from R syntax (the c(" prefix and the ") suffix) that I'm having a hard time removing:

seqnames      start        end  numBC  SYMBOL
chr3      150601398  150601565    168  c("CLRN1-AS1"
chr3      150601398  150601565    168  "CLRN1")
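The leftover fragments come from as.character(). Applied to a list column (which SYMBOL presumably became after as.data.frame()), it deparses each multi-element entry into R source text such as c("CLRN1-AS1", "CLRN1"), and splitting that string on "," keeps the c(" and ") pieces. Below is a base R illustration with toy values, plus the direct fix of flattening the list instead of splitting strings:

```r
## A list column like SYMBOL after as.data.frame() (assumed here).
sym <- list(c("CLRN1-AS1", "CLRN1"), "DIABLO")

## as.character() deparses multi-element list entries into R source text,
## so splitting on "," leaves the c(" and ") fragments behind.
deparsed <- as.character(sym)
print(deparsed)
print(strsplit(deparsed, ","))

## Flattening the list directly keeps clean symbols; no string splitting
## is needed because each entry is already a character vector.
clean <- unlist(sym, use.names = FALSE)
print(clean)
```

Under the same assumption, dropping the strsplit(as.character(...)) step and calling unnest(SYMBOL) on the list column directly should give clean symbols in the tidyr version.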

Success! Your method worked, but you have to use S4Vectors::expand, not Matrix::expand or tidyr::expand.
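The qualified call matters because more than one attached package exports a function named expand (tidyr and Matrix among them). An unqualified call uses whichever definition comes first on the search path, which explains the expand_ error above. Below is a toy illustration of masking in base R, using a hypothetical redefinition of nchar() in place of the competing expand functions:

```r
## Hypothetical stand-in: a later definition with the same name masks the
## base version, just as an attached tidyr or Matrix `expand` can mask
## S4Vectors::expand depending on package load order.
nchar <- function(x, ...) "masked"

## The bare name resolves to whichever definition is found first.
print(nchar("abc"))

## The pkg::fun form always reaches the intended package's function.
print(base::nchar("abc"))

## find() lists the places on the search path that define the name.
print(find("nchar"))
```

In the thread's setting, find("expand") would be a quick way to see which packages are competing for the name before choosing the namespace-qualified call.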

Depending on the context, of course.
