Subsetting GRanges based on metadata
mdeea123

I'm trying to subset a GRanges object to extract only the 1000 genes I'm interested in.

I’ve just extracted the transcription start sites for mm10:


GRanges object with 6 ranges and 1 metadata column:
            seqnames                 ranges strand |       GENEID
               <Rle>              <IRanges>  <Rle> | <FactorList>
  100009600     chr9 [ 21075496,  21075496]      - |    100009600
  100009609     chr7 [ 84964009,  84964009]      - |    100009609
  100009614    chr10 [ 77711446,  77711446]      + |    100009614
  100009664    chr11 [ 45808083,  45808083]      + |    100009664
     100012     chr4 [144162651, 144162651]      - |       100012
     100017     chr4 [134768004, 134768004]      - |       100017

Here is my gene list

> head(geneTable)
1   Aspn  66695
2 Angpt1  11600
3  Gm773 331416
4   Lifr  16880
5 Il1rl1  17082
6    Ogn  18295

I've tried

subt <- tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]

but I get this error

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedLogicalList"’


What am I doing wrong??

Here is the traceback:

8: stop(gettextf("unable to find an inherited method for function %s for signature %s", 
       sQuote(fdef@generic), sQuote(cnames)), domain = NA)
7: (function (classes, fdef, mtable) 
       methods <- .findInheritedMethods(classes, fdef, mtable)
       if (length(methods) == 1L) 
       else if (length(methods) == 0L) {
           cnames <- paste0("\"", vapply(classes, as.character, 
               ""), "\"", collapse = ", ")
           stop(gettextf("unable to find an inherited method for function %s for signature %s", 
               sQuote(fdef@generic), sQuote(cnames)), domain = NA)
       else stop("Internal error in finding inherited methods; didn't return a unique method", 
           domain = NA)
   })(list("CompressedLogicalList"), function (i, x, exact = TRUE, 
       upperBoundIsStrict = TRUE) 
   standardGeneric("NSBS"), <environment>)
6: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
5: normalizeSingleBracketSubscript(i, x)
4: extractROWS(x, i)
3: extractROWS(x, i)
2: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]
1: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]


R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Mus.musculus_1.3.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2                      
 [4] GO.db_3.3.0                              OrganismDbi_1.14.1                       GenomicFeatures_1.24.5                  
 [7] AnnotationDbi_1.34.4                     BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.40.1                         
[10] rtracklayer_1.32.2                       Biostrings_2.40.2                        XVector_0.12.1                          
[13] GenomicRanges_1.24.2                     GenomeInfoDb_1.8.3                       IRanges_2.6.1                           
[16] S4Vectors_0.10.2                         BiocInstaller_1.22.3                     Biobase_2.32.0                          
[19] BiocGenerics_0.18.0                     

loaded via a namespace (and not attached):
 [1] graph_1.50.0               zlibbioc_1.18.0            GenomicAlignments_1.8.4    BiocParallel_1.6.5        
 [5] tools_3.3.0                SummarizedExperiment_1.2.3 DBI_0.5                    RBGL_1.48.1               
 [9] bitops_1.0-6               biomaRt_2.28.0             RCurl_1.95-4.8             RSQLite_1.0.0             
[13] Rsamtools_1.24.0           XML_3.98-1.4              

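This happens because your GENEID column is a FactorList rather than a factor or ordinary vector, since in principle there could be multiple genes for a given transcript. Calling %in% on a list-like column returns a LogicalList (one logical vector per range) instead of a plain logical vector, and `[` has no method for subsetting a GRanges with that, hence the ‘NSBS’ error for "CompressedLogicalList". If there are no one-to-many relationships, you could drop the FactorList to a factor/vector, after which your original subsetting call should work: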

tssgr$GENEID <- drop(tssgr$GENEID)

Alternatively, you could select a TSS if any of its genes match:

subt <- tssgr[any(mcols(tssgr)$GENEID %in% geneTable$GENEID)]
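To make the difference concrete, here is a minimal sketch of both approaches on a toy object. Everything in it is an assumption for illustration: the three ranges are copied from the printout in the question, the GENEID column is built as a CharacterList standing in for the real FactorList, and the contents of geneTable (including the symbol "geneX") are made up.

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Toy stand-in for tssgr: three width-1 ranges with a list-valued GENEID column.
# (CharacterList is used here in place of the FactorList seen in the question.)
tssgr <- GRanges(
  seqnames = c("chr9", "chr7", "chr4"),
  ranges   = IRanges(start = c(21075496, 84964009, 144162651), width = 1),
  strand   = c("-", "-", "-")
)
tssgr$GENEID <- CharacterList("100009600", "100009609", "100012")

# Hypothetical gene table; only the GENEID column matters for the lookup.
geneTable <- data.frame(
  SYMBOL = c("Aspn", "geneX"),
  GENEID = c("66695", "100012"),
  stringsAsFactors = FALSE
)

# %in% on a list-like column gives a LogicalList, one element per range.
hits <- tssgr$GENEID %in% geneTable$GENEID

# any() collapses each element to a single TRUE/FALSE, which `[` accepts.
subt <- tssgr[any(hits)]

# Alternative: flatten the column first (only valid when every element
# has length one), then use the original subsetting expression.
flat <- tssgr
flat$GENEID <- drop(flat$GENEID)
subt2 <- flat[flat$GENEID %in% geneTable$GENEID]
```

Both subt and subt2 should contain the same single range here. The any() version also copes with ranges that carry more than one gene ID, whereas the drop() version relies on every element of the list having length one.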

Note that there are annotation sources that will give you the gene symbols without any extra work, e.g.:

tss <- resize(transcripts(Homo.sapiens, columns="SYMBOL"), 1L)


Thanks heaps. 



