Subsetting GRanges based on metadata
mdeea123

I'm trying to subset a GRanges object to extract only the 1000 genes I'm interested in.

I've just extracted the transcription start sites for mm10:

> head(tssgr)
GRanges object with 6 ranges and 1 metadata column:
            seqnames                 ranges strand |       GENEID
               <Rle>              <IRanges>  <Rle> | <FactorList>
  100009600     chr9 [ 21075496,  21075496]      - |    100009600
  100009609     chr7 [ 84964009,  84964009]      - |    100009609
  100009614    chr10 [ 77711446,  77711446]      + |    100009614
  100009664    chr11 [ 45808083,  45808083]      + |    100009664
     100012     chr4 [144162651, 144162651]      - |       100012
     100017     chr4 [134768004, 134768004]      - |       100017

Here is my gene list:

> head(geneTable)
  SYMBOL GENEID
1   Aspn  66695
2 Angpt1  11600
3  Gm773 331416
4   Lifr  16880
5 Il1rl1  17082
6    Ogn  18295

I've tried

subt <- tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]

but I get this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedLogicalList"’

What am I doing wrong?

Here is the traceback:

8: stop(gettextf("unable to find an inherited method for function %s for signature %s", 
       sQuote(fdef@generic), sQuote(cnames)), domain = NA)
7: (function (classes, fdef, mtable) 
   {
       methods <- .findInheritedMethods(classes, fdef, mtable)
       if (length(methods) == 1L) 
           return(methods[[1L]])
       else if (length(methods) == 0L) {
           cnames <- paste0("\"", vapply(classes, as.character, 
               ""), "\"", collapse = ", ")
           stop(gettextf("unable to find an inherited method for function %s for signature %s", 
               sQuote(fdef@generic), sQuote(cnames)), domain = NA)
       }
       else stop("Internal error in finding inherited methods; didn't return a unique method", 
           domain = NA)
   })(list("CompressedLogicalList"), function (i, x, exact = TRUE, 
       upperBoundIsStrict = TRUE) 
   standardGeneric("NSBS"), <environment>)
6: NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append)
5: normalizeSingleBracketSubscript(i, x)
4: extractROWS(x, i)
3: extractROWS(x, i)
2: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]
1: tssgr[mcols(tssgr)$GENEID %in% geneTable$GENEID]

Mitchell

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Mus.musculus_1.3.1                       TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2 org.Mm.eg.db_3.3.0                      
 [4] GO.db_3.3.0                              OrganismDbi_1.14.1                       GenomicFeatures_1.24.5                  
 [7] AnnotationDbi_1.34.4                     BSgenome.Mmusculus.UCSC.mm10_1.4.0       BSgenome_1.40.1                         
[10] rtracklayer_1.32.2                       Biostrings_2.40.2                        XVector_0.12.1                          
[13] GenomicRanges_1.24.2                     GenomeInfoDb_1.8.3                       IRanges_2.6.1                           
[16] S4Vectors_0.10.2                         BiocInstaller_1.22.3                     Biobase_2.32.0                          
[19] BiocGenerics_0.18.0                     

loaded via a namespace (and not attached):
 [1] graph_1.50.0               zlibbioc_1.18.0            GenomicAlignments_1.8.4    BiocParallel_1.6.5        
 [5] tools_3.3.0                SummarizedExperiment_1.2.3 DBI_0.5                    RBGL_1.48.1               
 [9] bitops_1.0-6               biomaRt_2.28.0             RCurl_1.95-4.8             RSQLite_1.0.0             
[13] Rsamtools_1.24.0           XML_3.98-1.4              

@michael-lawrence-3846

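This is because a transcript could, in principle, map to more than one gene, so your GENEID column is a FactorList (one factor per TSS) rather than a plain factor or vector. Applying %in% to a FactorList returns a LogicalList with one logical vector per TSS, and a LogicalList is not a valid subscript for [ on a GRanges, hence the NSBS error. If there are no one-to-many relationships, you can collapse the FactorList to a plain factor/vector with drop():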

tssgr$GENEID <- drop(tssgr$GENEID)
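A minimal toy sketch of the drop() route, assuming GenomicRanges is installed. The ranges, gene IDs and the wanted vector are hypothetical, and a CharacterList stands in for the FactorList (both are AtomicLists, so %in% and drop() treat them the same way):

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical toy TSS set; each TSS carries exactly one gene ID
tss <- GRanges(c("chr9", "chr7", "chr4"),
               IRanges(c(21075496, 84964009, 144162651), width = 1),
               strand = c("-", "-", "-"))
tss$GENEID <- CharacterList("100009600", "100009609", "100012")

wanted <- c("100012", "66695")  # hypothetical gene list

# drop() is only appropriate when no TSS maps to more than one gene
stopifnot(all(lengths(tss$GENEID) == 1L))
tss$GENEID <- drop(tss$GENEID)  # now a plain character vector

# %in% now yields an ordinary logical vector, which [ accepts
subt <- tss[tss$GENEID %in% wanted]
subt
```

The stopifnot() guard makes the one-gene-per-TSS assumption explicit instead of relying on it silently.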

Alternatively, you could keep a TSS if any of its genes match; any() reduces the LogicalList to one TRUE/FALSE per TSS, which [ does accept:

subt <- tssgr[any(mcols(tssgr)$GENEID %in% geneTable$GENEID)]
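A minimal toy sketch of the any() route, again assuming GenomicRanges is installed and using hypothetical data. Here one TSS maps to two genes and another to none, which is exactly the case where drop() would not be appropriate:

```r
library(GenomicRanges)  # also attaches IRanges and S4Vectors

# Hypothetical toy TSS set: the first TSS maps to two genes, the last to none
tss <- GRanges(c("chr9", "chr7", "chr4"),
               IRanges(c(21075496, 84964009, 144162651), width = 1),
               strand = c("-", "-", "-"))
tss$GENEID <- CharacterList(c("100009600", "66695"), "100009609", character(0))

wanted <- c("66695", "11600")  # hypothetical gene list

hits <- tss$GENEID %in% wanted  # LogicalList: one logical vector per TSS
keep <- any(hits)               # plain logical vector; empty elements give FALSE
subt <- tss[keep]
subt
```

Unlike drop(), this keeps the multi-gene annotation intact and never errors on TSSs with zero or several gene IDs.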

Note that there are annotation sources that will give you the gene symbols without any extra work, e.g.:

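library(Homo.sapiens)
# resize() to width 1 keeps the strand-aware start, i.e. the TSS;
# for mm10, Mus.musculus (attached in the session above) can be used the same way
tss <- resize(transcripts(Homo.sapiens, columns="SYMBOL"), 1L)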

Thanks heaps. 

Mitchell
