Question

Problem with mcols in GenomicFeatures

Jake ▴ 90

@jake-7236

United States

Hi,

I created a TranscriptDb using makeTranscriptDbFromGFF(), and I can extract all of the transcripts grouped by gene. The result is a GRangesList with one element per gene, and each element shows 2 metadata columns holding the transcript IDs and transcript names for that gene. However, when I try to pull this information out with mcols(), the result is empty. I've included the code and output below. Am I doing something wrong?

Thanks

> transcript <- transcriptsBy(gencodeTxdb,by='gene')
> test <- transcript[2:3]
> test
GRangesList object of length 2:
$ENSMUSG00000000003.10 
GRanges object with 3 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id               tx_name
         <Rle>            <IRanges>  <Rle> | <integer>           <character>
  [1]     chrX [77837901, 77853623]      - |     11687 ENSMUSG00000000003.10
  [2]     chrX [77837901, 77853623]      - |     11688  ENSMUST00000000003.8
  [3]     chrX [77837902, 77853530]      - |     11689  ENSMUST00000114041.2

$ENSMUSG00000000028.9 
GRanges object with 4 ranges and 2 metadata columns:
      seqnames               ranges strand |     tx_id              tx_name
         <Rle>            <IRanges>  <Rle> | <integer>          <character>
  [1]    chr16 [18780447, 18811972]      - |     16295 ENSMUST00000000028.8
  [2]    chr16 [18780447, 18811987]      - |     16296 ENSMUSG00000000028.9
  [3]    chr16 [18780453, 18811626]      - |     16297 ENSMUST00000096990.4
  [4]    chr16 [18807356, 18811987]      - |     16298 ENSMUST00000115585.1

-------
seqinfo: 22 sequences (1 circular) from an unspecified genome; no seqlengths
> mcols(test)
DataFrame with 2 rows and 0 columns

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicAlignments_1.2.1 Rsamtools_1.18.2        Biostrings_2.34.1       XVector_0.6.0           BiocInstaller_1.16.1   
 [6] GenomicFeatures_1.18.3  AnnotationDbi_1.28.1    Biobase_2.26.0          GenomicRanges_1.18.4    GenomeInfoDb_1.2.4     
[11] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2    BatchJobs_1.5      BBmisc_1.9         BiocParallel_1.0.3 biomaRt_2.22.0     bitops_1.0-6       brew_1.0-6        
 [8] checkmate_1.5.1    codetools_0.2-10   DBI_0.3.1          digest_0.6.8       fail_1.2           foreach_1.4.2      iterators_1.0.7   
[15] RCurl_1.95-4.5     RSQLite_1.0.0      rtracklayer_1.26.2 sendmailR_1.2-1    stringr_0.6.2      tools_3.1.2        XML_3.98-1.1      
[22] zlibbioc_1.12.0

genomicfeatures • 1.2k views

score 0 · Answer 1 · 2015-02-26

James W. MacDonald 67k

@james-w-macdonald-5106

United States

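A GRangesList is a list of GRanges objects, so you have to act accordingly. Calling mcols() on the list returns the list-level metadata (one row per gene, and no columns here), whereas the tx_id and tx_name columns belong to the individual GRanges elements: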

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> tx <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene)
> mcols(tx[1])
DataFrame with 1 row and 0 columns

> mcols(tx[[1]])
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

> lapply(tx[1:3], mcols)
$`1`
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

$`10`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     31944  uc003wyw.1

$`100`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     72132  uc002xmj.3
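
To make the two levels of metadata easier to see without loading a full TxDb, here is a minimal sketch on a toy GRangesList. The gene names, transcript names, coordinates, and the biotype column are all invented for illustration; only the GenomicRanges calls themselves are real.

```r
library(GenomicRanges)

# Toy data: two hypothetical genes with made-up transcript names
gr1 <- GRanges("chrX", IRanges(c(100, 200), width = 50), strand = "-",
               tx_id = 1:2, tx_name = c("txA", "txB"))
gr2 <- GRanges("chr16", IRanges(300, width = 50), strand = "-",
               tx_id = 3L, tx_name = "txC")
grl <- GRangesList(geneA = gr1, geneB = gr2)

# Outer metadata: one row per list element (gene), no columns until you add some
outer <- mcols(grl)

# Inner metadata: one row per range (transcript) in the first gene
inner <- mcols(grl[[1]])

# The outer slot is where per-gene annotation would go (values are invented)
mcols(grl)$biotype <- c("protein_coding", "lncRNA")
```

So an empty mcols() on the list is not a sign that anything was lost; it only means nobody has attached per-gene annotation yet.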

ADD COMMENT • link 9.9 years ago James W. MacDonald 67k

It's cheaper to unlist the GRangesList and then pull out the metadata for all transcripts at once, rather than looping over the elements:

mcols(unlist(tx))

The unlisted DataFrame is a good starting point for many operations, e.g., adding additional columns. One can then relist() the DataFrame, using tx as the skeleton, to recover the original per-gene grouping:

> relist(mcols(unlist(tx)), tx)
SplitDataFrameList of length 23459
$`1`
DataFrame with 2 rows and 2 columns
      tx_id     tx_name
  <integer> <character>
1     70455  uc002qsd.4
2     70456  uc002qsf.2

$`10`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     31944  uc003wyw.1

$`100`
DataFrame with 1 row and 2 columns
      tx_id     tx_name
  <integer> <character>
1     72132  uc002xmj.3

...
<23456 more elements>
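
As a rough sketch of the "add a column, then relist" workflow described above, the following uses a toy GRangesList rather than a real TxDb. The gene and transcript names and the gene_id column are hypothetical. It also assumes a reasonably recent S4Vectors, where the per-element length function is elementNROWS() (older releases called it elementLengths()).

```r
library(GenomicRanges)

# Toy data: two hypothetical genes with made-up transcript names
gr1 <- GRanges("chrX", IRanges(c(100, 200), width = 50), strand = "-",
               tx_id = 1:2, tx_name = c("txA", "txB"))
gr2 <- GRanges("chr16", IRanges(300, width = 50), strand = "-",
               tx_id = 3L, tx_name = "txC")
grl <- GRangesList(geneA = gr1, geneB = gr2)

# Flatten once; use.names = FALSE avoids copying gene names onto every range
flat <- unlist(grl, use.names = FALSE)

# Add a per-transcript column in a single vectorized step
mcols(flat)$gene_id <- rep(names(grl), elementNROWS(grl))

# Restore the per-gene grouping, using the original list as the skeleton
grl2 <- relist(flat, grl)

# Or keep only the metadata, grouped by gene
meta <- relist(mcols(flat), grl)
```

Working on the flat object keeps everything vectorized, which is the reason it tends to be cheaper than lapply() over thousands of genes.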

ADD REPLY • link 9.9 years ago Martin Morgan 25k

score 0 · Answer 2 · 2015-02-26

Jake ▴ 90

@jake-7236

United States

Awesome, thanks!