Documentation for FeatureDb class?
Simon Anders ★ 3.8k
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…

Hi,

I am starting to work with 450k methylation array data for the first time. I began by loading the IlluminaHumanMethylation450k.db package, but then noticed that it has been deprecated in favour of the new package FDb.InfiniumMethylation.hg19. Instead of the usual AnnotationDb objects, the new package seems to contain an object of class FeatureDb, which is defined in the GenomicFeatures package.

Do we have some documentation on this new class? The help pages for FeatureDb and features are rather thin, and the package vignette only explains the TranscriptDb class and does not mention FeatureDb.

What I want to do is simple: I have a vector of Illumina CpG identifiers (like "cg20253340") and would like to get a GRanges object with their locations.

I managed to do this with the old IlluminaHumanMethylation450k.db package, but as it is deprecated and I am only about to start my project, I should probably use the new FDb.InfiniumMethylation.hg19 package. Should I, or is this all still too new for production use?

Cheers

   Simon

FeatureDB GenomicFeatures • 1.2k views
@james-w-macdonald-5106
United States

Hi Simon,

It's pretty simple:

> z <- get450k()
Fetching coordinates for hg19...
> z
GRanges with 485577 ranges and 10 metadata columns:
             seqnames               ranges strand   |    addressA    addressB
                <Rle>            <IRanges>  <Rle>   | <character> <character>
  cg04913815    chr16       [60438, 60439]      *   |    24771476            
  cg01686861    chr16       [60748, 60749]      *   |    36644319    45624454
  cg05558259    chr16       [61085, 61086]      *   |    65765435            
  cg26978960    chr16       [62460, 62461]      *   |    28717484            
  cg03792876    chr16       [73243, 73244]      *   |    42725455            
         ...      ...                  ...    ... ...         ...         ...
  cg17939569     chrY [27009430, 27009431]      *   |    73757458            
  cg13365400     chrY [27210334, 27210335]      *   |    61745505            
  cg21106100     chrY [28555536, 28555537]      *   |    56793430            
  cg08265308     chrY [28555550, 28555551]      *   |    67794346    26610401
  cg14273923     chrY [28555912, 28555913]      *   |    16749405            
             channel platform percentGC
               <Rle>    <Rle> <numeric>
  cg04913815    Both    HM450      0.58
  cg01686861     Red    HM450      0.76
  cg05558259    Both    HM450      0.56
  cg26978960    Both    HM450      0.66
  cg03792876    Both    HM450      0.64
         ...     ...      ...       ...
  cg17939569    Both    HM450      0.42
  cg13365400    Both    HM450      0.44
  cg21106100    Both    HM450      0.66
  cg08265308     Red    HM450      0.68
  cg14273923    Both    HM450      0.48
                                                      sourceSeq probeType
                                                 <DNAStringSet>     <Rle>
  cg04913815 TTTCGGTGGTACTGCGAAGGCAGAGCAGAGTTCTGCTCAGGTCAGACCCG        cg
  cg01686861 CGCCCCCAGGCCGGCGCCGTGCGACTTTGCTCCTGCAACACACGCCCCCC        cg
  cg05558259 CAGCTAGGGACATTGCAGGCTCCTCTTGCTCAAAGTGTAGTGGCAGCACG        cg
  cg26978960 CGGCCCAGTAGAGCCCTAGGGGTGACGCCACTCCCACTCACTGTCGACTC        cg
  cg03792876 ATGGAGGCTTGGGCGGGTCACCCCCAGTGCAGGCCAAGATGCAGGTTACG        cg
         ...                                                ...       ...
  cg17939569 CGCCTAAATAAGAATAGGAGTAAAGGAGAGTATTACCTCCAAATCACCGG        cg
  cg13365400 CGTCACCTGGATGCTGGTTTAAGTGATATATGAAAATCCACCCTAAGGAC        cg
  cg21106100 CGGATCTTTCTGACCAGCCCCGGCCCCATCTTGGCCTTACCTGGCCTCCC        cg
  cg08265308 CGGCTCCCAACGCTCGGATCTTTCTGACCAGCCCCGGCCCCATCTTGGCC        cg
  cg14273923 TGGTATTGGTGAAGTCTACCACTCCAGCTCGTAGACTTCCATAATCGTCG        cg
              probeStart    probeEnd probeTarget
             <character> <character>   <numeric>
  cg04913815       60438       60487       60438
  cg01686861       60700       60749       60748
  cg05558259       61037       61086       61085
  cg26978960       62412       62461       62460
  cg03792876       73195       73244       73243
         ...         ...         ...         ...
  cg17939569    27009430    27009479    27009430
  cg13365400    27210334    27210383    27210334
  cg21106100    28555488    28555537    28555536
  cg08265308    28555502    28555551    28555550
  cg14273923    28555912    28555961    28555912
  ---
  seqlengths:
        chr1      chr2      chr3      chr4 ...     chr22      chrX      chrY
   249250621 243199373 198022430 191154276 ...  51304566 155270560  59373566

I think you know what to do from there.

Best,

Jim

 

I should note that get450k() only works after the package has been loaded with library(FDb.InfiniumMethylation.hg19).
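For the lookup Simon describes (a vector of CpG identifiers in, a GRanges out), here is a minimal sketch. To keep it self-contained it builds a small toy GRanges from three rows of the output above rather than calling get450k(). The name probes, the vector ids and the identifier "cg00000000" are made up for illustration. With the real package loaded, probes <- get450k() should be usable in the same way, since the result is named by probe ID.

```r
library(GenomicRanges)

# Toy stand-in for the GRanges returned by get450k();
# coordinates are copied from the first three rows of the output above.
probes <- GRanges(
  seqnames = c("chr16", "chr16", "chr16"),
  ranges   = IRanges(start = c(60438, 60748, 61085),
                     end   = c(60439, 60749, 61086)),
  strand   = "*"
)
names(probes) <- c("cg04913815", "cg01686861", "cg05558259")

# Identifiers to look up; "cg00000000" is a hypothetical ID that is
# not present, to show how unknown IDs can be handled.
ids <- c("cg05558259", "cg04913815", "cg00000000")

# Subsetting a GRanges by names it does not contain raises an error,
# so keep only the IDs that are actually present.
known <- ids[ids %in% names(probes)]
hits  <- probes[known]

hits
```

The result keeps the order of the requested identifiers, not the genomic order, so it can be lined up directly with a matrix of beta values whose rows are in the same order as known.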

Best,

Jim

Okay, that is simple.

Is that the general idea behind FeatureDb objects: one simple function that produces a table? I guess there is more to them. Not that I want to complain: the table is, in a way, all one might ever need.

Thanks.

  Simon

The idea for these particular DB packages is to encapsulate the two different Illumina Methylation platforms in a single package, and allow you to extract data for whatever platform(s) you care about.

> suppressMessages(library(FDb.InfiniumMethylation.hg19))
> ls(2)
[1] "FDb.InfiniumMethylation.hg19" "get27k"                      
[3] "get450k"                      "getPlatform"        

So you can either extract everything using getPlatform(), or subsets thereof using the other two functions.

The FeatureDb class is just a generic container for storing the genomic locations of arbitrary genomic features in a SQLite database. For example, a TranscriptDb is a subclass of FeatureDb for storing transcript information.

So there is no rule that a FeatureDb comes with one simple function per se. It all depends on what you put in it, and how you envision people using it. In this situation, three functions pretty much cover all the relevant use cases.

Best,

Jim

 

 