cpgCollapse in Minfi
cbartle3 • 0
@cbartle3-15165
Last seen 6.6 years ago

Running cpgCollapse on a GenomicRatioSet seems to remove all the feature names.

> featureNames(gset)
   [1] "cg13869341"  "cg14008030"  "cg12045430"  "cg20826792"  "cg00381604" 
   [6] "cg20253340"  "cg21870274"  "cg03130891"  "cg24335620"  "cg16162899" 
  [11] "cg17149495"  "cg22802167"  "cg24669183"  "cg17308840"  "cg17866181" 
  [16] "cg24159721"  "cg15174812"  "cg08477687"  "cg00034556"  "cg00645010"

Running cpgCollapse:

> collapsingGset <- cpgCollapse(gset, what = c("Beta", "M"), maxGap = 500,
            blockMaxGap = 2.5 * 10^5, maxClusterWidth = 1500,
            dataSummary = colMeans, na.rm = FALSE,
            returnBlockInfo = TRUE, islandAnno = NULL, verbose = TRUE)

> collapsedGset <- collapsingGset[[1]]

> featureNames(collapsedGset)
NULL

This results in numeric row names that carry no descriptive information.

> betas <- as.data.frame(getBeta(collapsedGset))
> rownames(betas)
   [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11" 
  [12] "12"   "13"   "14"   "15"   "16"   "17"   "18"   "19"   "20"   "21"   "22"  

 

Is there a way to view the new feature names that have been assigned to the GenomicRatioSet after running cpgCollapse? Otherwise, I haven't any idea what the beta values correspond to. 

 

 

@james-w-macdonald-5106
Last seen 23 hours ago
United States

The featureNames you had previously aren't informative! Unless something like cg14008030 has some inherent meaning that I cannot infer.

Anyway, the returned GenomicRatioSet contains a GRanges object that tells you the genomic location of whatever it is that you have in that GenomicRatioSet, which is in fact informative. An example:

> library(minfi)
> library(minfiData)
> gmSet <- preprocessQuantile(MsetEx)
[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.

> granges(gmSet)
GRanges object with 485512 ranges and 0 metadata columns:
             seqnames               ranges strand
                <Rle>            <IRanges>  <Rle>
  cg13869341     chr1       [15865, 15865]      *
  cg14008030     chr1       [18827, 18827]      *
  cg12045430     chr1       [29407, 29407]      *
  cg20826792     chr1       [29425, 29425]      *
  cg00381604     chr1       [29435, 29435]      *
         ...      ...                  ...    ...
  cg17939569     chrY [27009430, 27009430]      *
  cg13365400     chrY [27210334, 27210334]      *
  cg21106100     chrY [28555536, 28555536]      *
  cg08265308     chrY [28555550, 28555550]      *
  cg14273923     chrY [28555912, 28555912]      *
  -------
  seqinfo: 24 sequences from hg19 genome; no seqlengths

So for each probe, we have the genomic location. Now let's collapse

>  z <- cpgCollapse(gmSet)
> granges(z[[1]])
GRanges object with 223497 ranges and 3 metadata columns:
           seqnames               ranges strand |        id        type
              <Rle>            <IRanges>  <Rle> | <numeric> <character>
       [1]     chr1       [15865, 15865]      * |         1     OpenSea
       [2]     chr1       [18827, 18827]      * |         2     OpenSea
       [3]     chr1       [29407, 29435]      * |         3      Island
       [4]     chr1       [68849, 68849]      * |         4     OpenSea
       [5]     chr1       [69591, 69591]      * |         5     OpenSea
       ...      ...                  ...    ... .       ...         ...
  [223493]     chrY [25314171, 25314171]      * |    223493     OpenSea
  [223494]     chrY [26716289, 26716289]      * |    223494     OpenSea

           blockgroup
            <numeric>
       [1]          1
       [2]          1
       [3]       <NA>
       [4]          1
       [5]          1
       ...        ...
  [223493]       1049
  [223494]       1050
  [223495]       1051
 
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
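
Since the collapsed object carries coordinates rather than probe names, one option is to build descriptive row names from those coordinates. The following is only a minimal sketch, not part of minfi: `region_labels` is a hypothetical helper, the data frame is a toy stand-in for what `as.data.frame(granges(z[[1]]))` would give (columns `seqnames`, `start`, `end`), and the beta values are made up for illustration.

```r
# Toy stand-in for as.data.frame(granges(z[[1]])); coordinates are the first
# three collapsed regions printed above. Integer columns (note the L suffix)
# mirror what a GRanges conversion yields and avoid "1e+05"-style labels.
regions <- data.frame(
  seqnames = c("chr1", "chr1", "chr1"),
  start    = c(15865L, 18827L, 29407L),
  end      = c(15865L, 18827L, 29435L),
  type     = c("OpenSea", "OpenSea", "Island"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one "chr:start-end" label per region.
region_labels <- function(df) {
  paste0(df$seqnames, ":", df$start, "-", df$end)
}

labels <- region_labels(regions)

# Made-up beta values (3 regions x 2 samples), only to show the assignment.
betas <- matrix(c(0.91, 0.85, 0.12, 0.88, 0.80, 0.15), nrow = 3,
                dimnames = list(NULL, c("sample1", "sample2")))
rownames(betas) <- labels
print(betas)
```

With the real objects, the same idea would be `rownames(betas) <- region_labels(as.data.frame(granges(collapsedGset)))`, assuming the rows of `getBeta(collapsedGset)` are in the same order as `granges(collapsedGset)`.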

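And we can tabulate the widths of these collapsed regions (width() is the span in base pairs, so any width above 1 means more than one CpG went into that unit)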

> table(table(width(granges(z[[1]]))))

     1      2      3      4      5      6      7      8      9     10     11
   609    143     49     44     37     49     47     44     53     43     47
    12     13     14     15     16     17     18     19     20     21     22
    47     39     52     43     38     33     47     42     35     42     34
    23     24     25     26     27     28     29     30     31     32     33
    32     20     27     35     29     27     36     29     21     22     26
    34     35     36     37     38     39     40     41     42     43     44
    10     20     18     20     17     19     13      9      9     10     13
    45     46     47     48     49     50     51     52     53     54     55
    11     13     10     11     14      4      8      6     10     14     11
    56     57     58     59     60     61     62     63     64     65     66
     9      5     11      3     11     10     13     15     10     11      7
    67     68     69     70     71     72     73     74     75     76     77
    12      7      6      9     10      4      9      6      7     11      7
    78     79     80     81     82     83     84     85     86     87     88
     5      8     17      4      8      9      9      7      3      8      3
    89     90     91     92     93     94     95     96     97     98     99
     6     11      3      4      9      8     11      4      5      6      8
   100    101    102    103    104    105    106    107    108    109    110
     5     12      5      8      9      7     10      8      9     11     10
   111    112    113    114    115    116    117    118    119    120    121
     5     11      4      8      7      7      2     10      6      3      2
   122    123    124    126    127    128    129    130    133    134    146
     3      6      2      2      2      2      1      2      1      1      2
145014
     1
>
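
To see which original probes ended up in each collapsed region, one possible approach is to overlap the probe locations with the collapsed ranges. This is a sketch under assumptions rather than minfi's own bookkeeping: it uses toy GRanges built from the first few rows printed above, and it assumes that a probe belongs to the collapsed region whose range contains it. With the real objects, `probes` would be `granges(gmSet)` and `regions` would be `granges(z[[1]])`.

```r
library(GenomicRanges)

# Toy probe locations: the first five probes from granges(gmSet) above.
probes <- GRanges(
  seqnames = rep("chr1", 5),
  ranges   = IRanges(start = c(15865, 18827, 29407, 29425, 29435), width = 1)
)
names(probes) <- c("cg13869341", "cg14008030", "cg12045430",
                   "cg20826792", "cg00381604")

# Toy collapsed regions: the first three ranges from granges(z[[1]]) above.
regions <- GRanges(
  seqnames = rep("chr1", 3),
  ranges   = IRanges(start = c(15865, 18827, 29407),
                     end   = c(15865, 18827, 29435))
)

# Pair every region (query) with the probes (subject) that fall inside it.
hits <- findOverlaps(regions, probes)

# List of probe names per region, keeping regions with no probes as empty.
probes_per_region <- split(
  names(probes)[subjectHits(hits)],
  factor(queryHits(hits), levels = seq_along(regions))
)

# Number of probes per region.
n_probes <- countOverlaps(regions, probes)

print(probes_per_region)
print(n_probes)
```

Unlike `width()`, which reports a span in base pairs, `countOverlaps()` here counts probes directly, so it answers the "how many CpGs per unit" question more literally.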
Thanks for answering this. Two points:

1) The point of cpgCollapse is essentially to group CpGs into units. By design, each unit may contain more than one CpG, so it's not really possible to use the name.

2) The name cgXXXXX is in fact informative. According to Illumina, the XXXXX is a hash of the local sequence content around the CpG (+/- 50bp). But as far as I know, the hash function is not available.

Best,
Kasper

(In reply to James W. MacDonald's answer of Mon, Mar 5, 2018.)
@hector-corrada-bravo-6203
Last seen 5.4 years ago
United States
When the argument `returnBlockInfo = TRUE` is set, that information is returned as well. In your example, it is included in `collapsingGset[[2]]`.

(In reply to cbartle3's question of Mar 5, 2018, 12:21 PM -0500.)