normalizecoverage in methylkit outputs NAs
1
0
Entering edit mode
GFM ▴ 20
@gfm-8326
Last seen 2.9 years ago
European Union

Hello,
I am using Methylkit for the analysis of bisulfite experiment in amplicons.
I am trying to use the normalizeCoverage function, but it transforms the data to NAs.
Here is the code:

myobj=methRead(file.list,
                   sample.id=as.list(paste(as.character(A_B_samples$Sample.2))),
                   assembly="Amplicons",
                   treatment=treated_vec,
                   context="CpG"
  )
 
  #filter low coverage
  filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)
  filtered.myobj.norm <- normalizeCoverage(filtered.myobj)

After the normalizeCoverage step the data is transformed to NAs.

If that would have worked, the next step would be:
meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)
myDiff=calculateDiffMeth(meth)

Without the normalizeCoverage step the differential methylation works OK.

Can you please help with this?

Do you recommend normalizing the covergae for data from amplicons?

Thanks

---------------------------------------
 sessionInfo()
R version 3.5.0 Patched (2018-06-04 r74846)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
  [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
  [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] bindrcpp_0.2.2       methylKit_1.6.1      tidyr_0.8.1          tibble_1.4.2         dplyr_0.7.6          GenomicRanges_1.32.6
[7] GenomeInfoDb_1.16.0  IRanges_2.14.11      S4Vectors_0.18.3     BiocGenerics_0.26.0

loaded via a namespace (and not attached):
  [1] Biobase_2.40.0              splines_3.5.0               R.utils_2.7.0               gtools_3.8.1               
[5] assertthat_0.2.0            GenomeInfoDbData_1.1.0      Rsamtools_1.32.3            numDeriv_2016.8-1          
[9] pillar_1.3.0                lattice_0.20-35             glue_1.3.0                  limma_3.36.3               
[13] bbmle_1.0.20                XVector_0.20.0              qvalue_2.12.0               colorspace_1.3-2           
[17] Matrix_1.2-14               R.oo_1.22.0                 plyr_1.8.4                  XML_3.98-1.16              
[21] pkgconfig_2.0.2             emdbook_1.3.10              zlibbioc_1.26.0             purrr_0.2.5                
[25] scales_1.0.0                BiocParallel_1.14.2         ggplot2_3.0.0               SummarizedExperiment_1.10.1
[29] lazyeval_0.2.1              magrittr_1.5                crayon_1.3.4                mclust_5.4.1               
[33] R.methodsS3_1.7.1           MASS_7.3-50                 tools_3.5.0                 data.table_1.11.4          
[37] matrixStats_0.54.0          stringr_1.3.1               munsell_0.5.0               DelayedArray_0.6.5         
[41] Biostrings_2.48.0           compiler_3.5.0              fastseg_1.26.0              rlang_0.2.2                
[45] grid_3.5.0                  RCurl_1.95-4.11             bitops_1.0-6                gtable_0.2.0               
[49] reshape2_1.4.3              R6_2.2.2                    GenomicAlignments_1.16.0    rtracklayer_1.40.6         
[53] bindr_0.1.1                 stringi_1.1.7               Rcpp_0.12.18                tidyselect_0.2.4           
[57] coda_0.19-1                
 

methylkit normalizecoverage NA • 1.4k views
ADD COMMENT
0
Entering edit mode
@altuna-akalin-7813
Last seen 5.9 years ago
Germany/Berlin/MDC-BIMSB
Normalizing coverage shouldn’t lead to NAs. The thing that might lead to NAs is using ,min.per.group=3L In the unite function. Why this happens is explained in the docs, probably in the vignette and also in the discussion forum https://groups.google.com/forum/m/#!forum/methylkit_discussion If this doesn’t help, please send a reproducible example On Thu 25. Oct 2018 at 11:30, GFM [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User GFM <https: support.bioconductor.org="" u="" 8326=""/> wrote Question: > normalizecoverage in methylkit outputs NAs > <https: support.bioconductor.org="" p="" 114444=""/>: > > Hello, > I am using Methylkit for the analysis of bisulfite experiment in amplicons. > I am trying to use the normalizeCoverage function, but it transforms the > data to NAs. > Here is the code: > > myobj=methRead(file.list, > sample.id > =as.list(paste(as.character(A_B_samples$Sample.2))), > assembly="Amplicons", > treatment=treated_vec, > context="CpG" > ) > > #filter low coverage > filtered.myobj <- > filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL) > filtered.myobj.norm <- normalizeCoverage(filtered.myobj) > > After the normalizeCoverage step the data is transformed to NAs. > > If that would have worked, the next step would be: > meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L) > myDiff=calculateDiffMeth(meth) > > Without the normalizeCoverage step the differential methylation works OK. > > Can you please help with this? > > Do you recommend normalizing the covergae for data from amplicons? > > Thanks > > --------------------------------------- > sessionInfo() > R version 3.5.0 Patched (2018-06-04 r74846) > Platform: x86_64-w64-mingw32/x64 (64-bit) > Running under: Windows >= 8 x64 (build 9200) > > Matrix products: default > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C LC_TIME=English_United > States.1252 > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils > datasets methods base > > other attached packages: > [1] bindrcpp_0.2.2 methylKit_1.6.1 tidyr_0.8.1 > tibble_1.4.2 dplyr_0.7.6 GenomicRanges_1.32.6 > [7] GenomeInfoDb_1.16.0 IRanges_2.14.11 S4Vectors_0.18.3 > BiocGenerics_0.26.0 > > loaded via a namespace (and not attached): > [1] Biobase_2.40.0 splines_3.5.0 > R.utils_2.7.0 gtools_3.8.1 > [5] assertthat_0.2.0 GenomeInfoDbData_1.1.0 > Rsamtools_1.32.3 numDeriv_2016.8-1 > [9] pillar_1.3.0 lattice_0.20-35 > glue_1.3.0 limma_3.36.3 > [13] bbmle_1.0.20 XVector_0.20.0 > qvalue_2.12.0 colorspace_1.3-2 > [17] Matrix_1.2-14 R.oo_1.22.0 > plyr_1.8.4 XML_3.98-1.16 > [21] pkgconfig_2.0.2 emdbook_1.3.10 > zlibbioc_1.26.0 purrr_0.2.5 > [25] scales_1.0.0 BiocParallel_1.14.2 > ggplot2_3.0.0 SummarizedExperiment_1.10.1 > [29] lazyeval_0.2.1 magrittr_1.5 > crayon_1.3.4 mclust_5.4.1 > [33] R.methodsS3_1.7.1 MASS_7.3-50 > tools_3.5.0 data.table_1.11.4 > [37] matrixStats_0.54.0 stringr_1.3.1 > munsell_0.5.0 DelayedArray_0.6.5 > [41] Biostrings_2.48.0 compiler_3.5.0 > fastseg_1.26.0 rlang_0.2.2 > [45] grid_3.5.0 RCurl_1.95-4.11 > bitops_1.0-6 gtable_0.2.0 > [49] reshape2_1.4.3 R6_2.2.2 > GenomicAlignments_1.16.0 rtracklayer_1.40.6 > [53] bindr_0.1.1 stringi_1.1.7 > Rcpp_0.12.18 tidyselect_0.2.4 > [57] coda_0.19-1 > > > ------------------------------ > > Post tags: methylkit, normalizecoverage, NA > > You may reply via email or visit > normalizecoverage in methylkit outputs NAs > -- Sent from mobile, excuse the brevity
ADD COMMENT
0
Entering edit mode

Thank you very much for your answer.

I now understand the reason for the problem: since this is short amplicon, and the coverage is quite even in all the CpGs in the amplicon, there are several samples that have no data at all CpGs after the filterByCoverage:

 filtered.myobj <- filterByCoverage(myobj,lo.count=50,lo.perc=NULL,hi.perc=NULL)

I think that at that stage the list should be filtered from such samples that do not have data at all.

Is it OK, just to delete these cases manually and update the treatment of the list, or is there a methylkit function that does this?

Then the next steps will be:

normalize the coverage:

filtered.myobj.norm <- normalizeCoverage(filtered.myobj)
 
  merge all samples to one object for base-pair locations that are covered in all samples:
   
    meth=methylKit::unite(filtered.myobj.norm, destrand=FALSE,min.per.group=3L)

and differential methylation:
 
 myDiff=calculateDiffMeth(meth,weighted.mean = T, adjust = "hochberg")

Is this workflow OK?

Do you recommend normalizing the coverage when working with short amplicons? Ideally, I would like to give higher weight for methylation, when the evidence is based on higher coverage.

Thanks for all the help.

ADD REPLY

Login before adding your answer.

Traffic: 832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6