How to handle BSmooth.tstat for single replicate samples? Error: need at least two non-NA values to interpolate
unmeshj • 0
@unmeshj-7420

Hello, I am working with bsseq to find DMRs in single replicate samples (Control and Expt) from WGBS. The BSseq object creation and smoothing worked well. However, I get errors from the BSmooth.tstat function.

BS.OBJ.tstat = BSmooth.tstat(BS.OBJ.keep, group1="Control", group2="Expt", estimate.var = "paired", local.correct = TRUE, verbose = TRUE)

Error: length(group1) + length(group2) >= 3 is not TRUE

Or sometimes the error is:

[BSmooth.tstat] preprocessing ... done in 2.5 sec
[BSmooth.tstat] computing stats within groups ... done in 0.4 sec
[BSmooth.tstat] computing stats across groups ... Error in approxfun(xx, yy) : 
  need at least two non-NA values to interpolate

I see that one of the errors has been mentioned in one other post here, although it does not answer the question of how to resolve the problem.

Am I using the command right? I am having a hard time understanding whether bsseq has options to handle single replicate data. If it does, which steps or options need to change? The documentation does not mention any special options for single replicate data, and we often prefer to merge the data from replicates into a single file per sample.

Thank you very much in advance.

Best regards,

UJ

 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bsseq_0.10.0         matrixStats_0.14.0   GenomicRanges_1.14.4 XVector_0.2.0        IRanges_1.20.7       BiocGenerics_0.8.0  

loaded via a namespace (and not attached):
 [1] Biobase_2.22.0   colorspace_1.2-4 grid_3.0.2       lattice_0.20-23  locfit_1.5-9.1   munsell_0.4.2    plyr_1.8.1       Rcpp_0.11.4      scales_0.2.4    
[10] stats4_3.0.2     tools_3.0.2  

bsseq Bsmooth • 2.8k views
@kasper-daniel-hansen-2979
What exactly do you mean by "single-replicate data"? If you're comparing two groups and there is only a single sample in each group, you're out of luck; you cannot use the t-stat approach in BSmooth (although you can still use the smoothing functionality). Instead you could use the Fisher exact test, which we also have in bsseq, but which does not handle biological variation.

The first error you see has to do with checking that you have enough samples in the two groups.

The second error, which has been reported many times, typically happens when you include a very small chromosome (small = few CpGs), like chrMT. I suggest removing those first.

Kasper

On Tue, Mar 3, 2015 at 9:12 AM, unmeshj [bioc] wrote:
> How to handle BSmooth.tstat for single replicate samples? Error: need at least two non-NA values to interpolate
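To make the Fisher exact test suggestion concrete for the one-sample-per-group case, here is a minimal base R sketch for a single CpG. The read counts are made up for illustration, and the code calls `stats::fisher.test` directly rather than bsseq's own implementation, so treat it as an illustration of the idea and not as the package's method.

```r
# Made-up read counts for a single CpG (illustration only, not real data).
# M = reads supporting methylation, Cov = total reads covering the CpG.
M   <- c(Control = 18, Expt = 6)
Cov <- c(Control = 20, Expt = 20)

# 2 x 2 table: rows are samples, columns are methylated / unmethylated reads.
tab <- cbind(methylated = M, unmethylated = Cov - M)

# Fisher's exact test asks whether the methylated fraction differs between
# the two samples. Reads are the unit of replication here, so the test cannot
# separate a group effect from sample-to-sample (biological) variation.
res <- fisher.test(tab)
res$p.value
```

When this is repeated over many CpGs, the resulting p-values are unadjusted; something like `p.adjust(p, method = "BH")` would be one way to correct them for multiple testing.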
unmeshj • 0
@unmeshj-7420

Hello Kasper, 

Thank you very much for your quick reply. 

By single-replicate data, I mean I have only one data file each for 'Control' and 'Expt'. So, as you mention in your reply, I cannot use the t-stat approach. I used the Fisher test and got results that give the unadjusted p-value for each CpG. I have two specific questions:

1) When you say the Fisher test cannot model biological variability, do you mean we cannot use any functions within bsseq to identify DMRs?

2) Can you suggest any approach or program to identify DMRs in such single replicate data (using the Fisher test results or otherwise)? Identifying DMRs, as opposed to individual CpGs with a methylation change, is a big strength of BSmooth/bsseq.

Also, I was able to use the BSmooth objects for plotting, which is really useful. I see good smoothing and clear, representative differences in the smoothed data at the known differential regions that I plotted.

Thanks again,

UJ

I had the same problem and struggled with it for days. This lovely error:

Error in approxfun(xx, yy) :
  need at least two non-NA values to interpolate

Sometimes the call did complete, but every row of the column with the corrected statistic carried this error, so DMR analysis was impossible.

As the author said above, some chromosomes are too small for approxfun, that is, they have too few CpGs (as I understand it). So how do you fix it? First, take a look at your chromosomes:

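# Run-length encoded chromosome names: one run per chromosome if the object is sorted
chr_info <- seqnames(your_bsseq_obj)
# Run length = number of CpG loci on that chromosome
test <- data.frame(chr = runValue(chr_info), n_loci = runLength(chr_info))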

In this data frame you'll see the run length for each chromosome, which is the number of CpG loci on it. If that number is 1-3, you'll receive the approxfun error. You have to remove such chromosomes from your .bedGraph files (from Bismark or MethylDackel) before reading the data, that is, before this step:

your_bsseq_obj = bsseq::read.bismark(
  files = files,
  colData = data.frame(row.names = names),
  rmZeroCov = FALSE,
  strandCollapse = FALSE
)

You can do this manually, but I recommend writing a simple script in Python or bash. Once those tiny chromosomes are removed, read the data again and repeat the analysis. BSmooth.tstat should then work fine.
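If you would rather stay in R than switch to Python or bash, the filtering could be sketched roughly as follows. This is only a sketch under assumptions: the helper name `drop_small_chromosomes`, the `min_loci` threshold, and the toy table are all hypothetical, and the first column is assumed to hold the chromosome name.

```r
# Hypothetical helper: keep only rows on chromosomes that have at least
# `min_loci` CpG rows. `cov_table` is a data frame whose first column is
# the chromosome name (assumed layout of the input file).
drop_small_chromosomes <- function(cov_table, min_loci = 10) {
  chr <- as.character(cov_table[[1]])
  n_per_chr <- table(chr)                               # CpG rows per chromosome
  keep_chr <- names(n_per_chr)[n_per_chr >= min_loci]   # chromosomes that pass
  cov_table[chr %in% keep_chr, , drop = FALSE]
}

# Toy input (made-up values): chr1 has 12 rows, chrMT has only 2.
toy <- data.frame(
  chr   = c(rep("chr1", 12), rep("chrMT", 2)),
  start = c(seq(100, by = 50, length.out = 12), 10, 60),
  meth  = 50
)
filtered <- drop_small_chromosomes(toy, min_loci = 10)
as.character(unique(filtered$chr))
```

For a real file, the table could be read with `read.delim(path, header = FALSE)` and written back with `write.table(..., sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)`. Check first whether your files have a header or track line that needs skipping.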
