DESeq2 sfType
Caroline • 0

I frequently use DESeq2 to analyze RNA-seq microbiome data. An error I commonly get is:

dds <- DESeqDataSetFromMatrix(countData = desd, colData = design, design = ~condition)
dds <- DESeq(dds)
estimating size factors 
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :
  every gene contains at least one zero, cannot compute log geometric means

The only fix I have been able to find is adding a pseudocount. However, I just came across the sfType argument. The default is 'ratio', but specifying 'poscounts' also seems to fix the issue.

I'd like to understand this argument better and whether this is appropriate given my use case.
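
For context, the condition behind this error can be reproduced in miniature with base R alone. The sketch below uses a made-up 3 x 3 count matrix (the values and the name log_geo_means are illustrative, not taken from DESeq2) in which every gene has a zero in at least one sample, so every log geometric mean is -Inf:

```r
# Toy count matrix (hypothetical data): 3 genes x 3 samples,
# with a zero somewhere in every row (gene).
counts <- matrix(c(0, 5, 8,
                   4, 0, 6,
                   7, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

# log(0) is -Inf, so a single zero makes the row mean of the logs -Inf.
log_geo_means <- rowMeans(log(counts))
print(log_geo_means)

# No gene has a finite log geometric mean, so there is no
# "pseudosample" to take ratios against.
print(all(is.infinite(log_geo_means)))
```

This is typical of sparse microbiome count tables, where few or no features are observed in every sample.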

@james-w-macdonald-5106

This is a bit tricky. If you look at the help page for DESeq, it says

  sfType: either "ratio", "poscounts", or "iterate" for the type of
          size factor estimation. See 'estimateSizeFactors' for
          description.

And then if you do ?estimateSizeFactors you get a generic help page that is not useful at all. What you really need is the help page for the method, specifically the one for DESeqDataSet objects. There's a way to get that using the ? operator, but I can't get it to work right now. You can always fall back on the help function, though: help("estimateSizeFactors,DESeqDataSet-method", "DESeq2") will bring up the correct help page. (Yes, I know, how would you ever figure out that you need that? By looking at the help PDF file, in particular the index page.) It says this:


Usage:

     ## S4 method for signature 'DESeqDataSet'
     estimateSizeFactors(
       object,
       type = c("ratio", "poscounts", "iterate"),
       locfunc = stats::median,
       geoMeans,
       controlGenes,
       normMatrix,
       quiet = FALSE
     )

Arguments:

  object: a DESeqDataSet

    type: Method for estimation: either "ratio", "poscounts", or
          "iterate". "ratio" uses the standard median ratio method
          introduced in DESeq. The size factor is the median ratio of
          the sample over a "pseudosample": for each gene, the
          geometric mean of all samples. "poscounts" and "iterate"
          offer alternative estimators, which can be used even when all
          genes contain a sample with a zero (a problem for the default
          method, as the geometric mean becomes zero, and the ratio
          undefined). The "poscounts" estimator deals with a gene with
          some zeros, by calculating a modified geometric mean by
          taking the n-th root of the product of the non-zero counts.
          This evolved out of use cases with Paul McMurdie's phyloseq
          package for metagenomic samples. The "iterate" estimator
          iterates between estimating the dispersion with a design of
          ~1, and finding a size factor vector by numerically
          optimizing the likelihood of the ~1 model.

Which I believe should answer your question.
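
As a rough illustration of the "poscounts" description quoted above, here is a minimal base-R sketch that follows the wording of the help page. The toy matrix and the helper name geo_mean_nz are made up for this example, and DESeq2's own implementation may differ in details (for example, in how the final factors are rescaled). In a real analysis you would not write this yourself; you would pass sfType = "poscounts" to DESeq(), or type = "poscounts" to estimateSizeFactors().

```r
# Toy count matrix (hypothetical data): every gene has one zero.
counts <- matrix(c(0, 5, 8,
                   4, 0, 6,
                   7, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

# Modified geometric mean: sum the logs of the non-zero counts only,
# but still divide by the total number of samples (the "n-th root").
geo_mean_nz <- function(x) {
  if (all(x == 0)) return(0)
  exp(sum(log(x[x > 0])) / length(x))
}
geo_means <- apply(counts, 1, geo_mean_nz)

# Size factor per sample: median ratio of the sample to the pseudosample,
# using only genes with a positive count in that sample and a positive
# modified geometric mean.
size_factors <- apply(counts, 2, function(cnts) {
  keep <- cnts > 0 & geo_means > 0
  exp(median(log(cnts[keep]) - log(geo_means[keep])))
})
print(size_factors)
```

Because the zeros are dropped from the product rather than replaced, no pseudocount is added to the data; the counts themselves are left untouched.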
