Hi there,
I am currently dealing with some pilot study data from a 16S rRNA survey from sediments. The data consists of 24 sites (no replication) of MiSeq 16S amplicon data processed through Mothur and clustered into ~227,216 OTUs. Based on the paper by McMurdie and Holmes (2014) on the inadmissability of rarefying data, it was decided to run the raw OTU counts through DESeq2 to normalize the data.
However, running rlogTransformation on the DESeqDataSet (dds) results in a warning message that the glm.fit algorithm did not converge. The same is also true if using the varianceStabiliziingTransformation function (although in our case rlog is more appropriate). Both methods are being run with blind=TRUE as is appropriate at this stage. My question is: What is the effect on normalization if the glm.fit does not converge and what are the likely causes of non-convergence?
Many thanks,
Ben
---------------------------------------- Commands ----------------------------------------
> library('DESeq2')
> #Load the data
> count_data<-read.delim('./data/allOTU_counts.reformat.txt', header=T, row.names=1)
>
>
> #Set up the experimental design matrix
> site_data<-as.data.frame(as.character(1:ncol(count_data)), row.names=colnames(count_data))
> colnames(site_data)<-'site_type'
>
> #Perform Deseq2 magic
> dds<-DESeqDataSetFromMatrix(countData=count_data, colData=site_data, design= ~site_type)
Usage note: the following factors have 3 or more levels:
site_type
For DESeq2 versions < 1.3, if you plan on extracting results for
these factors, we recommend using betaPrior=FALSE as an argument
when calling DESeq().
As currently implemented in version 1.2, the log2 fold changes can
vary if the base level is changed, when extracting results for a
factor with 3 or more levels. A solution will be implemented in
version 1.3 which allows for the use of a beta prior and symmetric
log2 fold change estimates regardless of the selection of base level.
> dds$site_type<-factor(dds$site_type, levels=as.character(1:ncol(count_data)))
> #Look at the normalized values. Because the size factors of the datasets are considerable, we will use
> # a regularized log transformation instead of a variance stabilizing transformation.
> rld<-rlogTransformation(dds)
Warning message:
glm.fit: algorithm did not converge
---------------------------------------- Session Info ----------------------------------------
R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.2.10 RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3 GenomicRanges_1.14.4 XVector_0.2.0
[6] IRanges_1.20.7 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] annotate_1.40.1 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.3.1 genefilter_1.44.0 grid_3.0.2
[7] lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
[13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-4