What does this message mean?
"the design formula contains one or more numeric variables that have mean or standard deviation larger than 5 (an arbitrary threshold to trigger this message). it is generally a good idea to center and scale numeric variables in the design to improve GLM convergence."
What if the mean or the standard deviation is higher than 5? Why would you have to scale and center your numeric variables? I am including age and BMI in my design as continuous variables (sex and group are categorical). I also cut the continuous variables into small bins, as recommended in the FAQ "How can I include a continuous covariate in the design formula?". My design is ~InsulinResistance + sex + bmi + age, and I want to perform differential gene expression analysis comparing insulin-resistant and insulin-sensitive phenotypes, corrected for sex, BMI and age.
Any help is much appreciated!
Hi, I have the same problem you faced. Category and gender are categorical variables, whereas the rest are continuous. This is how I set up my design:
design = ~ category + gender + scale(age, center = TRUE) + scale(fatmass, center = TRUE) + scale(bmi, center = TRUE) + scale(fastingGlucose, center = TRUE)
But the message still appears: "the design formula contains one or more numeric variables that have mean or standard deviation larger than 5 (an arbitrary threshold to trigger this message). it is generally a good idea to center and scale numeric variables in the design to improve GLM convergence."
Could you please help with the following questions? 1. What is wrong with my design? 2. How and where exactly can I use the cut function? Could you use the design above to show how to use cut()? Thanks!
Can you create new scaled variables instead of using scale() directly in the design? This helps DESeq2's warning and error code for checking for design issues.
Thanks Michael! The message no longer appears!
However, I did not see any difference in the gene expression results. The number and type of expressed genes are the same as in my DESeq analysis with the same colData but unscaled variables, so scaling did not change which genes, or how many, were reported.
The warning is there to help with model fitting. For example, you may obtain the same fit but faster, or the fit may fail with badly scaled covariates.
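To make the two replies above concrete, here is a minimal sketch of creating the scaled variables as new columns before building the design. It also checks, using a plain Poisson GLM from base R as a stand-in (not DESeq2's own fitting code), that centering and scaling a covariate leaves the estimated group effect unchanged. The sample table, the simulated counts and the column names age_scaled and bmi_scaled are all made up for illustration.

```r
# Hypothetical sample table standing in for the colData; all values are made up.
set.seed(1)
coldata <- data.frame(
  category = factor(rep(c("control", "case"), each = 10)),
  gender   = factor(rep(c("F", "M"), times = 10)),
  age      = round(runif(20, 25, 70)),
  bmi      = round(runif(20, 18, 40), 1)
)

# scale() returns a one-column matrix, so convert it to a plain numeric
# vector before storing it as a new column.
coldata$age_scaled <- as.numeric(scale(coldata$age))
coldata$bmi_scaled <- as.numeric(scale(coldata$bmi))

# The design refers to the new columns by name instead of calling scale().
design_scaled <- ~ category + gender + age_scaled + bmi_scaled

# With DESeq2 loaded and a count matrix `cts` (not shown here), this would be:
# dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
#                               design = design_scaled)

# Stand-in check with simulated counts for a single gene: the coefficient for
# `category` is the same whether the covariates are scaled or not. Only the
# intercept and the covariate coefficients change.
counts <- rpois(20, lambda = 50)
fit_raw <- glm(counts ~ category + gender + age + bmi,
               family = poisson, data = coldata)
fit_scaled <- glm(counts ~ category + gender + age_scaled + bmi_scaled,
                  family = poisson, data = coldata)
same_effect <- all.equal(coef(fit_raw)[["categorycontrol"]],
                         coef(fit_scaled)[["categorycontrol"]],
                         tolerance = 1e-6)
print(same_effect)
```

This is consistent with what was reported above: centering and scaling only re-expresses the covariates, so the same genes are expected whenever both fits converge. The benefit is in how reliably and quickly the fit converges.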
Hi there :)
I think the main problem comes from not creating new scaled variables. That is the issue I had with the cut variables. Even though I was cutting them, I wasn't creating a new variable. Make sure to add the new variables to your colData! Hope it works!
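For the cut() question earlier in the thread, the same idea applies: store the binned variable as a new column and use that column in the design. A minimal sketch in base R follows. The sample values, the break points, the labels and the column names age_bin and bmi_bin are assumptions chosen only for illustration, not recommended cut-offs.

```r
# Hypothetical sample table; values are made up for illustration.
coldata <- data.frame(
  InsulinResistance = factor(rep(c("sensitive", "resistant"), each = 6)),
  sex = factor(rep(c("F", "M"), times = 6)),
  age = c(23, 35, 41, 52, 58, 64, 29, 33, 47, 55, 61, 70),
  bmi = c(19.5, 22.1, 24.8, 27.3, 31.0, 35.2,
          21.4, 23.9, 26.5, 29.8, 33.1, 38.4)
)

# cut() converts a numeric vector into a factor. The break points are
# arbitrary here; intervals are closed on the right by default.
coldata$age_bin <- cut(coldata$age, breaks = c(20, 40, 60, 80))
coldata$bmi_bin <- cut(coldata$bmi, breaks = c(0, 25, 30, Inf),
                       labels = c("low", "mid", "high"))

# Check how many samples fall into each bin before fitting anything.
print(table(coldata$age_bin))
print(table(coldata$bmi_bin))

# The design uses the new factor columns in place of the numeric ones.
design_binned <- ~ InsulinResistance + sex + bmi_bin + age_bin
```

Because age_bin and bmi_bin are factors, the numeric-variable check that triggers the message does not apply to them. It is worth confirming that no bin is empty or nearly empty, since very small bins add coefficients that are hard to estimate.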