I am running DESeq using 3 different designs on the same data set. I have 157 human samples (RNA-seq) and I am performing differential gene expression analysis comparing 2 phenotypes (insulin resistant vs insulin sensitive). For 2 of the 3 designs DESeq runs smoothly, but for the 3rd, summary(res) reports thousands of outliers. Following the documentation and earlier posts on this forum, I have run DESeq with minReplicatesForReplace=Inf and results with cooksCutoff=FALSE. I am confident my dataset doesn't contain outliers, and I would like to understand why DESeq runs without problems for 2 of the 3 models, while for the 3rd the method for flagging outliers appears not to be appropriate for the distribution of counts in my data and has to be turned off.
- model 1: corrects for sex, BMI and age
- model 2: corrects for sex, BMI, age and differences in cell type composition
- model 3: corrects for sex, BMI, age, lipid & glucose lowering medication and differences in cell type composition
Models 2 & 3 run without any errors. Model 1 reported thousands of outliers (before I turned outlier flagging off). Could someone explain why? I understand that each model corrects for different covariates, so the designs are not the same. However, I consider model 1 a simple (classical) design, and I was frankly surprised that the method for flagging outliers was not appropriate for that design while it is for the other 2.
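For concreteness, here is a minimal sketch of the setup described above. It assumes an existing DESeqDataSet called `dds` holding the 157 samples. Every colData column name in it (`sex`, `BMI`, `age`, `cell_comp`, `lipid_med`, `glucose_med`, `phenotype`) and the helper `run_model` are placeholders for illustration, not the actual variable names. Only `DESeq()`, `results()`, `design<-` and the two arguments mentioned in the post are taken from DESeq2 itself.

```r
library(DESeq2)

# `dds` is assumed to be an existing DESeqDataSet for the 157 samples.
# All colData column names below are hypothetical placeholders.
designs <- list(
  model1 = ~ sex + BMI + age + phenotype,
  model2 = ~ sex + BMI + age + cell_comp + phenotype,
  model3 = ~ sex + BMI + age + lipid_med + glucose_med + cell_comp + phenotype
)

# Hypothetical helper: fit one design with outlier handling switched off.
run_model <- function(dds, design_formula) {
  design(dds) <- design_formula
  # minReplicatesForReplace = Inf disables replacement of outlier counts
  dds <- DESeq(dds, minReplicatesForReplace = Inf)
  # cooksCutoff = FALSE disables filtering of p-values by Cook's distance
  results(dds, cooksCutoff = FALSE)
}

# Fit each design and print its summary (the outlier count appears there).
for (name in names(designs)) {
  message("Summary for ", name)
  summary(run_model(dds, designs[[name]]))
}
```

Leaving `cooksCutoff` at its default in `results()` for each design would be one way to compare how many genes get flagged per model.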
That is clear! I get it now. Thank you.