Count of outliers differs per design while running DESeq
MiKappa ▴ 30
@mikappa-23113

I am running DESeq using 3 different designs on the same data set. I have 157 human RNA-seq samples and I am performing differential gene expression analysis comparing 2 phenotypes (insulin resistant vs. insulin sensitive). For 2 of the 3 designs DESeq runs smoothly, but for the 3rd, summary(res) reports thousands of outliers. Following the documentation and earlier posts on this forum, I have run DESeq with minReplicatesForReplace=Inf and results with cooksCutoff=FALSE. I am positive my data set doesn't contain outliers. I would like to understand why DESeq runs without problems for 2 of the 3 models, while for the 3rd the method for flagging outliers is not appropriate for the distribution of counts in my data and should be turned off.

  • model 1: corrects for sex, BMI and age
  • model 2: corrects for sex, BMI, age and differences in cell type composition
  • model 3: corrects for sex, BMI, age, lipid- and glucose-lowering medication, and differences in cell type composition

Models 2 and 3 run without any errors. Model 1 reported thousands of outliers (before I turned the flagging off). Could someone explain to me why? I understand that each model corrects for different things and that the designs are not the same. I consider model 1 a simple (classical) design, and I was frankly surprised that the method for flagging outliers was not appropriate for that design but is for the other 2.

deseq2 • 486 views
@mikelove

This is expected. The criterion we use for outlier flagging (see the 2014 paper) is Cook’s distance, which measures how much an observation affects the coefficient vector. Because the coefficients are defined by the design, different designs can give different results. For example, imagine a simple case where a batch with a large batch effect contains a single sample. Including the batch covariate explains the deviation of that sample, but without the batch covariate the sample would greatly affect the other LFCs and be flagged with a large Cook’s distance.
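
The batch example above can be sketched numerically. This is only a toy illustration under several assumptions, not DESeq2's implementation: it uses an ordinary Gaussian linear model instead of the negative binomial GLM, a fixed noise variance `sigma2` as a stand-in for a robust dispersion estimate, and a batch of two samples (one per condition) rather than one, so that the leverage stays below 1 when the batch term is in the design. All names and numbers are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * q for a, q in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def cooks_distances(X, y, sigma2):
    """Cook's distance of each sample for a least-squares fit of y on X.

    Uses D_i = r_i^2 / (p * sigma2) * h_i / (1 - h_i)^2 with a fixed,
    assumed-known noise variance sigma2 (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    out = []
    for i in range(n):
        resid = y[i] - sum(x * b for x, b in zip(X[i], beta))
        # Leverage h_i = x_i' (X'X)^-1 x_i
        h = sum(x * w for x, w in zip(X[i], solve(XtX, X[i])))
        out.append(resid ** 2 / (p * sigma2) * h / (1 - h) ** 2)
    return out


# 12 hypothetical samples: 6 per condition; samples 0 and 6 form a small batch
cond = [0] * 6 + [1] * 6
batch = [1 if i in (0, 6) else 0 for i in range(12)]
noise = [0.3, -0.5, 0.2, 0.4, -0.1, -0.3, -0.2, 0.5, -0.4, 0.1, 0.3, -0.3]
# Baseline 5, condition effect 2, large batch effect 20
y = [5 + 2 * c + 20 * b + e for c, b, e in zip(cond, batch, noise)]

X_reduced = [[1, c] for c in cond]                # design without batch
X_full = [[1, c, b] for c, b in zip(cond, batch)]  # design with batch

sigma2 = 0.25  # assumed noise variance
d_reduced = cooks_distances(X_reduced, y, sigma2)
d_full = cooks_distances(X_full, y, sigma2)

print("sample 0, design without batch:", d_reduced[0])
print("sample 0, design with batch:   ", d_full[0])
```

With the same data, the Cook's distance of sample 0 is large under the design without the batch term and small once the batch term is included, which is the same reason the number of flagged genes can change when covariates are added to or removed from a design.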


That is clear! I get it now. Thank you.
