Question

Outlier removal outcome

0


sanchita.bhattacharya • 0

@aab94592

Last seen 2.1 years ago

United States

Hi,

I have two conditions (A, B), each with two levels (Positive, Negative). Each group (Condition + Positive/Negative) has ~4-5 biological replicates. Based on exploratory analyses, I identified three outliers (A1Positive, B1Negative, A2Negative), which were removed from the downstream DESeq2 analyses.

I ran DESeq2 twice:

1) removing only the A1Positive sample
2) removing all three outliers (A1Positive, B1Negative, A2Negative)

I exported the DE results for the following contrasts:

1) APositive vs BPositive (excluding A1Positive outlier sample from the DE analyses)
2) APositive vs BPositive (excluding A1Positive, B1Negative, A2Negative outliers)

While comparing the results from 1) and 2), I noticed that genes had the same log2FoldChange and baseMean in both files, but lfcSE, stat, pvalue and padj differed. Note: the only difference between 1) and 2) was the number of outliers removed.

Can someone explain why the values (pvalue, padj, lfcSE) for the APositive vs BPositive contrast differ between 1) and 2)? Would removing the A2Negative and B1Negative outliers affect the contrast between the APositive and BPositive groups?

Any insight would be helpful.

Thanks!


Outlier • 664 views

updated 2.1 years ago by ATpoint ★ 4.8k • written 2.1 years ago by sanchita.bhattacharya • 0

score 0 · Answer 1 · 2023-03-18

Normalization and model parameters are estimated from all samples in the dataset, regardless of which contrast you extract, so adding or removing samples will alter the statistics. The change can be large or small, depending on how similar the removed samples are to the other samples in their experimental groups. So yes, removing samples can alter the statistics for a contrast even if none of the removed samples belong to the contrasted groups.
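
To make the point above concrete, here is a minimal sketch in plain Python. It is an analogy, not DESeq2 itself: it assumes an ordinary linear model with one mean per group and a single residual variance pooled across all groups, which plays a role similar to the per-gene dispersion that DESeq2 estimates from every sample in the dataset. The expression values, the `contrast` helper, and the choice of which values act as outliers are all made up for illustration.

```python
from math import sqrt
from statistics import mean

def contrast(groups, a, b):
    """Difference in group means (a - b) with a standard error based on
    the residual variance pooled across ALL groups, not just a and b."""
    rss = 0.0   # residual sum of squares over every group
    df = 0      # residual degrees of freedom
    for values in groups.values():
        m = mean(values)
        rss += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    pooled_var = rss / df
    diff = mean(groups[a]) - mean(groups[b])
    se = sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
    return diff, se, diff / se

# Made-up log-scale expression values for one gene (illustrative only).
full = {
    "APositive": [5.1, 5.3, 4.9, 5.2],
    "BPositive": [6.0, 6.2, 5.9, 6.1],
    "ANegative": [4.0, 4.2, 7.5, 4.1],   # 7.5 plays the role of an outlier
    "BNegative": [3.9, 1.0, 4.1, 4.0],   # 1.0 plays the role of an outlier
}
# Same data with the two "Negative" outliers removed.
trimmed = dict(full)
trimmed["ANegative"] = [4.0, 4.2, 4.1]
trimmed["BNegative"] = [3.9, 4.1, 4.0]

diff_full, se_full, stat_full = contrast(full, "APositive", "BPositive")
diff_trim, se_trim, stat_trim = contrast(trimmed, "APositive", "BPositive")

print(f"full:    diff={diff_full:.3f} se={se_full:.3f} stat={stat_full:.2f}")
print(f"trimmed: diff={diff_trim:.3f} se={se_trim:.3f} stat={stat_trim:.2f}")
```

In this toy setup the APositive vs BPositive difference is identical in both runs, because the samples in those two groups are untouched, while the standard error and test statistic change because the pooled variance is re-estimated without the Negative outliers. In a real DESeq2 run the size factors are also re-estimated, so small shifts in other columns are possible as well.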