dealing with outliers (deseq2)
1
0
Entering edit mode
andreia ▴ 10
@andreia-23745
Last seen 2.6 years ago
Portugal

Hi there, I have been stuck with this analysis, that in first place seems a quite simple experiment but the results that gave me are weird and i dont know how to resolve this. Its not my first time that i have problems with outliers that the DESeq2 not filter, i tried have a solid answer for my problem, but with no sucess.
So the experiment, or more likely the model that i use is ~RIN+group. I know about the continuous variable the automatic filtering is off (ref: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#approach-to-count-outliers) in those cases Love et al recommend to use the assays(dds)[["cooks"]] "perform a manual visualization and filtering if necessary" but i am suck!

My experiment is:

                RIN   group
  DRG-PNI-1     7.7   SN_Injury
  DRG-PNI-2     7.9   SN_Injury
  DRG-PNI-4     6.9   SN_Injury
  DRG-PNI-5     6.8   SN_Injury
  DRG-PNIsh-1   6.9   SN_Sham
  DRG-PNIsh-2   8.2   SN_Sham
  DRG-PNIsh-3   7     SN_Sham
  DRG-PNIsh-4   8.2   SN_Sham
  MN-PNI-2      8.7   MN_Injury
  MN-PNI-3      8     MN_Injury 
  MN-PNI-4      7.5   MN_Injury
  MN-PNI-5      8.1   MN_Injury
  MN-PNIsh-1    7.2   MN_Sham
  MN-PNIsh-2    7.2   MN_Sham
  MN-PNIsh-3    7.4   MN_Sham

My code is:

Code should be placed in three backticks as shown below


data$group<-relevel(data$group, ref="SN_Sham")
dds<-DESeqDataSetFromMatrix(countData = table, colData = data, design= ~RIN + group)
dds$RIN <- dds$RIN / sd(dds$RIN) 
dds<-estimateSizeFactors(dds, controlGenes=index)
keep <- rowSums(counts(dds) >= 10) >= 4 
ddsClean <- dds[keep,]
ddsClean <- estimateDispersions(ddsClean)
dds <- nbinomWaldTest(ddsClean, maxit=10000)
#table(mcols(dds)$betaConv)
keep2<-which(mcols(dds)$betaConv=="TRUE")
dds<-dds[keep2,]
dds

res <- results(dds, name=c("group_SN_Sham_vs_MN_Sham"), pAdjustMethod="fdr", alpha=0.05)

In this comparison, i have 2 genes that are considered significative:

  log2 fold change (MLE): group SN Injury vs SN Sham 
  Wald test p-value: group SN Injury vs SN Sham 
                                                                   baseMean log2FoldChange     lfcSE      stat     pvalue      padj
                                                                    <numeric>      <numeric> <numeric> <numeric>  <numeric> <numeric>
                       ENSRNOT00000007365|ENSRNOG00000005286|Coch   13.6013       -11.7896   3.68177  -3.20215 0.00136407 0.0342551
                       ENSRNOT00000019133|ENSRNOG00000014233|Krt19   21.8576       -15.3058   3.79164  -4.03673 5.42012e-05 0.00283855

But looking for the matrix of counts all samples from the this comparison have zero counts:

                                                 DRG-PNI-1 DRG-PNI-2 DRG-PNI-4 DRG-PNI-5 DRG-PNIsh-1 DRG-PNIsh-2 DRG-PNIsh-3 DRG-PNIsh-4 MN-PNI-2 MN-PNI-3 MN-PNI-4 MN-PNI-5 MN-PNIsh-1 MN-PNIsh-2 MN-PNIsh-3
 ENSRNOT00000007365|ENSRNOG00000005286|Coch         0         0         0         0           0           0           0           0      109       13       10        7          0         0        289                                  
 ENSRNOT00000019133|ENSRNOG00000014233|Krt19        0         0         0         0           0           0           0           0       14      195      138       50          0         0        246

I will be very appreciate if anyone help me.

Thanks in advance, Andreia

maxcooks outliers zero counts DESeq2 • 1.3k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

You may have accidentally mismapped the colData and the samples. Best way to check is plotCounts for these genes (anyway, I always recommend looking at plotCounts of example genes).

ADD COMMENT
0
Entering edit mode

Thanks Michael for your reply.

I checked the colData and everything is ok.

I tested the approach from an older post [DESeq2] Cook's distance flagging for continuous variables and i couldnt get any idea which was the best threshold to apply, so i end up after talking with my boss that i will filtering manually after the results() the genes that have rowsums < 5 (for the comparison samples) and remove the genes that have counts in only one sample.

Not very secure of this approach, but i really need to move on...

ADD REPLY
0
Entering edit mode

So when you make plotCounts, what do you see? DESeq2 will always give you a 0 LFC for a gene when two groups that have 0, unless there is a sample assignment bug. What version of DESeq2 do you have, see sessionInfo().

ADD REPLY

Login before adding your answer.

Traffic: 1054 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6