Question

DESeq2 influence of 2 samples

0

Entering edit mode

gregory.l.stone ▴ 10

@gregorylstone-12225

Last seen 6.5 years ago

I am having trouble figuring out why 2 sets of paired samples are affecting my data so severely. I have ~100 samples, each paired (so ~200 sets of count files). I aligned my fastq files with STAR, counted genes with featureCounts, and am using DESeq2 for differential expression analysis. My design is ~sex + sex:nested + sex:condition, where nested is the pairing factor.

At first pass, I get 716 significant genes (padj < 0.05), yet many genes have count plots like this: http://i.imgur.com/2Yrd410.png

The seemingly outlier sample is present in many of the genes, so I removed the pre and post counts for that sample.

I then got 18 sig genes, most of which look like this: http://i.imgur.com/wGE53DH.png

The seemingly outlier sample is present in all of the 18 genes, so I removed the pre and post counts for that sample. Now I get no significant genes. It seems strange that 2 samples would have such a large effect on an otherwise large data set. I would appreciate any ideas and guidance as to how best proceed! Thank you!

deseq2 outliers differential gene expression • 2.1k views

ADD COMMENT • link updated 8.2 years ago by Michael Love 43k • written 8.2 years ago by gregory.l.stone ▴ 10

0

Entering edit mode

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] IHW_1.2.0                  DESeq2_1.14.1
[3] SummarizedExperiment_1.4.0 Biobase_2.34.0
[5] GenomicRanges_1.24.3       GenomeInfoDb_1.10.2
[7] IRanges_2.8.1              S4Vectors_0.12.1
[9] BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8          RColorBrewer_1.1-2   plyr_1.8.4
[4] XVector_0.14.0       bitops_1.0-6         tools_3.3.1
[7] zlibbioc_1.20.0      digest_0.6.12        rpart_4.1-10
[10] memoise_1.0.0        RSQLite_1.1-2        annotate_1.52.1
[13] tibble_1.2           gtable_0.2.0         lattice_0.20-34
[16] Matrix_1.2-7.1       lpsymphony_1.2.0     DBI_0.5-1
[19] gridExtra_2.2.1      genefilter_1.56.0    cluster_2.0.5
[22] locfit_1.5-9.1       grid_3.3.1           nnet_7.3-12
[25] data.table_1.10.2    AnnotationDbi_1.36.2 fdrtool_1.2.15
[28] XML_3.98-1.5         survival_2.39-5      BiocParallel_1.8.1
[31] foreign_0.8-67       latticeExtra_0.6-28 Formula_1.2-1
[34] geneplotter_1.52.0   ggplot2_2.2.1        Hmisc_3.17-4
[37] scales_0.4.1         splines_3.3.1        assertthat_0.1
[40] colorspace_1.3-1     xtable_1.8-2         acepack_1.4.1
[43] RCurl_1.95-4.8       lazyeval_0.2.0       munsell_0.4.3
[46] slam_0.1-40

ADD REPLY • link 8.2 years ago gregory.l.stone ▴ 10

score 0 · Answer 1 · 2017-02-03

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

Can you post your full code, I want to make sure it's not something going wrong with the construction of the model matrix.

ADD COMMENT • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Yes of course. Here is my code:

library(DESeq2)

library(IHW)

samples <- read.csv(sampleTable.csv)

sampleTable <- data.frame(sample = samples$PID, fileName = samples$featureCounts_PID, sex = samples$Sex, condition = samples$Condition, nested = samples$Nested)

sampleTable$sex <- relevel(sampleTable$sex, ref = "Male")
sampleTable$condition <- relevel(sampleTable$condition, ref = "pre")

#even though counts are from featureCounts, formatted results to be same as HTSeq-count results to use following import method

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ 1)

model = model.matrix(~ sex + sex:nested + sex:condition, colData(dds))

apply(model, 2, function(x) all(x == 0))

results_source <- DESeq(dds, full = model, betaPrior = FALSE)

sex_diff = results(results_source, contrast = list("sexFemale.conditionpost", "sexMale.conditionpost"))

Thank you for the help!

ADD REPLY • link 8.2 years ago gregory.l.stone ▴ 10

0

Entering edit mode

What is the output of the apply() call?

Please post the code where you remove samples and rerun DESeq2 also.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

The code is the same for when I remove samples. I just modify what csv file I read in.

The output of the apply() call is as follows:

(Intercept)               sexFemale          sexMale:nested        sexFemale:nested   sexMale:conditionpost
                  FALSE                   FALSE                   FALSE                   FALSE                   FALSE
sexFemale:conditionpost
                  FALSE

Thanks!

ADD REPLY • link 8.2 years ago gregory.l.stone ▴ 10

0

Entering edit mode

And you modify the count matrix as well?

Just want to make sure you're running a comparable analysis.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Yes, I do modify the count matrix as well

ADD REPLY • link 8.2 years ago gregory.l.stone ▴ 10

0

Entering edit mode

I finally can see what's going wrong, since you posted this output.

It's a common R problem: you are coding the patient as a numeric vector instead of as a factor (categorical). So patient 1 + patient 2 = patient 3, etc.

After coding the patient as a factor, you will get ~100 of coefficients in the model.matrix for the patients, which is the point: you want to control for all the patient differences, and to estimate and test the sex-specific treatment effect.

ADD REPLY • link 8.2 years ago Michael Love 43k

0

Entering edit mode

Wow I'm dumb. Of course it would be something like that! Thank you so much for the help, and for being so generous with your time!

ADD REPLY • link 8.2 years ago gregory.l.stone ▴ 10