Question

Same few genes repeated in DESeq2 Results

dsambo ▴ 10

@8678f435

Last seen 20 months ago

United States

I have a pilot study with 3-4 samples in each of 4 groups (2 pretreatments x 2 treatments, giving 4 combinations). I performed AmpliSeq on these samples.

I ran the DESeq2 pipeline, filtering out genes with counts below 5. To get the DESeq2 results I ran:

design(dds) <- formula(~ Group)
dds <- DESeq(dds)

res1 <- results(dds, contrast=c("Group","ConEtOH","ConSal"))
res2 <- results(dds, contrast=c("Group","ABxEtOH","ABxSal"))

summary(res1)
summary(res2)

The results I get are:

> summary(res1)

out of 13061 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 691, 5.3%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> summary(res2)

out of 13061 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

When I look at the results for the first comparison, a handful of genes are each repeated around 200 times, with exactly the same baseMean, log2FC, etc. on every copy (example row names for one gene below):

0610009O20Rik 0610009O20Rik.1 0610009O20Rik.10 0610009O20Rik.100 0610009O20Rik.101 0610009O20Rik.102 0610009O20Rik.103 0610009O20Rik.104 0610009O20Rik.105

This does not seem right to me. Any ideas why these genes are overrepresented and repeated so many times in my results?

DESeq2 • 1.8k views

ADD COMMENT • link 21 months ago dsambo ▴ 10

What is the output of head(rownames(dds)) and sum(duplicated(rownames(dds)))?

ADD REPLY • link 21 months ago ATpoint ★ 4.8k

Thanks for suggesting I look at this! Below are the results. The dds object has 13061 rows, and it looks like 13047 of them are being flagged as duplicates.

head(rownames(dds))

[1] "0610031O16Rik" "0610037L13Rik" "0610037L13Rik" "0610009E02Rik" "0610009O20Rik" "0610037L13Rik"

sum(duplicated(rownames(dds)))

[1] 13047

When I check the count data I used as input, no duplicates are found.

sum(duplicated(rownames(cts)))

[1] 0

Thoughts on this?

ADD REPLY • link 21 months ago dsambo ▴ 10

Please show full code, especially how you make the dds.

ADD REPLY • link 21 months ago ATpoint ★ 4.8k

It looks like the duplication is introduced when I filter genes! The lines of code I use to filter are the same ones I've used in previous analyses with no issue. Is there something I need to alter?

cts <- read.delim("Counts1.txt", row.names="Gene")

sum(duplicated(rownames(cts)))

[1] 0

coldata <- read.delim("SampleInfo.txt", row.names="Sample", stringsAsFactors = FALSE)

coldata <- coldata[,c("Pretreatment","Treatment", "Group")]

coldata$Pretreatment <- factor(coldata$Pretreatment)

coldata$Treatment <- factor(coldata$Treatment)

coldata$Group <- factor(coldata$Group)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ 1)

dds

class: DESeqDataSet
dim: 23930 14
metadata(1): version
assays(1): counts
rownames(23930): 0610005C13Rik 0610006L08Rik ... Zzef1 Zzz3
rowData names(0):
colnames(14): CS1 CS2 ... AE3 AE4
colData names(3): Pretreatment Treatment Group

sum(duplicated(rownames(dds)))

[1] 0

keep <- rowSums(counts(dds) >= 5)

dds <- dds[keep,]

dds

class: DESeqDataSet
dim: 13061 14
metadata(1): version
assays(1): counts
rownames(13061): 0610031O16Rik 0610037L13Rik ... 0610037L13Rik 0610037L13Rik
rowData names(0):
colnames(14): CS1 CS2 ... AE3 AE4
colData names(3): Pretreatment Treatment Group

sum(duplicated(rownames(dds)))

[1] 13047

ADD REPLY • link 21 months ago dsambo ▴ 10

score 0 · Answer 1 · 2023-07-27

ATpoint ★ 4.8k

@atpoint-13662

Last seen 1 day ago

Germany

keep <- rowSums(counts(dds) >= 5)

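Check this line. It counts, for each row, how many samples have at least 5 counts, so keep is a numeric vector rather than a logical one. Subsetting with dds[keep,] then treats those numbers as row positions: zeros are dropped, and every other value selects the row at that position. Since the values can only range from 0 to 14 (the number of samples), the same first few rows get picked over and over, which is where the duplication comes from. See the vignette on recommended prefilters and how to implement them, or use filterByExpr from edgeR.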
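
To illustrate the difference between a numeric and a logical index, here is a minimal sketch in base R. The matrix, its gene and sample names, and the cutoff of 2 samples are all made up for the example; they are not the poster's data.

```r
# Toy count matrix: 6 genes x 4 samples (made-up numbers)
cts <- matrix(
  c( 0,  0,  0,  0,
    10, 12,  0,  0,
     7,  9, 11,  6,
     0,  1,  2,  0,
    20, 30, 25, 40,
     5,  0,  0,  0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("s", 1:4))
)

# Numeric vector: number of samples with >= 5 counts for each gene
n_pass <- rowSums(cts >= 5)

# Used as an index, these numbers are treated as row positions:
# zeros are dropped and repeated values select the same row again
wrong <- cts[n_pass, ]
rownames(wrong)
sum(duplicated(rownames(wrong)))

# Comparing against a threshold gives a logical vector (TRUE/FALSE per gene),
# which keeps each passing row exactly once
keep <- n_pass >= 2
right <- cts[keep, ]
rownames(right)
```

Checking is.logical(keep) or table(keep) before subsetting is a cheap way to catch this kind of slip.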

ADD COMMENT • link 21 months ago ATpoint ★ 4.8k

When I added the "smallest group size" threshold to that line of code, the duplication issue went away!

keep <- rowSums(counts(dds) >= 5) >= 3

dds <- dds[keep,]

sum(duplicated(rownames(dds)))

[1] 0
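
Rather than hard-coding the 3, the smallest group size can be derived from the grouping factor. The following is only a sketch on a made-up base R matrix, where the names group, cts and smallest_group_size are hypothetical. With a DESeqDataSet the same idea would presumably use counts(dds) in place of cts and dds$Group in place of group.

```r
# Hypothetical toy data: 5 genes x 7 samples, two groups of 3 and 4 samples
group <- factor(c("A", "A", "A", "B", "B", "B", "B"))
cts <- matrix(
  c( 0,  0,  0,  0,  0,  0,  0,
     9,  8,  7,  0,  0,  0,  0,
     6,  0,  0,  5,  0,  0,  0,
    12, 15, 11, 14, 13, 10, 16,
     0,  0,  0,  5,  6,  4,  3),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:7))
)

# Size of the smallest group, taken from the factor instead of typed by hand
smallest_group_size <- min(table(group))

# Logical filter: keep genes with >= 5 counts in at least that many samples
keep <- rowSums(cts >= 5) >= smallest_group_size
stopifnot(is.logical(keep))

# drop = FALSE keeps the result a matrix even if only one gene passes
filtered <- cts[keep, , drop = FALSE]
rownames(filtered)
sum(duplicated(rownames(filtered)))
```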

ADD REPLY • link 21 months ago dsambo ▴ 10