I have a pilot study with 3-4 samples per 4 groups (2 pretreatments and 2 treatments, 4 combinations). I performed Ampliseq on these samples.
I ran the DESeq2 pipeline, filtering out genes with counts of less than 5. For the DESeq results I ran:
design(dds) <- formula(~ Group)
dds <- DESeq(dds)
res1 <- results(dds, contrast=c("Group","ConEtOH","ConSal"))
res2 <- results(dds, contrast=c("Group","ABxEtOH","ABxSal"))
summary(res1)
summary(res2)
The results I get are:
> summary(res1)
out of 13061 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 0, 0%
LFC < 0 (down) : 691, 5.3%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> summary(res2)
out of 13061 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 0, 0%
LFC < 0 (down) : 0, 0%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
When looking at the results for the 1st comparison, it is a handful of genes repeated around 200 times with the exact same baseMean, log2FC, etc for each (example of gene below)
0610009O20Rik 0610009O20Rik.1 0610009O20Rik.10 0610009O20Rik.100 0610009O20Rik.101 0610009O20Rik.102 0610009O20Rik.103 0610009O20Rik.104 0610009O20Rik.105
This does not seem right to me. Any ideas on why these genes are being overrepresented and repeated in my results so many times?
What is the output of
head(rownames(dds))
andsum(duplicated(rownames(dds)))
?Thanks for suggesting to look at this! Below are the results. The dds object has 13061 elements, and it's looking like 13047 are being picked up as duplicates.
[1] "0610031O16Rik" "0610037L13Rik" "0610037L13Rik" "0610009E02Rik" "0610009O20Rik" "0610037L13Rik"
[1] 13047
When I check for duplicates in my count data I inputted, no duplicates are found.
[1] 0
Thoughts on this?
Please show full code, especially how you make the dds.
It appears the duplication issue appears when I filter genes! The lines of code I use to filter are the same I've used with previous analyses with no issue. Is there something I need to alter?
[1] 0
class: DESeqDataSet dim: 23930 14 metadata(1): version assays(1): counts rownames(23930): 0610005C13Rik 0610006L08Rik ... Zzef1 Zzz3 rowData names(0): colnames(14): CS1 CS2 ... AE3 AE4 colData names(3): Pretreatment Treatment Group
[1] 0
class: DESeqDataSet dim: 13061 14 metadata(1): version assays(1): counts rownames(13061): 0610031O16Rik 0610037L13Rik ... 0610037L13Rik 0610037L13Rik rowData names(0): colnames(14): CS1 CS2 ... AE3 AE4 colData names(3): Pretreatment Treatment Group
[1] 13047