Hello,
I am analyzing my results using DESeq2.
I have 3 conditions (0, 1 and 2 mL).
And I used this code to do the analys:
> library("DESeq2") > setwd("~/Documents/Trihar1") > directory<-'/home/harumi/Documents/Trihar1' > sampleFiles<-c("file1.counts","file2.counts","file3.counts","file4.counts","file6.counts", "file7.counts", "file8.counts") > sampleCondition = c("0", "0", "0", "1", "1", "2", "2") > sampleName = c("CdC1", "CdC2", "CdC3", "Cd1_1", "Cd1_3", "Cd2_1", "Cd2_2") > sampleTable = data.frame(sampleName=sampleName, fileName=sampleFiles, condition=sampleCondition) > ddsHTSeq = DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design= ~ condition) > colData(ddsHTSeq)$condition=factor(colData(ddsHTSeq)$condition, levels=c("0", "1", "2")) > dds<-DESeq(ddsHTSeq) > alpha<-0.05 > res1<- results(dds, contrast=c("condition", "1", "0"), alpha=alpha) > res2<- results(dds, contrast=c("condition", "2", "0"), alpha=alpha) > res2to1<- results(dds, contrast=c("condition", "2", "1"), alpha=alpha) > write.csv(as.data.frame(res1),file='expression_Cd1.csv') > write.csv(as.data.frame(res2),file='expression_Cd2.csv') > write.csv(as.data.frame(res2to1),file='expression_Cd2to1.csv')
All my tables contain the same number of genes: 13932
I read in vignette about alpha, but I think I didn't understand well.
Does alpha filtrate my results using pvalue?
"Note that the results function automatically performs independent filtering based on the mean of normalized counts for each gene, optimizing the number of genes which will have an adjusted p value below a given FDR cutoff, alpha
. Independent filtering is further discussed below. By default the argument alpha
is set to 0.10.1. If the adjusted p value cutoff will be a value other than 0.10.1, alpha
should be set to that value:
res05 <- results(dds, alpha=0.05)"
I am confused about the same number of genes, also I read somewhere I should filtrate using padj, like this:
> res0.05 <- res1[which(res1$padj < 0.05), ] > res0.05_2 <- res2[which(res1$padj < 0.05), ] > res0.05_2to1 <- res2to1[which(res1$padj < 0.05), ] > write.csv(as.data.frame(res0.05),file='cd1DEG.csv') > write.csv(as.data.frame(res0.05_2),file='cd2DEG.csv') > write.csv(as.data.frame(res0.05_2to1),file='cd2to1DEG.csv')
Now all my tables contain 3985 genes.
In summary it says:
out of 3985 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2163, 54%
LFC < 0 (down) : 1822, 46%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> summary(res0.05_2)
out of 3985 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1429, 36%
LFC < 0 (down) : 1179, 30%
outliers [1] : 0, 0%
low counts [2] : 2, 0.05%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> summary(res0.05_2to1)
out of 3985 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 6, 0.15%
LFC < 0 (down) : 114, 2.9%
outliers [1] : 0, 0%
low counts [2] : 199, 5%
(mean count < 12)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
I would like to obtain only those up and down regulated genes, not all 3985 genes.
What should I do?
It was correct to use alpha and then filtrate using padj?
Thanks
!!
I din't realize this before.
Thanks!