I have a project where I have done RNA-seq (paired-end sequencing on Illumina HiSeq) of a worm at different days of development i.e. Ages 0-12. For each age, I have sequenced 3 replicate specimens. I'm completely new to DESeq2 and I was wondering if what I did below is correct.
library(DESeq2)
count_file_names <- grep("counts",list.files("featureCounts"),value=T)
specimen_type <- c("Age0","Age2","Age4","Age6","Age8","Age10","Age12")
sample_information <- data.frame(sampleName = count_file_names, fileName = count_file_names, condition = specimen_type)
Note- I switched to featureCounts instead of HTSeq for generating the counts but I formatted the featureCounts output files to reflect the HTSeq count output format. As far as I know, DESeq2 needs just counts and doesn't discriminate between the two count generating tools
DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "featureCounts", design = ~condition)
colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition,levels = c('Age0','Age2','Age4','Age6','Age8','Age10','Age12'))
rld <- rlogTransformation(DESeq_data, blind=T)
Note - I did relevel below based on Age0 even though it doesn't represent my control samples or anything. I wonder if this relevel step can be skipped though?
DESeq_data$condition <- relevel(DESeq_data$condition, "Age0")
DESeq_data <- DESeq(DESeq_data)
Below, I hope to know what genes are up-/down-regulated in Day12 worm specimens when compared to the rest. My real question is whether what I did below is correct/sensible? I plan to do in the same way for the other age specimens too.
Age_12_result <- results(DESeq_data, contrast = c(-1/7,-1/7,-1/7,-1/7,-1/7,-1/7,1/7), pAdjustMethod="BH", lfcThreshold=0, independentFiltering=T)
sig_de_results <- subset(Age_12_result, abs(log2FoldChange)> 1 & padj < 0.05)
sig_de_results <- sig_de_results[order(sig_de_results$padj, decreasing=F),]
My hope is that 'sig_de_results' tells me what genes were up-/down-regulated in Day12 specimens versus the average expression of genes in all of the remainder age specimens. Thanks a zillion in advance!
Note- I had more than 3 replicates in most of my sample age groups (I had stated the replicate number as being 3 in my initial question)