Dear all,
I am attempting to perform DESeq2 analysis (using the Geneious plugin) for targeted RNA-seq of several hundred genes. Only the target set of genes is sequenced, because the primer I use for first-strand cDNA synthesis is specific to that set of genes. I have three replicates for each condition.
Based on the biology of the system I am working with, I know with a high degree of confidence that only one or a few genes will be differentially expressed (always downregulated) in each treatment. So far, the analysis seems to work very nicely for some treatments (only one strongly downregulated gene), but for other treatments I am getting a lot of false positives (around 50-100 in total), both downregulated and upregulated. Even so, the PCA plot still shows the replicates clustering along the x-axis (not sure if that is important). I am trying to figure out the cause of this inconsistency, and I am quite sure it is not biological. There are several possible causes, one of which might be the DESeq2 analysis, so I would like to rule that out if I can.
A colleague recently told me that DESeq2 is not well suited to targeted RNA-seq because it is based on a negative binomial distribution. I am told that whole-transcriptome gene expression follows this distribution, but my limited set of several hundred genes will not.
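For context, the negative binomial (NB) assumption in DESeq2 is made per gene: it models how one gene's counts vary across replicate samples, with variance mu + alpha * mu^2 for mean mu and gene-wise dispersion alpha, rather than how expression levels are spread across all genes of the transcriptome. The sketch below is a crude method-of-moments illustration of that mean-variance relationship (Python, standard library only). The function names and replicate counts are hypothetical, and DESeq2 itself estimates dispersions by Cox-Reid adjusted profile likelihood with shrinkage toward a fitted trend, not by this formula.

```python
from statistics import mean, variance


def nb_variance(mu, alpha):
    """Variance of an NB count with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def moment_dispersion(counts):
    """Crude method-of-moments dispersion estimate for one gene.

    Solves s^2 = m + alpha * m^2 for alpha, using the replicate mean m and
    the sample variance s^2 (n - 1 denominator). Negative values, i.e. less
    spread than a Poisson count would show, are clipped to 0.
    """
    m = mean(counts)
    s2 = variance(counts)
    return max((s2 - m) / m ** 2, 0.0)


# Hypothetical replicate counts for a single gene in one condition.
reps = [80, 100, 120]
alpha_hat = moment_dispersion(reps)
print(alpha_hat, nb_variance(mean(reps), alpha_hat))
```

With counts of 80, 100 and 120 the mean is 100 and the sample variance is 400, so the estimate is (400 - 100) / 100^2 = 0.03, and nb_variance(100, 0.03) gives back 400 (up to floating-point rounding).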
So my first question is: what is the minimum number of genes required for DESeq2 to work properly? I have seen the discussion "DEseq2 with limited gene set", but it is not entirely clear to me what it means for my particular experiment. From what I can gather, the dispersion estimation should be OK, but I am less sure about the size factor calculation. Should I include a set of control genes, or is that unnecessary given that I know almost none of the genes will be differentially expressed?
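For reference, DESeq2's default size factors use the median-of-ratios method: each gene's count is divided by that gene's geometric mean across samples, and each sample's size factor is the median of these ratios. Because it is a median, the method only assumes that most of the genes used are not differentially expressed, so the relevant question is whether most of the targeted genes are unchanged, not whether the whole transcriptome was sequenced. Below is a minimal Python re-implementation for illustration only (not DESeq2's code); the counts in the example are made up. If a known stable subset exists, DESeq2's estimateSizeFactors() accepts a controlGenes argument to restrict the calculation to it.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts[g][j] is the raw count of gene g in sample j. Genes with a zero
    count in any sample are skipped, because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        # Log of this gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's size factor is the median of its per-gene ratios.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical example: 200 targets, sample 2 sequenced twice as deeply,
# and gene 0 strongly downregulated in sample 2.
counts = [[100 + 5 * g, 2 * (100 + 5 * g)] for g in range(200)]
counts[0] = [100, 20]
sf = size_factors(counts)
print(sf[1] / sf[0])  # about 2: the single changed gene does not move the median
```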
Also, which fit type would be best in my case?
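As background for the fit-type question: DESeq2's default parametric fit describes the dispersion trend as dispersion = asymptDisp + extraPois / mean, fitted to the gene-wise estimates with a robust gamma-family GLM; the local fit instead uses local regression of log dispersions on log mean, and the mean fit simply averages the gene-wise estimates. The Python sketch below fits the same parametric form by ordinary least squares, a simpler stand-in for DESeq2's procedure, just to show what the trend is. The function name and example values are hypothetical.

```python
def fit_parametric_trend(means, dispersions):
    """Least-squares fit of dispersion = a0 + a1 / mean.

    a0 plays the role of DESeq2's asymptDisp and a1 of extraPois. Plain OLS
    on x = 1 / mean is used here instead of DESeq2's robust gamma-family GLM.
    """
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(dispersions) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


# Hypothetical gene-wise estimates lying exactly on a0 = 0.05, a1 = 2.
means = [10, 50, 100, 500, 1000]
disps = [0.05 + 2.0 / m for m in means]
print(fit_parametric_trend(means, disps))  # close to (0.05, 2.0)
```

With several hundred genes there are plenty of points to support either trend; with only three genes, any trend rests on three points. In DESeq2 the choice is made through the fitType argument (for example of DESeq() or estimateDispersions()).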
Finally, how would the answers change if targeted RNA-seq were performed on only three genes, where it is known in advance that only one of the three should be differentially expressed? I actually did this, and it worked quite well, giving the expected result for both treatments, but I cannot square this with my (limited) understanding that DESeq2 should require more than three genes.
Any advice / thoughts would be greatly appreciated.
Thank you.
Try using this code from edgeR to see whether your data fit the negative binomial model:
https://github.com/bioramble/sequencing/blob/master/nb.R
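Along the lines of the check suggested above, one rough way to ask whether replicate counts are consistent with an NB model at a given dispersion is a Pearson-type statistic: for each gene, sum the squared deviations from the replicate mean divided by the NB variance mu + alpha * mu^2, then compare the total with its degrees of freedom. This Python sketch illustrates that idea under those assumptions; it is not a port of the linked nb.R script or of any edgeR function, and the function name, alpha value and counts are hypothetical.

```python
from statistics import mean


def pearson_dispersion_check(counts, alpha):
    """Sum of per-gene Pearson statistics under a common NB dispersion.

    counts: list of per-gene replicate counts from one condition.
    Returns (statistic, degrees_of_freedom). If the NB model with this
    alpha fits, statistic / df should be roughly 1; values well above 1
    suggest more replicate variability than the model allows.
    """
    stat = 0.0
    df = 0
    for reps in counts:
        m = mean(reps)
        if m == 0:
            continue  # an all-zero gene carries no information here
        var_model = m + alpha * m ** 2
        stat += sum((y - m) ** 2 for y in reps) / var_model
        df += len(reps) - 1  # one df used up by estimating the mean
    return stat, df


# Hypothetical replicate counts for two genes, assumed alpha = 0.03.
stat, df = pearson_dispersion_check([[80, 100, 120], [190, 200, 210]], alpha=0.03)
print(stat / df)
```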
OK, thank you, I will have a look.