Question

DESeq2 analysis for targeted RNA-seq

dadca596 • 0

@8a6484d7

Last seen 2.2 years ago

Australia

Dear all,

I am attempting to perform DESeq2 analysis (using the Geneious plugin) for targeted RNA-seq of several hundred genes. Only the target set of genes is sequenced, as the primer that I use for first strand cDNA synthesis is specific to the target set of genes. I have three replicates for each condition.

Based on the biology of the system that I am working in, I know with a high degree of confidence that only one or a few genes will be differentially expressed (always downregulation) in each treatment. What I have found so far is that it seems to work very nicely for some treatments (only one strongly downregulated gene) but for other treatments I am getting a lot of false positives (around 50-100 or so in total), both downregulated and upregulated genes (yet the PCA plot still shows clustering of replicates along the x-axis... not sure if that is important). I am trying to figure out what is the cause of this inconsistency, and I am quite sure that it is not biological. I think there are several possible causes, one of which might be the DESeq2 analysis, and so I would like to rule that out if I can.

A colleague recently informed me that DESeq2 is not well suited to targeted RNA-seq because DESeq2 is based on a negative binomial distribution. I am told that whole transcriptome gene expression follows this distribution but my limited set of several hundred genes will not.

So my first question is: what is the minimum number of genes required for DESeq2 to work properly? I have seen the discussion "DEseq2 with limited gene set", but it's not entirely clear to me what it means for my particular experiment. From what I can gather, it seems like the dispersion estimation should be OK, but I'm not so sure about the size factor calculation. Should I be including a set of control genes, or is this not needed considering I know that almost all of the genes will not be differentially expressed?

Also, what would be the best fit type to use in my case?

Finally, how would the answers change in a situation where targeted RNA-seq was performed on only three genes, where it is known in advance that only one of the three genes should be differentially expressed? I actually did this and it worked quite well and gave the expected result for both of the treatments, but I can't square this with the fact that DESeq2 should require more than three genes, to my limited understanding.

Any advice / thoughts would be greatly appreciated.

Thank you.

DifferentialExpression DESeq2 • 2.6k views

ADD COMMENT • link 3.3 years ago • updated 3.2 years ago dadca596 • 0

Try using this code from edgeR to see whether your data fit the negative binomial model:

https://github.com/bioramble/sequencing/blob/master/nb.R

ADD REPLY • link 3.3 years ago swbarnes2 ★ 1.4k

OK thank you, I will have a look.

ADD REPLY • link 3.2 years ago dadca596 • 0

score 1 · Answer 1 · 2022-01-07

Michael Love 43k

@mikelove

Last seen 6 days ago

United States

You need to specify control genes for targeted sequencing, or else all the LFCs (log fold changes) will be thrown off, which would explain "am getting a lot of false positives (around 50-100 or so in total), both downregulated and upregulated genes".

There's no hard and fast rule here; you'll have to know the biology of the system. DESeq2 and other similar methods work by assuming that there is a central group of genes that don't change too much one way or the other. If you subset to a very specific set of genes, you need to provide the reference for the log ratios.
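For anyone running this in R rather than through a plugin, here is a minimal sketch of what "specifying control genes" can look like. It uses simulated data from `makeExampleDESeqDataSet()` as a stand-in for a targeted panel, and the choice of the first 50 genes as controls (`control_idx`) is purely hypothetical; in a real experiment that choice has to come from the biology. Whether the Geneious plugin exposes this option is not something this sketch assumes.

```r
library(DESeq2)

# Simulated stand-in for a targeted panel: 300 genes, 6 samples (3 per condition).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 300, m = 6)

# Hypothetical choice for illustration only: treat the first 50 genes as controls.
control_idx <- 1:50

# Estimate size factors from the control genes only.
dds <- estimateSizeFactors(dds, controlGenes = control_idx)
sf_control <- sizeFactors(dds)
print(sf_control)

# DESeq() keeps size factors that are already present in the object.
dds <- DESeq(dds)
res <- results(dds)
summary(res)
```

Because the size factors are set before `DESeq()` is called, the log fold changes are expressed relative to the chosen control genes rather than to the panel as a whole.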

ADD COMMENT • link 3.3 years ago Michael Love 43k

Sorry, just to clarify, I do have a group of genes that won't be differentially expressed. I am looking at several hundred genes, and based on the biology of the system I know that only one or a few genes will be differentially expressed. Therefore, doesn't that mean I already have several hundred control genes? It's not clear to me why I would need to specify control genes in this case. But if I do, I'm not sure how to go about it as I cannot know in advance which gene will be the one that is differentially expressed, and presumably I wouldn't want to unintentionally specify that one as a control gene.

ADD REPLY • link 3.2 years ago dadca596 • 0

Normally, one makes an MA-plot (log2(baseMean+1) vs logFC), and the usual picture is that the majority of genes center roughly along y=0. Those are effectively the control genes. Then there are the genes that are likely to be DE, which deviate from that central baseline. This is usually pretty obvious by eye; can you make such a plot and show it? Just run DESeq() and the normal testing and then make this plot, e.g. with plotMA() or some custom plot code for the two columns of the results object.
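A rough sketch of both variants, again on simulated data from `makeExampleDESeqDataSet()` rather than a real targeted panel (the 300-gene size and the axis limits are arbitrary choices for illustration):

```r
library(DESeq2)

# Simulated stand-in data: 300 genes, 6 samples, no true differential expression.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 300, m = 6)
dds <- DESeq(dds)
res <- results(dds)

# Built-in MA-plot.
plotMA(res, ylim = c(-4, 4))

# Custom version from the two columns of the results object.
plot(log2(res$baseMean + 1), res$log2FoldChange,
     xlab = "log2(baseMean + 1)", ylab = "log2 fold change",
     pch = 20, cex = 0.6)
abline(h = 0, col = "red")

# Numeric check of the same idea: the bulk of genes should sit near zero.
med_lfc <- median(res$log2FoldChange, na.rm = TRUE)
print(med_lfc)
```

If the cloud of points is visibly shifted away from y=0 for a given treatment, that suggests the normalization reference is off for that comparison.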

ADD REPLY • link 3.2 years ago ATpoint ★ 4.8k

I have done that. As I mentioned in my original post, it works well for some treatments, but for other treatments I am getting false positives.

ADD REPLY • link 3.2 years ago dadca596 • 0

If you know that only one of several hundred genes is DE and the rest are null, then that shouldn't be a problem for DESeq2.
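To illustrate why a single DE gene among a few hundred null genes should not disturb normalization, here is a simplified base-R sketch of median-of-ratios size factor estimation. It is not DESeq2's actual code, and every number in it (300 genes, the per-sample `depth` values, an 8-fold drop in gene 1, the negative binomial `size`) is an assumption made up for the simulation.

```r
set.seed(42)

n_genes   <- 300
condition <- rep(c("control", "treated"), each = 3)
depth     <- c(1.0, 1.2, 0.8, 1.5, 0.9, 1.1)   # assumed true per-sample scaling

base_mean <- 2^rnorm(n_genes, mean = 8, sd = 1.5)
fold      <- rep(1, n_genes)
fold[1]   <- 1 / 8                               # gene 1 is the only DE gene

# Simulate negative binomial counts, one column per sample.
counts <- sapply(seq_along(depth), function(j) {
  effect <- if (condition[j] == "treated") fold else rep(1, n_genes)
  rnbinom(n_genes, mu = base_mean * depth[j] * effect, size = 10)
})

# Median of ratios: compare each sample to the per-gene geometric mean,
# then take the median ratio across genes (genes with a zero count are skipped).
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  usable <- is.finite(log_geo_mean)
  apply(counts, 2, function(sample_counts) {
    exp(median((log(sample_counts) - log_geo_mean)[usable]))
  })
}

size_factors <- median_of_ratios(counts)

# Put estimate and truth on the same scale (geometric mean of 1) to compare.
rescale <- function(x) x / exp(mean(log(x)))
print(round(cbind(estimated = rescale(size_factors), truth = rescale(depth)), 2))
```

Since the median is taken across genes, one strongly changed gene barely moves it; the estimate would only drift if a large fraction of the panel changed in the same direction.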

I would also look at the dispersion plot; that's key to judging whether the dataset looks typical with respect to other RNA-seq datasets. There is an example dispersion plot in the vignette.
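One way to produce that plot in R, and to compare the dispersion trend under different `fitType` settings side by side, is sketched below. The data are simulated with `makeExampleDESeqDataSet()`, so the plots only show what a well-behaved dataset looks like; the sketch does not claim that any one fit type is the right choice for a targeted panel.

```r
library(DESeq2)

# Simulated stand-in data: 300 genes, 6 samples.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 300, m = 6)
dds <- estimateSizeFactors(dds)

# Estimate dispersions once per fit type.
fit_types <- c(parametric = "parametric", local = "local", mean = "mean")
fits <- lapply(fit_types, function(ft) {
  estimateDispersions(dds, fitType = ft, quiet = TRUE)
})

# One dispersion plot per fit type, side by side.
par(mfrow = c(1, 3))
for (ft in names(fits)) {
  plotDispEsts(fits[[ft]])
  title(main = ft)
}
par(mfrow = c(1, 1))
```

In each panel, the gene-wise estimates should scatter around the fitted trend, with the final estimates shrunk toward it; a trend that clearly misses the cloud of points is a sign that the fit type is a poor match for the data.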