Question

Inconsistent RNAseq data in knockdown experiment

csijst • 0

@csijst-15102

Last seen 6.2 years ago

Singapore/National University of Singap…

Hi,

I am helping a colleague conduct a differential expression analysis with RNAseq data, but I have some concerns about the expression levels reported by the analysis. According to my colleague, the experiment is designed around the premise that protein A controls the stability of protein B: when prot A is reduced, prot B increases.

At the bench, my colleague used shRNAs against prot A and prot B (independently) and saw a significant reduction in expression of each (by both western blot and qRT-PCR); I believe the shRNA acts on the mRNA of the protein. In short, the knockdown was validated at the bench.

She then repeated the same experiment and sent the samples for RNA sequencing. Prior to library preparation, the samples were subjected to rRNA depletion. When the datasets came back, I aligned the reads with STAR and processed them with Rsubread and DESeq2, checking the padj values for significance. I found two strange results: (1) shRNA A significantly reduced prot A, but prot B was also reduced slightly (though not significantly), and (2) shRNA B did not significantly reduce prot B.

I checked the PCA plots and they seemed fine: consistent patterns and clear separation by batch and by treatment.

Here are my questions. Is it common for an shRNA to give a significant reduction at the bench while the RNAseq data fail to detect the difference? Is it then acceptable to take the results as they are and use them for publication? Our concern is that reviewers will ask why the data should be accepted when we used an shRNA and did not see a significant reduction in the RNAseq datasets. Would it now be mandatory for us to repeat the experiment to get proper readouts? Is there a way for me to check in a genome browser (or any other program) where the RNAseq datasets have gone wrong? RNA sequencing is usually run at about 30 million reads. Would 30 million reads be sufficient to cover the whole library?

deseq2 • 1.5k views

ADD COMMENT • link updated 7.1 years ago by Peter Langfelder ★ 3.0k • written 7.1 years ago by csijst • 0

Dear Dr Michael,

My colleague and I didn't use plotCounts() but we checked the counts using counts['gene',].

So what you mean is that (as far as you can tell) the data are genuine and the problem is probably not related to the programming? I do have some suspicions about the benchwork, though. I appreciate the advice you have given me so far.

Yes, I will post there soon, and I will check with my colleague whether the RNA levels were validated before the samples were sent for RNA sequencing.

Thank you once again.

Regards,

Johann

ADD REPLY • link 7.1 years ago csijst • 0

score 0 · Answer 1 · 2018-04-04

Did you look at plotCounts()?

Beyond that, if the results and the counts for these specific genes all look correct, then I don't have any further advice to give as far as DESeq2 is concerned (where I'm obligated to provide software support for posts that have the deseq2 tag).

You might consider posting to biostars or seqanswers about people's experience with shRNA and RNA-seq.

score 0 · Answer 2 · 2018-04-04

Peter Langfelder ★ 3.0k

@peter-langfelder-4469

Last seen 6 months ago

United States

When you check DE of a single mRNA or two, you don't need to look at padj; the unadjusted p-value (and perhaps fold change) is appropriate in this case. (FDR is appropriate when you test whether any of a large number of tests are significant.)
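To make the distinction concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment, which is the default method DESeq2 uses to compute padj. This is not DESeq2 code: the function name `benjamini_hochberg`, the target p-value of 0.01, and the 19,999 simulated background genes are all made up for illustration, and DESeq2 additionally applies independent filtering, which changes how many tests enter the correction.

```python
import random


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # adjusted value is capped by the ones above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: one pre-specified knockdown target with a raw
# p-value of 0.01, tested alongside 19,999 genes with no real effect
# (their p-values are drawn uniformly from [0, 1)).
random.seed(0)
target_p = 0.01
background = [random.random() for _ in range(19999)]

padj = benjamini_hochberg([target_p] + background)

print("raw p-value for the target gene:     ", target_p)
print("adjusted p-value for the target gene:", padj[0])
```

In this made-up setting the target gene's raw p-value is below 0.05, but its adjusted value is not, because it is being judged against roughly 20,000 simultaneous tests. If the knockdown target was chosen before looking at the data, it is a single planned test, so the raw p-value is the relevant one for that gene; padj is the relevant number when scanning the whole table for any genes that changed.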

ADD COMMENT • link 7.1 years ago Peter Langfelder ★ 3.0k

Hi Dr Peter,

I am not that familiar with statistics, so would you mind explaining what you mean by "large number of tests"? Do you mean that if I am interested in gene A (compared to controls) and also in whether gene B (or other genes) shows a correlated change in log2FoldChange at the same time, then I should look at the adjusted p-values?

And if I just want to see whether gene A has changed significantly compared to the controls, I look only at the unadjusted p-value? What if I am looking at multiple genes (e.g. gene A - treated vs untreated; gene B - treated vs untreated; gene C - treated vs untreated; ...): do I look at unadjusted or adjusted p-values, and why?

If I understand this article on RNAseq DEA (page 20) correctly, it says that when I change alpha from the default 0.1 to 0.05, I need to look at the padj values. Also, isn't the purpose of DESeq2 calculating padj values to check whether the gene is knocked down and also (maybe) to see changes in the expression of other, downstream target genes? In that case, wouldn't I need to look at the padj values nonetheless?

Are there any clearer examples of when I should use unadjusted rather than adjusted p-values?

Thank you.

Regards,

Johann

ADD REPLY • link 7.1 years ago csijst • 0