Question

Test whether expression levels of two gene sets are significantly different

0

Entering edit mode

Jon Bråte ▴ 260

@jon-brate-6263

Last seen 4 months ago

Norway

Hi,

I have a set of non coding genes and a set of coding genes expressed across a number of samples. Based on the boxplots it is obvious that the lncRNAs are expressed lower than coding genes (log2 non-normallized counts). But what is the appropriate test for this kind of data, and is such a test implemented in e.g. DESeq2?

http://imgur.com/a/Xd9jT

Thanks, Jon

R DESeq2 gene set testing gene expression • 2.6k views

ADD COMMENT • link 8.0 years ago Jon Bråte ▴ 260

score 3 · Answer 1 · 2016-11-11

3

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

In my opinion, there is no rigorous statistical test available. Any obvious test, such a t-test or Wilcoxon test, assumes that the genes are independent whereas they are not. The dependence structure between the genes is unknown and, if it could be incorporated into the test, might have a dramatic effect on the computed p-value.

Since the result is so obvious, do you need a test?

ADD COMMENT • link 8.0 years ago Gordon Smyth 51k

0

Entering edit mode

Thanks for this point Gordon, I hadn't thought of that.

ADD REPLY • link 8.0 years ago chris86 ▴ 420

score 1 · Answer 2 · 2016-11-12

1

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 20 months ago

United States

I don't know — whenever someone asks me to say something about the relative expression of different genes within the same samples, I never feel comfortable doing it ... in fact, I'll fight tooth and nail against it for the following reasons.

First, and perhaps most importantly: what does this comparison tell you? In the scenario, if you have fewer amounts (in TPM(!)) of lncRNAs than mRNAs, does that mean the lncRNAs have less impact/activity than mRNAs? Or if you have less mRNA for gene A than gene B, are "you" going to then assert that gene A has less functional influence on the cell than gene B?

If I offered to give you a "pinch" of uranium or a pound of salt, I bet you'd take the pound of salt.

Second: do we really think length normalization is all we need? Perhaps the results are out there (so please point me to those references if so) but are there convincing publications out there that show accuracy of TPMs to something like molar quantities of transcripts for a large number of transcripts?

If you want to know how the relative abundance of genes change across conditions (ie. ratio of gene A:gene B in condition vs condition 2), then fine: we're good ;-)

ADD COMMENT • link 8.0 years ago Steve Lianoglou ★ 13k

2

Entering edit mode

I don't disagree with you, but I did want to point out modern methods do much more than just length normalization to produce TPMs, e.g. almost all correct for size selection, and many correct for biases from degradation, random hexamer priming, amplification, and sequencing errors.

ADD REPLY • link 8.0 years ago Michael Love 42k

0

Entering edit mode

Thanks for pointing that out, Mike ... could you point towards a good rabbit hole to start digging into this topic "a new"?

I reckon starting with your alpine paper and digging through the refs from there eventually will get me to see the light?

ADD REPLY • link 8.0 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

If you want to know more about the bias-correction methods, I think you might have to read the docs & source code of Kallisto, Salmon, et. al. They have both added lots of bias-correction features fairly recently, so I don't think there's a publication covering all or even most of them.

ADD REPLY • link 8.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

I can speak to the Salmon part. The latest bias correction methods for Salmon are described in the updated bioRxiv print:

http://biorxiv.org/content/early/2016/08/30/021592

This includes a fragment sequence bias correction similar to that described in alpine. The Salmon implementation then is preferred over alpine for producing transcript quantifications, as it is substantially faster and probabilistically allocates reads which map to multiple genomics locations.

ADD REPLY • link 8.0 years ago Michael Love 42k

0

Entering edit mode

The alpine pre-print is now published in NBT, I wrote up a blog post describing the overall point:

Also see the updated Salmon manuscript linked below as to how this is implemented in the latest version of Salmon.

ADD REPLY • link 8.0 years ago Michael Love 42k

score 0 · Answer 3 · 2016-11-11

0

Entering edit mode

chris86 ▴ 420

@chris86-8408

Last seen 4.9 years ago

UCL, United Kingdom

I'd take TPM values from RSEM or EXPRESS (accurate quantification of transcript abundance) then simply do a t-test or non parametric equivalent on the TPMs between the lncRNAs and mRNAs.

DESEQ does not deal with changes in transcript abundance, but in fold changes between conditions.

ADD COMMENT • link 8.0 years ago chris86 ▴ 420

1

Entering edit mode

Yes, I agree with Chris. I would do a non parametric test (e.g. wilcox.test) of TPMs from RSEM, eXpress, or the faster methods: Salmon, Sailfish, kallisto.

For others reading the thread who don't see why not to use counts here: the reason is that this question concerns testing average expression across *genes*, not expression across samples in different conditions.

ADD REPLY • link 8.0 years ago Michael Love 42k

0

Entering edit mode

Thanks! Could I also do a t-test using normalized counts from Deseq? These should take into account differences in sequencing depth, so should be similar to TPMs? The two gene sets are from the same sequencing libraries by the way.

ADD REPLY • link 8.0 years ago Jon Bråte ▴ 260

0

Entering edit mode

No because they don't take into account length

ADD REPLY • link 8.0 years ago chris86 ▴ 420

score 0 · Answer 4 · 2016-11-16

0

Entering edit mode

Jon Bråte ▴ 260

@jon-brate-6263

Last seen 4 months ago

Norway

Hi,

Thank you all for valuable and interesting comments. I learned a lot from this.

Jon

ADD COMMENT • link 8.0 years ago Jon Bråte ▴ 260