Question

Ballgown with replicates and the adjustvars option

o.diogosilva • 0

Dear all,

I am using ballgown to quantify differential gene expression in a data set after performing the hisat2+stringtie pipeline. I have read other questions about how to handle technical replicates in DE analyses but I still have some doubts on how to handle them, and the purpose of the adjustvars option in ballgown. My phenotype file is as follows:

"ids", "hpi", "exp", "rep"
"CoffeR1C24", 24, "C", 1
"CoffeR1C48", 48, "C", 2
"CoffeR1C72", 72, "C", 3
"CoffeR1Q24", 24, "Q", 4
"CoffeR1Q48", 48, "Q", 5
"CoffeR1Q72", 72, "Q", 6
"CoffeR2C24", 24, "C", 1
"CoffeR2C48", 48, "C", 2
"CoffeR2C72", 72, "C", 3
"CoffeR2Q24", 24, "Q", 4
"CoffeR2Q48", 48, "Q", 5
"CoffeR2Q72", 72, "Q", 6
"CoffeS1C24", 24, "C", 7
"CoffeS1C48", 48, "C", 8
"CoffeS1C72", 72, "C", 9
"CoffeS1Q24", 24, "Q", 10
"CoffeS1Q48", 48, "Q", 11
"CoffeS1Q72", 72, "Q", 12
"CoffeS2C24", 24, "C", 7
"CoffeS2C48", 48, "C", 8
"CoffeS2C72", 72, "C", 9
"CoffeS2Q24", 24, "Q", 10
"CoffeS2Q48", 48, "Q", 11
"CoffeS2Q72", 72, "Q", 12

I want to test for differentially expressed genes between the two "exp" conditions ("C" and "Q"). Each sample has a technical replicate (libraries that share a value in the "rep" column), and the experimental conditions were assessed at multiple time points ("hpi" column).

From what I have gathered from other questions, the best way of dealing with technical replicates would be to average the expression measurements across replicates. Is that correct?
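For reference, averaging across technical replicates can be sketched mechanically in base R as below. This only illustrates the operation being asked about; it does not settle whether averaging is the right choice here. The matrix `expr`, the vector `rep_id`, and all expression values are invented for illustration, and the "rep" labels follow the phenotype file above.

```r
# Toy expression matrix: 2 genes x 4 libraries (values are invented).
expr <- matrix(c(10, 12, 20, 22,
                  5,  7,  1,  3),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"),
                               c("CoffeR1C24", "CoffeR2C24",
                                 "CoffeR1Q24", "CoffeR2Q24")))

# "rep" labels taken from the phenotype file: libraries that share a
# label are technical replicates of each other.
rep_id <- c(1, 1, 4, 4)

# Group column indices by replicate label, then average each group per gene.
avg <- sapply(split(seq_len(ncol(expr)), rep_id),
              function(idx) rowMeans(expr[, idx, drop = FALSE]))

print(avg)
```

The result has one column per "rep" label, so downstream code would need a phenotype table collapsed to one row per label as well.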

And since each sample also contains expression results at 24, 48 and 72h, should this be considered a potential confounding factor when testing between the two experimental conditions, and therefore specified in the adjustvars option? Or is this a major confusion?

Thank you very much for your time and attention.

Cheers,

Diogo

ballgown rnaseq differential gene expression • 2.1k views

updated 7.3 years ago by Alyssa Frazee • written 7.5 years ago by o.diogosilva

score 0 · Answer 1 · 2017-11-11

It seems like what you are looking to do here is determine the differential expression between "C" and "Q", adjusting for everything else. I would convert your phenodata file to something like:

"ids", "hpi", "exp", "rep", "sample"
"CoffeR1C24", 24, "C", 1, "CoffeR1"
"CoffeR1C48", 48, "C", 2, "CoffeR1"
"CoffeR1C72", 72, "C", 3, "CoffeR1"
"CoffeR1Q24", 24, "Q", 4, "CoffeR1"
"CoffeR1Q48", 48, "Q", 5, "CoffeR1"
"CoffeR1Q72", 72, "Q", 6, "CoffeR1"
"CoffeR2C24", 24, "C", 1, "CoffeR2"
"CoffeR2C48", 48, "C", 2, "CoffeR2"
"CoffeR2C72", 72, "C", 3, "CoffeR2"
"CoffeR2Q24", 24, "Q", 4, "CoffeR2"
"CoffeR2Q48", 48, "Q", 5, "CoffeR2"
"CoffeR2Q72", 72, "Q", 6, "CoffeR2"
"CoffeS1C24", 24, "C", 7, "CoffeS1"
"CoffeS1C48", 48, "C", 8, "CoffeS1"
"CoffeS1C72", 72, "C", 9, "CoffeS1"
"CoffeS1Q24", 24, "Q", 10, "CoffeS1"
"CoffeS1Q48", 48, "Q", 11, "CoffeS1"
"CoffeS1Q72", 72, "Q", 12, "CoffeS1"
"CoffeS2C24", 24, "C", 7, "CoffeS2"
"CoffeS2C48", 48, "C", 8, "CoffeS2"
"CoffeS2C72", 72, "C", 9, "CoffeS2"
"CoffeS2Q24", 24, "Q", 10, "CoffeS2"
"CoffeS2Q48", 48, "Q", 11, "CoffeS2"
"CoffeS2Q72", 72, "Q", 12, "CoffeS2"
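The "sample" column is just each id with its trailing condition letter and time point removed, so it does not need to be typed by hand. A minimal base-R sketch, assuming the ids always end in "C" or "Q" followed by digits; the object names `design` and `pheno` are hypothetical:

```r
# Rebuild the phenotype table from the thread. expand.grid() varies its
# first argument fastest, which reproduces the row order of the posted file.
design <- expand.grid(hpi  = c(24, 48, 72),
                      exp  = c("C", "Q"),
                      line = c("R1", "R2", "S1", "S2"),
                      stringsAsFactors = FALSE)

pheno <- data.frame(
  ids = paste0("Coffe", design$line, design$exp, design$hpi),
  hpi = design$hpi,
  exp = design$exp,
  rep = c(1:6, 1:6, 7:12, 7:12),  # copied from the posted phenotype file
  stringsAsFactors = FALSE
)

# Strip the trailing "<C|Q><digits>" suffix to obtain the sample label,
# e.g. "CoffeR1C24" -> "CoffeR1".
pheno$sample <- sub("[CQ][0-9]+$", "", pheno$ids)

print(head(pheno))
```

If the table is then attached to a ballgown object, its rows should be in the same order as the samples in that object.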

Then I would do the test like this:

results <- stattest(bg, feature = "gene", meas = "FPKM", covariate = "exp", adjustvars = "sample")

This will give you the average change between C and Q within the same sample, where the average is taken across time points and technical replicates.
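To make "adjusting for sample" concrete, here is a rough base-R analogue for a single gene: compare a model with only the adjustment variable against one that also includes the condition. This is a sketch of the general nested-model idea, not ballgown's actual implementation, and the expression values, `baseline`, and the other object names are invented for illustration.

```r
# Toy design mirroring the thread: 4 samples x 2 conditions x 3 time points.
design <- expand.grid(hpi    = c(24, 48, 72),
                      exp    = c("C", "Q"),
                      sample = c("CoffeR1", "CoffeR2", "CoffeS1", "CoffeS2"),
                      stringsAsFactors = TRUE)

# Invented log-scale expression for one gene: a per-sample baseline,
# a shift of +2 in condition Q, and a small time-related component.
baseline <- c(CoffeR1 = 5, CoffeR2 = 6, CoffeS1 = 7, CoffeS2 = 8)
design$y <- unname(baseline[as.character(design$sample)]) +
  2 * (design$exp == "Q") +
  design$hpi / 48

# Null model: adjustment variable only. Full model: adjustment + condition.
fit0 <- lm(y ~ sample, data = design)
fit1 <- lm(y ~ sample + exp, data = design)

# F-test for whether adding the condition improves the fit.
cmp <- anova(fit0, fit1)
p_value <- cmp[["Pr(>F)"]][2]

# Estimated C-to-Q shift after accounting for sample.
exp_effect <- unname(coef(fit1)["expQ"])

print(p_value)
print(exp_effect)
```

Because time point is left out of both models here, its contribution ends up in the residual variation, which matches the "average across time points" reading above.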

Hope this is helpful in getting started with your analysis!