Ballgown with replicates and the adjustvars option
@odiogosilva-13773
Last seen 7.1 years ago

Dear all,

I am using ballgown to test for differential gene expression in a data set processed with the hisat2 + stringtie pipeline. I have read other questions about technical replicates in DE analyses, but I am still unsure how to handle them and what the adjustvars option in ballgown is for. My phenotype file is as follows:

"ids", "hpi", "exp", "rep"
"CoffeR1C24", 24, "C", 1
"CoffeR1C48", 48, "C", 2
"CoffeR1C72", 72, "C", 3
"CoffeR1Q24", 24, "Q", 4
"CoffeR1Q48", 48, "Q", 5
"CoffeR1Q72", 72, "Q", 6
"CoffeR2C24", 24, "C", 1
"CoffeR2C48", 48, "C", 2
"CoffeR2C72", 72, "C", 3
"CoffeR2Q24", 24, "Q", 4
"CoffeR2Q48", 48, "Q", 5
"CoffeR2Q72", 72, "Q", 6
"CoffeS1C24", 24, "C", 7
"CoffeS1C48", 48, "C", 8
"CoffeS1C72", 72, "C", 9
"CoffeS1Q24", 24, "Q", 10
"CoffeS1Q48", 48, "Q", 11
"CoffeS1Q72", 72, "Q", 12
"CoffeS2C24", 24, "C", 7
"CoffeS2C48", 48, "C", 8
"CoffeS2C72", 72, "C", 9
"CoffeS2Q24", 24, "Q", 10
"CoffeS2Q48", 48, "Q", 11
"CoffeS2Q72", 72, "Q", 12

I want to test for differentially expressed genes between the two "exp" conditions ("C" and "Q"). Each sample has a technical replicate ("rep" column), and the experimental conditions were assessed at multiple time points ("hpi" column).

From what I have gathered from other questions, the best way of dealing with technical replicates would be to average the expression measurements across replicates. Is that correct?
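As a concrete picture of what averaging over technical replicates would mean, here is a minimal base R sketch. The expression values are made up for illustration; only the library names and their "rep" values come from the phenotype file above, and the matrix itself is hypothetical rather than something ballgown produces in this exact form.

```r
# Hypothetical expression matrix: 3 genes x 4 libraries (made-up values).
expr <- matrix(c(10, 12, 20, 22,
                  5,  7, 50, 54,
                  0,  2,  8, 10),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("CoffeR1C24", "CoffeR2C24",
                                 "CoffeR1Q24", "CoffeR2Q24")))

# "rep" values for these four libraries, as listed in the phenotype file.
rep_id <- c(1, 1, 4, 4)

# For each replicate group, take the per-gene mean over its columns.
avg <- sapply(split(seq_len(ncol(expr)), rep_id),
              function(cols) rowMeans(expr[, cols, drop = FALSE]))
print(avg)
```

The result has one column per "rep" value, so libraries that share a "rep" value collapse into a single averaged profile.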

And since each sample also contains expression results at 24, 48 and 72h, should this be considered a potential confounding factor when testing between the two experimental conditions, and therefore specified in the adjustvars option? Or is this a major confusion?

 Thank you very much for your time and attention.

Cheers,

Diogo

Alyssa Frazee ▴ 210
@alyssa-frazee-6710
Last seen 3.9 years ago
San Francisco, CA, USA

It seems like what you are looking to do here is determine the differential expression between "C" and "Q", adjusting for everything else. I would convert your phenodata file to something like:

"ids", "hpi", "exp", "rep", "sample"
"CoffeR1C24", 24, "C", 1, "CoffeR1"
"CoffeR1C48", 48, "C", 2, "CoffeR1"
"CoffeR1C72", 72, "C", 3, "CoffeR1"
"CoffeR1Q24", 24, "Q", 4, "CoffeR1"
"CoffeR1Q48", 48, "Q", 5, "CoffeR1"
"CoffeR1Q72", 72, "Q", 6, "CoffeR1"
"CoffeR2C24", 24, "C", 1, "CoffeR2"
"CoffeR2C48", 48, "C", 2, "CoffeR2"
"CoffeR2C72", 72, "C", 3, "CoffeR2"
"CoffeR2Q24", 24, "Q", 4, "CoffeR2"
"CoffeR2Q48", 48, "Q", 5, "CoffeR2"
"CoffeR2Q72", 72, "Q", 6, "CoffeR2"
"CoffeS1C24", 24, "C", 7, "CoffeS1"
"CoffeS1C48", 48, "C", 8, "CoffeS1"
"CoffeS1C72", 72, "C", 9, "CoffeS1"
"CoffeS1Q24", 24, "Q", 10, "CoffeS1"
"CoffeS1Q48", 48, "Q", 11, "CoffeS1"
"CoffeS1Q72", 72, "Q", 12, "CoffeS1"
"CoffeS2C24", 24, "C", 7, "CoffeS2"
"CoffeS2C48", 48, "C", 8, "CoffeS2"
"CoffeS2C72", 72, "C", 9, "CoffeS2"
"CoffeS2Q24", 24, "Q", 10, "CoffeS2"
"CoffeS2Q48", 48, "Q", 11, "CoffeS2"
"CoffeS2Q72", 72, "Q", 12, "CoffeS2"
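The "sample" column does not have to be typed by hand. As a rough sketch, assuming every id starts with a seven-character sample prefix as in the table above, it could be derived in base R. The `design` and `pheno` names are just illustrative, and the "rep" column is left out to keep the example short.

```r
# Rebuild the ids from the table above (hpi varies fastest, then exp, then prefix).
design <- expand.grid(hpi = c(24, 48, 72),
                      exp = c("C", "Q"),
                      prefix = c("CoffeR1", "CoffeR2", "CoffeS1", "CoffeS2"),
                      stringsAsFactors = FALSE)

pheno <- data.frame(ids = paste0(design$prefix, design$exp, design$hpi),
                    hpi = design$hpi,
                    exp = factor(design$exp),
                    stringsAsFactors = FALSE)

# Assumption: the first 7 characters of each id name the sample.
pheno$sample <- factor(substr(pheno$ids, 1, 7))

# Sanity check: each sample should contribute 3 libraries per condition.
print(table(pheno$sample, pheno$exp))
```

A data frame like this could then be attached to the ballgown object with `pData(bg) <- pheno`, provided its rows are in the same order as the samples in `bg`.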

Then I would do the test like this:

results = stattest(bg, covariate="exp", adjustvars="sample")

This will give you the average change between C and Q within the same sample (where the average is taken across time points and technical replicates).
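To see what "adjusting for sample" means, here is a small base R illustration on a single simulated gene. This is not ballgown's own code; it only mimics the general idea of comparing a model that includes the covariate of interest against one that does not, and every number in it (baselines, effect size, noise) is invented for the example.

```r
set.seed(1)

# Same layout as the phenodata table: 4 samples x 2 conditions x 3 time points.
design <- expand.grid(hpi = c(24, 48, 72),
                      exp = c("C", "Q"),
                      sample = c("CoffeR1", "CoffeR2", "CoffeS1", "CoffeS2"))

# Simulated log-expression for one hypothetical gene:
# a sample-specific baseline, plus a shift of 1 under "Q", plus noise.
baseline <- c(CoffeR1 = 5.0, CoffeR2 = 5.5, CoffeS1 = 4.0, CoffeS2 = 4.5)
design$y <- unname(baseline[as.character(design$sample)]) +
  ifelse(design$exp == "Q", 1, 0) +
  rnorm(nrow(design), sd = 0.2)

# Null model: sample only. Full model: sample plus the covariate of interest.
null_fit <- lm(y ~ sample, data = design)
full_fit <- lm(y ~ exp + sample, data = design)

# F-test for whether adding "exp" improves the fit once sample is accounted for.
cmp <- anova(null_fit, full_fit)
print(cmp)

# Estimated C-to-Q shift, averaged over time points within each sample.
effect <- coef(full_fit)[["expQ"]]
print(effect)
```

Because the sample term absorbs baseline differences between samples, the "exp" coefficient reflects the within-sample change between C and Q rather than differences between samples.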

Hope this is helpful in getting started with your analysis!
