Finding the correct covariate for RNAseq experiment
mat.lesche ▴ 110
@matlesche-6835
Last seen 7 months ago
Germany

Hey,

I'm running an experiment with two conditions: wildtype (WT) and a knockout (KO) of a gene. Each sample comes from a single mouse, and the samples were isolated on different days. Here is an overview:

Sample  GT  Sex     Date
ss11    KO  male    may
ss12    KO  male    may
ss13    WT  male    may
ss14    WT  male    may
ss15    KO  female  june
ss16    WT  female  june
ss17    KO  female  june
ss18    WT  male    june

I ran several PCAs because the samples did not initially cluster (initial PCA).

Next, I checked Sex, Date, and Sex + Date as covariates and ran a PCA for each. I applied removeBatchEffect from limma to the transformed counts from DESeq2, and the PCAs looked like this:

[Figures: Correction for Sex; Correction for Date; Correction for Sex and Date]

As one can see, when I correct for Sex, the difference in my condition of interest shows up on PC2, but correcting for Date or for Date + Sex moves it to PC1.
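To make the correction step concrete, here is a minimal base-R sketch of the general idea behind regressing out a batch term while keeping genotype in the model. It is not limma's implementation (limma's removeBatchEffect has a design argument for the terms to preserve, and the post does not say whether it was used), and it differs from limma in details such as how the batch is coded. The expression matrix is simulated as a stand-in for the transformed counts; the names x, x_corrected and n_genes are made up for illustration.

```r
set.seed(1)

# Sample annotation taken from the table in the post
meta <- data.frame(
  GT   = factor(c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT")),
  Date = factor(c("may", "may", "may", "may", "june", "june", "june", "june")),
  row.names = paste0("ss", 11:18)
)

# Simulated genes x samples matrix (NOT the real data), with a shift
# added to all "may" samples to mimic an isolation-date effect
n_genes <- 50
x <- matrix(rnorm(n_genes * nrow(meta)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), rownames(meta)))
x <- x + outer(rep(1, n_genes), ifelse(meta$Date == "may", 2, 0))

design <- model.matrix(~ GT, meta)                         # terms to keep
batch  <- model.matrix(~ Date, meta)[, -1, drop = FALSE]   # term to remove

# Fit genotype and date together for every gene, then subtract
# only the fitted date component from the data
fit <- lm.fit(cbind(design, batch), t(x))
beta_batch  <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
x_corrected <- x - t(batch %*% beta_batch)

# PCA on the corrected matrix (samples in rows)
pca <- prcomp(t(x_corrected))
print(summary(pca)$importance[, 1:2])
```

Because genotype is fitted jointly with date, the genotype coefficients are the same before and after the correction; only the date component is removed. The corrected matrix is meant for plots such as PCA, while the differential expression test itself would carry the covariate in the design formula.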

I also ran sva with the following model:

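# Full model: genotype is the variable of interest
mod  <- model.matrix(~ GT, colData(ddsrun))
# Null model: intercept only
mod0 <- model.matrix(~ 1, colData(ddsrun))
# Estimate two surrogate variables (dat is the count matrix; its construction is not shown in this post)
svseq <- svaseq(dat, mod, mod0, n.sv = 2)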

and the result is

Sample  SV1          SV2
ss11    -0.30120343   0.16577893
ss12    -0.31295115  -0.05975287
ss13    -0.24305134  -0.05975287
ss14    -0.43343732   0.20823567
ss15     0.54045732  -0.28756674
ss16     0.29909472   0.23194841
ss17     0.42691938   0.50661027
ss18     0.02417181  -0.72876957

As a side question: the first column clearly shows the effect of the isolation date, but the second column doesn't correlate with Sex or Date, so I would not use it. Is there a good way to interpret it? Right now, I would only try to match surrogate variables against already known and likely covariates, and not use them directly. Otherwise I would introduce a batch term whose meaning I don't know.
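One way to put numbers on "correlates with Date" is to regress each surrogate variable on each known covariate and compare the R-squared values. The sketch below uses only base R and the values posted above (columns 1 and 2 of the svaseq output, called SV1 and SV2 here); the helper r2 and the object assoc are names made up for illustration, and with eight samples these values are only a rough guide.

```r
# Surrogate variables and sample annotation as posted above
sv <- data.frame(
  Sample = paste0("ss", 11:18),
  SV1 = c(-0.30120343, -0.31295115, -0.24305134, -0.43343732,
           0.54045732,  0.29909472,  0.42691938,  0.02417181),
  SV2 = c( 0.16577893, -0.05975287, -0.05975287,  0.20823567,
          -0.28756674,  0.23194841,  0.50661027, -0.72876957),
  GT   = c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"),
  Sex  = c("male", "male", "male", "male", "female", "female", "female", "male"),
  Date = c("may", "may", "may", "may", "june", "june", "june", "june")
)

# Fraction of variance in a surrogate variable explained by one covariate
r2 <- function(y, x) summary(lm(y ~ x))$r.squared

# Rows: surrogate variables; columns: known covariates
assoc <- sapply(c("GT", "Sex", "Date"), function(v) {
  c(SV1 = r2(sv$SV1, sv[[v]]), SV2 = r2(sv$SV2, sv[[v]]))
})
print(round(assoc, 2))
```

A surrogate variable with a high R-squared for a known covariate is probably capturing that covariate. One with low values everywhere may reflect an unrecorded factor or noise, which is the situation described for the second column.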

From this, I conclude that Sex doesn't have a big impact on the data, but Date does. That means I will use ~ Date + GT as the design formula for DESeq2. Additionally, I made an MDS plot of the Euclidean distances, and it also suggests that Date has a higher impact on the data than Sex.

Now I am wondering whether these steps are in the correct order and whether my conclusion is correct. I have another experiment with the same set-up, but there the GT effect is smaller and is masked by Date and/or Sex.

Thanks

Mathias

deseq2 removebatcheffect() sva differential gene expression • 2.1k views

Is there a question?

Hi James, now there is. Sorry, I hit Submit too early. There used to be a preview button, and I realised too late that Submit creates the thread.

@mikelove
Last seen 1 day ago
United States

hi,

I usually recommend people to put into the model those terms that they think might affect gene expression, even if only for some genes, and so long as they have an experimental design that allows it. You have nearly perfect confounding of date and sex, so you can pick the one that has a bigger effect in the PCA plots and put that one in. You can't do much more than that due to the confounding. So I'd also use the design you have suggested, ~date + genotype.
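The confounding can be checked directly from the sample table in the question. This is a small base-R sketch using only the posted annotation; the object names (meta, tab, X_date, X_both) are made up for illustration.

```r
# Sample annotation as posted in the question
meta <- data.frame(
  Sample = paste0("ss", 11:18),
  GT   = c("KO", "KO", "WT", "WT", "KO", "WT", "KO", "WT"),
  Sex  = c("male", "male", "male", "male", "female", "female", "female", "male"),
  Date = c("may", "may", "may", "may", "june", "june", "june", "june"),
  stringsAsFactors = TRUE
)

# Cross-tabulate the two nuisance variables: an empty cell means
# that combination of sex and date was never observed
tab <- table(Sex = meta$Sex, Date = meta$Date)
print(tab)

# Correlation between the two indicator variables
sex_date_cor <- cor(as.numeric(meta$Sex == "male"),
                    as.numeric(meta$Date == "may"))
print(sex_date_cor)

# Design matrices for the suggested model and for a model with both terms
X_date <- model.matrix(~ Date + GT, data = meta)
X_both <- model.matrix(~ Sex + Date + GT, data = meta)
rank_date <- qr(X_date)$rank
rank_both <- qr(X_both)$rank
print(c(rank_date = rank_date, rank_both = rank_both))
```

The model with both Sex and Date is still full rank, but only because of ss18, the single male sample from June. Any separation of the sex effect from the date effect would rest on that one mouse, which is why picking one of the two terms is the practical choice here.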

Thanks Michael.
