DESeq and SVA for batch effect
1
0
Entering edit mode
@7e6a2774
Last seen 7 weeks ago
Italy

Hello, I'm doing some bulk RNA-Seq analysis to identify different genes in two condition (Braccio) but before that I want to identify some not known variable and delete the batch effect (RIN) of my variable. So I'm moving in this way:

dds_campione <- DESeqDataSetFromMatrix(countData=raw, 
                                   colData=condition_breakfast, 
                                   design=~ 1)

smallestGroupSize <- 3
keep <- rowSums(counts(dds_campione) >= 10) >= smallestGroupSize
dds_campione <- dds_campione[keep,]

dds_campione <- estimateSizeFactors(dds_campione)
matriceFiltrata_campione <- counts(dds_campione,normalized=TRUE)

mm_campione <- model.matrix(~ Braccio, colData(dds_campione))
mm0_campione <- model.matrix(~1, colData(dds_campione))
fit_campione <- svaseq(matriceFiltrata_campione, mod=mm_campione, mod0=mm0_campione)

dds_campione$SV1 <- fit_campione$sv[,1]
dds_campione$SV2 <- fit_campione$sv[,2]
dds_campione$SV3 <- fit_campione$sv[,3]
dds_campione$SV4 <- fit_campione$sv[,4]
dds_campione$SV5 <- fit_campione$sv[,5]
dds_campione$SV6 <- fit_campione$sv[,6]
dds_campione$SV7 <- fit_campione$sv[,7]
dds_campione$SV8 <- fit_campione$sv[,8]

design(dds_campione) <- ~  SV1 + SV2 + SV3 + SV4 + SV5 + SV6 + SV7 + SV8 + Braccio 
vsd_campione <- vst(dds_campione, blind=FALSE)

assay(vsd_campione) <- limma::removeBatchEffect(assay(vsd_campione), 
batch = condition_breakfast$RIN, design=mm_campione)

plotPCA(vsd_campione,intgroup=c( "Braccio"))

dds_campione <- DESeq(dds_campione)
res_campione <- results(dds_campione)

data_campione <- as.data.frame(res_campione)
data_campione <- tibble::rownames_to_column(data_campione, "geneID")

My question is:

When I look at the pca, my variance is high so removing the batch effect and using the sva seems to work but than, when I go one with the analysis in DESeq2, I do not see so much variabilites and that's because in "dds_campione <- DESeq(dds_campione)" I do not have the matix adjusted but the original one, how i can do? Because this matrix is with integers so I can not use it in the command. How I can use the adjusted matrix that I plot with the pca for the analysis?

I also try to insert the batch in the design

dds_campione <- DESeqDataSetFromMatrix(countData=raw, 
                                       colData=condition_breakfast, 
                                       design=~ RIN + Braccio)

But nothing really changes

Thank you!

DESeq2 BatchEffect sva • 892 views
ADD COMMENT
0
Entering edit mode

You're not using the surrogate variables at all in your batch effect removal. Since it's continuous values you would need to provide it to the covariates argument in removeBatchEffect.

ADD REPLY
0
Entering edit mode
@james-w-macdonald-5106
Last seen 7 days ago
United States

The reason you are estimating surrogate variables is to include them as variables in your design matrix, which you do not appear to be doing. In addition, if you do not include RIN in your reduced model for svaseq, you are almost surely capturing any variability associated with RIN in your surrogate variables, in which case there is no need to include that as a coefficient in your model.

0
Entering edit mode

Agree with James, I don't like to include RIN in design, but prefer to let SVA / RUV find these technical variables. If RIN is low for some samples, you will certainly find an SVA / RUV factor that correlates with RIN, but the factors are preferred for modeling.

ADD REPLY
0
Entering edit mode

So you think this should me my command?

dds_campione <- DESeqDataSetFromMatrix(countData=raw, 
                                           colData=condition_breakfast, 
                                           design=~ Braccio)

smallestGroupSize <- 3
keep <- rowSums(counts(dds_campione) >= 10) >= smallestGroupSize
dds_campione <- dds_campione[keep,]

dds_campione <- estimateSizeFactors(dds_campione)
matriceFiltrata_campione <- counts(dds_campione,normalized=TRUE)


mm_campione <- model.matrix(~ Braccio **+RIN** , colData(dds_campione))
mm0_campione <- model.matrix(~1, colData(dds_campione))
fit_campione <- svaseq(matriceFiltrata_campione, mod=mm_campione, mod0=mm0_campione)

dds_campione$SV1 <- fit_campione$sv[,1]
dds_campione$SV2 <- fit_campione$sv[,2]
dds_campione$SV3 <- fit_campione$sv[,3]
dds_campione$SV4 <- fit_campione$sv[,4]
dds_campione$SV5 <- fit_campione$sv[,5]
dds_campione$SV6 <- fit_campione$sv[,6]
dds_campione$SV7 <- fit_campione$sv[,7]
dds_campione$SV8 <- fit_campione$sv[,8]

vsd_campione <- vst(dds_campione, blind=FALSE)
covariates_campione <- colData(vsd_campione)[,c("SV1",'SV2',"SV3","SV4", "SV5" , "SV6" , "SV7" , "SV8")]

assay(vsd_campione) <- limma::removeBatchEffect(assay(vsd_campione), batch = condition_breakfast$RIN,
                                               design=mm_campione)

design(dds_campione) <- ~  **RIN + Braccio + SV1+ SV2+ SV3+ SV4 + SV5+ SV6+ SV7+ SV8**
dds_campione <- DESeq(dds_campione)
ADD REPLY
0
Entering edit mode

No, you should never use removeBatchEffect to adjust your count data. And both Mike and I recommended using surrogate variables instead of RIN values. And IIRC, the default for DESeq is to use the last coefficient of your design matrix, so you should probably do

design(dds_campione) <- ~SV1+ SV2+ SV3+ SV4 + SV5+ SV6+ SV7+ SV8 + Braccio

so that

results(DESeq(dds_campione))

returns the 'Braccio' coefficient.

ADD REPLY
0
Entering edit mode

I do not understand why I should not use removeBatchEffect for the RIN. I understand that can be a good idea put it in the model befor the sva but why I can not use it as batch effect for the visalization?

mm_campione <- model.matrix(~ Braccio +RIN , colData(dds_campione))

and than

 assay(vsd_campione) <- limma::removeBatchEffect(assay(vsd_campione), batch = condition_breakfast$RIN, 
                                                    covariates = covariates_campione,design=mm_campione)

but for the design only the covariate found with the RIN in the model:

design(dds_campione) <- ~  Braccio + SV1+ SV2+ SV3+ SV4 + SV5

am I understanding right?

ADD REPLY
0
Entering edit mode

Oh, right. If you are just using the batch corrected data for visualization it's fine.

ADD REPLY
0
Entering edit mode

Ok, thank you so much!!

ADD REPLY

Login before adding your answer.

Traffic: 620 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6