I am working on RNA-seq data analysis, I have 2 known batches (two different days for RNA sequencing) for my analysis. I run the ComBat function to remove the batch effect on normalized dds from DESeq2 or unnormalized dds. Now, I am unable to do create the DESeqDataSet from combatedata for further analysis. When I am using DESeqDataSetFromMatrix for creating DESeq2 data object by using combatedata, I am getting error: Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are negative
Thank you so much Kevin for your quick response. I had already used the limma::removeBatchEffect() for removing batch effect in two groups.
I had used this script:
Adjusting for batch effects with VST
vsd <- vst(dds)
plotPCA(vsd, "Batch")
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$Batch)
plotPCA(vsd, "Batch")
Now, in PCA plot I am able to see the batch correction in my two batches. Can you suggest this is the right strategy?
Hey, that seems fine - do not worry. Just to check: you are no longer using ComBat anywhere in your workflow?
Thank you so much. Right now I am not using the ComBat function for my analysis. I am also using the sva package for correcting the batch. The script I used is:
in order to use SVA to remove any effect on the counts from our surrogate variables
ddssva <- dds ddssva$SV1 <- svseq$sv[,1] ddssva$SV2 <- svseq$sv[,2] design(ddssva) <- ~ SV1 + SV2 + Batch + outcome_txt
surrogate variables by running DESeq with the new design
ddssva <- DESeq(ddssva)
to see results
resultsNames(ddssva)
Can you give your suggestions regarding this??
Do you suspect that there are surrogate variables in your data, though?; how much variance is explained by
SV1
andSV2
; is modelingbatch
alone not sufficient (and then removing the effect vialimma::removeBatchEffect()
)?Hello Kevin,
After adjusting the batch effect by using VST. I am not getting differential gene expression.
vsd <- vst(dds)
plotPCA(vsd, "Batch")
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$Batch)
plotPCA(vsd, "Batch")
after this script. How I will get differential gene expression? Again I have to run DESeq2 or not? If, yes then how I will create the dds object by using batch corrected matrix?
You should have conducted a differential expression analysis before you run those commands that you have pasted. You do not have to re-create the dds object with the batch-corrected matrix.
To recap:
batch
in your design formula. Your design formula may look like, for example,~ condition + batch
condition
) - the effect of batch will be 'adjusted' when the statistical inferences are made due to the fact thatbatch
is included in the design formula)removeBatchEffect()
) if you want to use your data matrix downstream for other purposes.Thank you so much Kevin.
Hello, Whenever I am using the SVA to remove the batch effect, I am getting a higher number of the differential genes as compared to without the removal of batch effects. By using this script on same data:
Do you think this approach is right for getting my differential gene expression?
Can you please reformat your code to make it more presentable? Edit your post, highlight the code, and then click on the
101 010
button. Also, be sure that you have searched for the answer to your question before re-posting here. Thank you!Thank you for editing. What evidence have you that there are 2 extra statistically significant surrogate variables in your data? You would not want to 'over-adjust' when making statistical inferences, which is what you may be doing.
To re-capitulate my point: doing the following should be sufficient to account for
batch
:By including extra surrogate variables without major justification, you may actually be eliminating the very effect that you want to measure.
Thank you so much for your response Kevin, OK, then this is not the right way to do the same. Whenever I am using this script and I am getting a higher number of differential gene expression and log2Foldchange is quite high as compare to without sva.
The following script is the right way or not??
Thanks, Raj
...but how are you going to perform a differential expression analysis to test for
outcome_txt
with a formula like this:~ SV1 + SV2 + Batch
? You have to include, in your design formula, the condition / trait, the sub-groups of which, you want to test for differential expression.At this point, I can only ask that you read over my original answer and all of my subsequent comments. It is apparent that you are not fully understanding the process of batch adjustment / correction - you are not even answering the questions that I am putting back to you.
Yes, I am not able to fully understand the process of batch corrections. I don't have so much experience regarding this, I just started working on RNA seq data. I am trying to analyze my RNA -seq data in which two known batches are there for two dates of experiments.
In which, my condition is outcome_txt, whenever I am using SVA, I am getting a high number of differential expression of genes between my groups.
I am just asking, whether by using sva in my analysis, I am doing right analysis or not?? my script for analysis condition:outcome_txt, batches:Batch(1&2) is:
Oh, I know, but, my question to you regarding the inclusion of the surrogate variables (
SV1
andSV2
) was this: what evidence do you have that justifies the inclusion of these surrogate variables? You already have yourbatch
variable included.My worry was, that, the unnecessary inclusion of variables in the design formula can result in an 'over-adjustment' when deriving the statistical inferences for your main factor / condition of interest, i.e.,
outcome_txt
.Over the years, I have seen many researchers becoming somewhat 'obsessed' about batch-correction and adjusting for unknown artifacts in their data (I was like this, too).
The best approach to mitigate these extraneous effects is, of course, a good experimental design, something which appears to be lacking in many studies.
Hello, Can you suggest the right approach for this kind of analysis? Experimental plan:
Two groups in which I want differential gene expression: outcome_txt
Two batches for the two different experimental dates: Batch
It will be very helpful for me If you provide an exact pipeline for the analysis of RNA seq data.
Thanks Rajesh