Hi all,
I am trying to use the EnhancedVolcano
package to plot the results of my DEGs
from DESeq2
. I have been reading this guide but when using EnhancedVolcano
function, the plot won't show. The following is the code I am using:
myCondition <- read.csv("C:/Users/Luigi/Desktop/WUR/R_Directory/Excel_v2/myCondition/myCondition_CTRL_XS.csv", header = TRUE, sep = ";")
View(myCondition)
HitCount <- read.csv("C:/Users/Luigi/Desktop/WUR/R_Directory/Excel_v2/HitCounts/HitCounts_30mpi_CTRL_XS.csv", header = TRUE, sep = ";")
View(HitCount)
rownames(HitCount) <- HitCount[,1]
HitCount <- HitCount[,-1]
head(HitCount)
all(colnames(HitCount) == myCondition$SampleID)
str(HitCount)
dds <- DESeqDataSetFromMatrix(countData = HitCount,
colData = myCondition,
design = ~ Condition)
dds <- DESeq(dds)
res <- results(dds)
res.df <- as.data.frame(res)
EnhancedVolcano(res.df, lab = rownames(res.df),
x = "log2FoldChange",
y = "pvalue")
And this is an extract from my res.df:
baseMean log2FoldChange lfcSE stat pvalue padj
VDAG_00001 549.046166 -0.0092977503 0.12107151 -0.076795523 9.387862e-01 9.545491e-01
VDAG_00002 1625.449627 0.1763897241 0.08555426 2.061729317 3.923351e-02 6.284944e-02
VDAG_00003 1188.308547 1.8559310919 0.09512573 19.510295447 8.976185e-85 2.053759e-83
As you can see, in the first column is missing the header Gene_ID
. I have tried to retrive the gene symbol using this code, but apparently VDAGs aren't either "ENTREZ"
or "ENSEMBL"
IDs. I did use AnnotationDd
and made my own database, org.Vdahliae.eg.db
.
Do you have any suggestions on how to modify the code and plot the Volcano plot?
Thanks
Hi James,
Thanks for your reply. Actually, I have tried again to run the Volcano Plot. This time I first saved the results from the
dds
then run alone the code you have posted and I get a weird result (it is my first time runningEnhancedVolcano
).This is the code I have used:
Did I made an error in the code or is it just that no DEGs are in my dataset?
Moreover, in the previous code I have posted I think I might have a problem with:
Which removed the header from the first column and thus having problems in plotting the Volcano Plot.
I can't diagnose much, because I don't have your data. But there are things you can do!
When you get weird plotting results it's often most useful to check the data yourself, as nobody else can do that for you.
Thanks again. I have run:
and I get this as result:
To be honest, I do not know how to interpret this. The only thing that pop-ups to my eye is NA's 4790. That's very weird because before saving my results I have run
na.omit()
. Also, if I just sort my file, the there are no NA's but just a hundred of values equal to 0, or with a p-value very low. And, when I run the Volcano plot, I get this warning message:It's not uncommon for
DESeq2
to produceNA
p-values. Also there is almost no reason to ever 'save your results' as part of an analysis, if by that you mean that you run a bunch of code and then output the results in a CSV file or whatever that you plan on reading back in at a later date.There are reasons to save intermediate R objects if they are computationally expensive to produce, but that is almost never the case with something small like a bulk RNA-Seq analysis. If I have some long-running process that will make re-running some code painfully slow, I will save the results of that step to speed things up. But in general I don't save anything but the code, which can be re-run at any time (and which will be re-run many times in the course of doing the analysis).
And if you are saving intermediate data, you should be using either Rdata or RDS files rather than converting to some non-R format, where you might end up with unexpected conversion issues (like converting a bunch of really small p-values to zeros, as it appears happened here).
You should also not have a maximum p-value of 0.032, but instead it should be 1 or very close to it. That looks to me like you are filtering the results, which is not what you want to be doing. Getting the volcano plot should be a very simple thing:
Three lines of code and you are there. Unless you have the expertise, you shouldn't (and don't need to) do anything more than this.
I have tried to make the Volcano plot also after performing the
dds
analysis but the plot is empty. I do believe it is because I had to use this piece of code:If I do not use it, the dds gives me an error and the analysis stops. This is because instead of reading the value in the second column, it read the value (
Gene_IDs
) in the first column. I do not know if there is a way of fixing it without modifying the basicdds
.Regarding the filtering you mentioned, in the first post I wrote the whole code I used. I did not use filtering in my code, as I just followed the
DESeq2
vignette and some videos on internet.Hi James, I do not know if it might be of interest for you, but in the end I could plot the Volcano plot. I did have to use the
lfcShrink()
. I guess in this way the fold change differences fitted on the x axis.Also, in doing so, my
summary(vol.df$pvalue)
have a max of 0.992