Volcano plot labeling troubles
4
0
Entering edit mode
@mccallionc403-6871
Last seen 10.1 years ago
United States

Hello,

I am trying to make a volcano plot, but of Log Fold Change vs FDR adjusted p-values.  So far, I've managed to make a decent plot, but I'm having trouble figuring out how to label the genes that are differentially expressed, wether as the top, say 30, genes, or all of the genes above a certain threshold, and identify them on the graph.  My progress so far goes thusly (Also, if anyone knows how to get limma to plot a graph with these axis instead of log fold change vs log odds, that would be greatly appreciated as well):

source ("http://bioconductor.org/biocLite.R")#f o r
#downloading biocLite
biocLite()#for installing bioconductor packages
biocLite("BiocUpgrade")#use latest version
biocLite("limma")#linear fits
biocLite("GEOquery")#downloading data sets
biocLite("statmod")#
biocLite("affyPLM")#normalizing ExpressionSets
#begin loading libraries
library("limma")
library("GEOquery")
library("statmod")
library("affyPLM")
#libraries loaded
gse44563 <- getGEO("GSE44563", GSEMatrix = TRUE)
#downloading series record
#44563 with any attached data
eset44563 = gse44563[[1]]#extract attached eSet (data)
y <- log2(exprs(eset44563))
design <- model.matrix(~source_name_ch1,data=pData(eset44563))
preFit <- lmFit(y, design)
eset <- normalize.ExpressionSet.quantiles(eset44563, transfn="none" )
#quantile normalization of eset44563
#8
y <- log2(exprs(eset))#log2 transformation of eset
design <- model.matrix(~source_name_ch1,data=pData(eset44563))#making
#design matrix
#describes what array looked like
#allows for a linear fit , using matrix algebra
fit <- lmFit(y, design)#linear fit
plotMA(fit)
IsExpressed <- fit$Amean>3#is mean expression
#of each gene above 3?
efit <- eBayes(fit[IsExpressed,], trend=TRUE, robust=TRUE)#Empi r ical
#Bayes adjustment
plotMA(efit)
write.csv((topTable(efit, adjust = "fdr", n = 54675)), "possible5.csv")
#write table of genes,
#sorted according to False Discovery Rate adjusted
#p-value to file
tableTop <- topTable(efit, adjust = "fdr", n = 42471)
plot(tableTop$logFC, -log10(tableTop$adj.P.Val), xlim = c(-1, 2), ylim = c(0, 6), xlab = "log2 fold change", ylab = "FDR adjusted p-value")

 

My issue right now is trying to figure out how to label the points that are meaningfully foldchanged, for now, without using limma, since it appears that limma only uses logodds as its measure of statistical significance.

Any help, or even just pointers to pre-existing explanations would be immensly appreciated.  Thank you

Also, here's my SessionInfo():

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] splines   parallel  stats     graphics 
[5] grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] BiocInstaller_1.16.0  affyPLM_1.40.1       
 [3] preprocessCore_1.26.1 gcrma_2.36.0         
 [5] statmod_1.4.20        GEOquery_2.30.1      
 [7] limma_3.20.9          affy_1.42.3          
 [9] Biobase_2.24.0        BiocGenerics_0.10.0  

loaded via a namespace (and not attached):
[1] affyio_1.32.0     Biostrings_2.32.1
[3] IRanges_1.22.10   RCurl_1.95-4.3   
[5] stats4_3.1.1      tools_3.1.1      
[7] XML_3.98-1.1      XVector_0.4.0    
[9] zlibbioc_1.10.0
limma volcanoplot • 7.5k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

I understand why you might be tempted to plot logFC vs FDR but, in my opinion, this isn't the right way to make a volcano plot.

First, if a volcano plot is made with p-values, then it should use raw p-values rather than FDR adjusted values. This is because FDR adjusting loses some information (e.g., distinct p-values may be mapped to the same FDR value).

Second, there is a reason why limma plots log-odds instead of log-p-values. The log-odds are exactly equivalent to p-values for your analysis (same ordering) but on a scale that is arguably more meaningful than log-p. The log-odds also has the effect of global adjustment, like FDR, but without losing information.

Instead of doing your own coding, you might consider using the plot that limma provides

Later edit

By popular demand, volcanoplot() now offers either p-values or log-odds as options for the y-axis and the default has been changed to p-values.

ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 3 hours ago
United States
ind <- tableTop$adj.P.Val < 0.05
points(tableTop$logFC[ind], -log10(tableTop$adj.P.Val)[ind], pch = 19, col = "red")
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

The quantile normalization step you are doing is not necessary. This data has already been RMA processed, according to the GEO webpage, meaning that it can already been quantile normalized at the probe level. By re-normalizing at the probe-set level, you are replacing the normalization already done with something not quite as good.

ADD COMMENT
0
Entering edit mode
@mccallionc403-6871
Last seen 10.1 years ago
United States

Thank you so much

ADD COMMENT

Login before adding your answer.

Traffic: 584 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6