Question

MA Plot colors

0

Entering edit mode

pablo_calza • 0

@pablo_calza-7616

Last seen 10.0 years ago

Argentina

Hi,

I am trying to draw a MA Plot with my RNAseq data. Thus, I wrote the following pipeline in RStudio.

>library(cummeRbund)
>cuff_data <- readCufflinks('diff_out')
>MAplot(genes(cuff_data),'LC','LT',logMode=T,pseudocount=1,smooth=F)

The pipeline works well.The problem is I am trying to draw dashed lines in my graph showing the -2 and 2 Log cutoff, and the significant genes in red points instead of black ones.

Does anyone knows how to do these?

Thanks

Pablo

MA Plot • 6.3k views

ADD COMMENT • link updated 10.0 years ago by James W. MacDonald 68k • written 10.0 years ago by pablo_calza • 0

score 1 · Answer 1 · 2015-04-17

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 52 minutes ago

United States

MAplot() uses ggplot2 to do the plotting, so it's not as direct to deal with as base graphics would be. You first need to capture the output.

p <- MAplot(genes(cuff_data),'LC','LT',logMode=T,pseudocount=1,smooth=F)

I think there is an iron clad law that all ggplot objects are named 'p', so consider yourself warned...

You then tell ggplot what else you want to do, thusly

p <- p + geom_point(aes(A, log2(M), colour = factor(ifelse(log2(M) < 2, 1,2))), size = 0.8) + geom_hline(yintercept = c(-2,2)) + theme(legend.position = "none") + scale_colour_manual(values = c("black","red"))

The geom_point business just adds the color specification, the geom_hline just adds the horizontal lines, theme just tells ggplot not to make a legend, and the scale_colour_manual sets the colors to black and red, instead of the defaults of salmon and aqua. See the ggplot help for more information.

Edit: I forgot - you won't get a plot until you type 'p' at the R prompt

ADD COMMENT • link 10.0 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks a lot James! That works really well!

But I´ve got two more questions:

a- The script you wrote only paint in red the points log2(M)>2. Is there a way of making the ones with log2(M)<-2 also red? I´m trying to solve the problem but I have not solve it yet.

b- Is there a way of drawing the p<0.01 genes (points) in red, despite their log2(M)?

Thank you in advanced!

Pablo

ADD REPLY • link 10.0 years ago pablo_calza • 0

1

Entering edit mode

The first part of the command is this:

geom_point(aes(A, log2(M), colour = factor(ifelse(log2(M) < 2, 1,2))), size = 0.8)

And the part that sets the color is

colour = factor(ifelse(log2(M) < 2, 1,2))

So I obviously missed a step, because you want the absolute value of log2(M). Sometimes it is hard to find the function you want, so in this case you should use apropos(). Also note that it is better to use shorter arguments rather than longer for apropos():

> apropos("absolute")
character(0) ## no joy!
> apropos("abs")
[1] "abs"                      "normalizeMedianAbsValues"
[3] "xtabs"

And I'll let you take it from there.

As for question (b), the answer is always yes. This is R after all. Unless of course you mean 'is there an argument that I can use to make that happen', in which case the answer appears to be no. However, as I said, this is R so you can do pretty much whatever you want, if you are willing to work for it.

The problem here is that MAplot() doesn't compute a p-value, so you need to supply one. Also, MAplot() just returns a 'gg' object, so we have to get all hackalicious. I'll use example(MAplot) for this next step.

>example(MAplot)

This creates data that I can now use, and also creates a 'gg' object (called 'd') that we are going to hack on. MAplot() doesn't output the M and A values directly - they are in the gg object, which is just a fancy list.

> class(d)
[1] "gg"     "ggplot"
> names(d)
[1] "data"        "layers"      "scales"      "mapping"     "theme"      
[6] "coordinates" "facet"       "plot_env"    "labels"     
> head(d$data)
      gene_id         A         M
1 XLOC_000001 0.6818857 0.1049131
2 XLOC_000002 0.0000000       NaN
3 XLOC_000003 0.0000000       NaN
4 XLOC_000004 4.9984325 1.4068705
5 XLOC_000005 2.2671843 1.6819516
6 XLOC_000006 0.0000000       NaN

We don't really want to jam all sorts of things into d, but instead want to use the functions in ggplot2 to do it for us. So we extract the data object from d, and modify:

> z <- d$data
> z$p <- runif(nrow(z))

Note that the second step I am doing is not what you want to do. You instead want to put the p-values in z, in the correct order. I assume you have a data.frame floating about that has the transcript IDs as well as p-values? If so, let's say you called it d.f. I further assume this data.frame has a column called 'gene_id' that has the gene IDs, and a column called 'p.value' that has the p-values. This is probably not true, so you need to modify to suit. Then you want to do something like

EDIT - did this backwards the first time

z$p <- d.f[match(d.f$gene_id, z$gene_id),"p.value"]

You should look at ?match to figure out what I am doing there. Now you can plot the MAplot.

> d + geom_point(data = z, aes(A, log2(M), colour = factor(ifelse(p > 0.05, 1, 2)))) + theme(legend.position = "none") + scale_colour_manual(values = c("black","red"))

ADD REPLY • link 10.0 years ago James W. MacDonald 68k

0

Entering edit mode

Note that I edited the last comment.

ADD REPLY • link 10.0 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you very much! That works well until the last comment. Sorry about the silly question, but I really do not know what do you mean about the data.frame. I understand that is the file where my data is, however I do not know which is in particular. This file was created when I wrote:

a<-readCufflinks(system.file("extdata", package="cummeRbund")) #Create CuffSet object from sample data
genes<-a@genes #Create CuffData object for all 'genes'
d<-MAplot(genes,'hESC','Fibroblasts')

(?)

ADD REPLY • link 10.0 years ago pablo_calza • 0

0

Entering edit mode

I can't help you any further. I don't use cufflinks, nor cummeRbund, as none of my collaborators use either enough replicates nor sufficient depth for something like that to make sense.

If you want to color things using p-values then you will have to figure out how you generate p-values using cummeRbund (there is a vignette that may help), and then you will have to add the p-values to the data as I showed above.

ADD REPLY • link 10.0 years ago James W. MacDonald 68k

0

Entering edit mode

THANKS James! I will try to do it!

CheerS!

ADD REPLY • link 10.0 years ago pablo_calza • 0

0

Entering edit mode

Sorry I am writting again, but I´ve got a question about your script. I add a column of my p-values to the z data. I could make it. However, when I write

z$p <- runif(nrow(z))

The pvalues number changes and have nothing to do with the real ones. So, when I draw the graph with

> d + geom_point(data = z, aes(A, log2(M), colour = factor(ifelse(p > 0.05, 1, 2)))) + theme(legend.position = "none") + scale_colour_manual(values = c("black","red"))

The red points did not correspond with the gene_id with p_value<0.05.

However, when I try to draw the plot without writing

z$p <- runif(nrow(z))

Nothings gets red. Do you know why or how can I solve this?

Thanks in advanced

ADD REPLY • link 10.0 years ago pablo_calza • 0

0

Entering edit mode

Yes. It's because you aren't paying close enough attention.

Here is the relevant part from my post:

> z <- d$data
> z$p <- runif(nrow(z))

Note that the second step I am doing is not what you want to do. You instead want to put the p-values in z, in the correct order.

ADD REPLY • link 10.0 years ago James W. MacDonald 68k

0

Entering edit mode

The thing is I already put the p-values in z, in the correct order. However when I write:

> d + geom_point(data = z, aes(A, log2(M), colour = factor(ifelse(p > 0.05, 1, 2)))) + theme(legend.position = "none") + scale_colour_manual(values = c("black","red"))

No point gets red. I guess it is something related with colour = factor(ifelse(p > 0.05, 1, 2)); but I can´t realize what it is.

The z data is the following table:

gene_id A M p
1   XLOC_000001   0.3925539832   0.99284785   0.8965
2   XLOC_000002   0.0657089848   6.32477626   1
3   XLOC_000003   1.4985551631   1.19541239   0.0448
4   XLOC_000004   0.0000000000   NaN   1
5   XLOC_000005   0.6504746723   1.06764279   0.7078
6   XLOC_000006   0.0581513680   0.66861219   1
7   XLOC_000007   0.4789051940   0.68426498   0.00235

Thanks in advanced again!

ADD REPLY • link 10.0 years ago pablo_calza • 0

0

Entering edit mode

Are the p-values you add to z numeric or factor?

ADD REPLY • link 10.0 years ago James W. MacDonald 68k