SmearPlot edgeR - adding gene names to genes of interest

0

Entering edit mode

Christine Gläßer ▴ 60

@christine-glaer-6291

Last seen 10.3 years ago

Dear all, I have used edgeR for analyzing differential expression. I also created a smear plot visualizing DGE results (code see below). I'd now like to highlight some genes of interest (they are significantly diff. expressed) in this plot by labeling them; do you know if there's a way to do so? Kind regards, and many thanks in advance, Christine Gl??er ###### Code ######## counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE)) cpms<-cpm(counts) keep <- rowSums(cpms>1)>=4 ### at least 4 replicates counts <- counts[keep,] group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3)) d <- DGEList(counts=counts, group=group) d <- calcNormFactors(d) design <- model.matrix(~ 0+group) colnames(design) <- cols contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2, levels=design) y <- estimateGLMCommonDisp(d,design) y <- estimateGLMTrendedDisp(y,design) y <- estimateGLMTagwiseDisp(y,design) fit <- glmFit(y,design) for (i in colnames(contrasts)) { print (i) outfilename <- paste(paste(outdir, i, sep="/")) outfilename <- paste(paste(outfilename, ".txt", sep="")) outfilesmearplot <- paste(paste(outdir, i, sep="/")) outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf", sep="")) lrt <- glmLRT(fit, contrast=contrasts[,i]) tt <- topTags(lrt, n=nrow(d), adjust.method="BH") rn <- rownames(tt$table) deg <- rn[tt$table$FDR < 0.05] pdf(outfilesmearplot) ###### SMEAR PLOT STARTS HERE ####### plotSmear(d, de.tags=deg) abline(h = c(-1, 1), col = "dodgerblue") dev.off() write.table(tt$table, file=outfilename, sep="\t") }

edgeR edgeR • 4.8k views

ADD COMMENT • link updated 11.0 years ago by Aliaksei Holik ▴ 350 • written 11.0 years ago by Christine Gläßer ▴ 60

0

Entering edit mode

Aliaksei Holik ▴ 350

@aliaksei-holik-4992

Last seen 8.9 years ago

Spain/Barcelona/Centre for Genomic Regu…

Hi Christine, There's probably a more elegant way, but that's how I do it. You should be able to use R's 'text' function by taking logCPM values of the genes you're interested in as 'x' argument, logFC values as 'y' argument, and the gene names or ID's as 'labels' argument. Say top.x is your resulting "TopTags" object and it contains dataframe called 'table', which contains among others the following columns: Symbols - logFC - logCPM You should be able to subset it to the genes you're interested in by: ids <- c("...", "...") gene.labels <- x$table[x$table$Symbols %in% ids,] Where 'ids' is a vector of gene symbols for genes of interest, but you can use Entrez Gene IDs or any other thing you can think of, for instance all genes with logFC > 1 etc. After you've plotted your smear plot from DGELRT or DGEExact object you can mark the genes of interest with the following: text(x=gene.labels$logCPM, y=gene.labels$logFC, labels=gene.labels$Symbols, cex=0.7, pos=1) Note that this will not work if you're plotting the SmearPlot from DGEList as your TopTags object will contain squeezed logCPM values different from those in DGEList. Hope it helps, Aliaksei. On 16/12/13 7:07 PM, Christine Gl??er wrote: > Dear all, > > I have used edgeR for analyzing differential expression. I also created > a smear plot visualizing DGE results (code see below). I'd now like to > highlight some genes of interest (they are significantly diff. > expressed) in this plot by labeling them; do you know if there's a way > to do so? > > Kind regards, and many thanks in advance, > > Christine Gl??er > > > ###### Code ######## > > counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t", > row.names = 1, as.is = TRUE)) > cpms<-cpm(counts) > keep <- rowSums(cpms>1)>=4 ### at least 4 replicates > counts <- counts[keep,] > > group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3)) > > d <- DGEList(counts=counts, group=group) > d <- calcNormFactors(d) > > design <- model.matrix(~ 0+group) > colnames(design) <- cols > > contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2, > levels=design) > > y <- estimateGLMCommonDisp(d,design) > y <- estimateGLMTrendedDisp(y,design) > y <- estimateGLMTagwiseDisp(y,design) > fit <- glmFit(y,design) > > for (i in colnames(contrasts)) > { > print (i) > > outfilename <- paste(paste(outdir, i, sep="/")) > outfilename <- paste(paste(outfilename, ".txt", sep="")) > outfilesmearplot <- paste(paste(outdir, i, sep="/")) > outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf", > sep="")) > > lrt <- glmLRT(fit, contrast=contrasts[,i]) > tt <- topTags(lrt, n=nrow(d), adjust.method="BH") > rn <- rownames(tt$table) > deg <- rn[tt$table$FDR < 0.05] > > pdf(outfilesmearplot) > ###### SMEAR PLOT STARTS HERE ####### > plotSmear(d, de.tags=deg) > abline(h = c(-1, 1), col = "dodgerblue") > dev.off() > write.table(tt$table, file=outfilename, sep="\t") > } > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD COMMENT • link 11.0 years ago Aliaksei Holik ▴ 350

0

Entering edit mode

Am 16.12.2013 11:26, schrieb Aliaksei Holik: > Hi Christine, > > There's probably a more elegant way, but that's how I do it. > > You should be able to use R's 'text' function by taking logCPM values > of the genes you're interested in as 'x' argument, logFC values as 'y' > argument, and the gene names or ID's as 'labels' argument. > > Say top.x is your resulting "TopTags" object and it contains dataframe > called 'table', which contains among others the following columns: > Symbols - logFC - logCPM > > You should be able to subset it to the genes you're interested in by: > > ids <- c("...", "...") > gene.labels <- x$table[x$table$Symbols %in% ids,] > > Where 'ids' is a vector of gene symbols for genes of interest, but you > can use Entrez Gene IDs or any other thing you can think of, for > instance all genes with logFC > 1 etc. > > After you've plotted your smear plot from DGELRT or DGEExact object > you can mark the genes of interest with the following: > > text(x=gene.labels$logCPM, > y=gene.labels$logFC, > labels=gene.labels$Symbols, cex=0.7, pos=1) > > Note that this will not work if you're plotting the SmearPlot from > DGEList as your TopTags object will contain squeezed logCPM values > different from those in DGEList. > > Hope it helps, > > Aliaksei. > > On 16/12/13 7:07 PM, Christine Gl??er wrote: >> Dear all, >> >> I have used edgeR for analyzing differential expression. I also created >> a smear plot visualizing DGE results (code see below). I'd now like to >> highlight some genes of interest (they are significantly diff. >> expressed) in this plot by labeling them; do you know if there's a way >> to do so? >> >> Kind regards, and many thanks in advance, >> >> Christine Gl??er >> >> >> ###### Code ######## >> >> counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t", >> row.names = 1, as.is = TRUE)) >> cpms<-cpm(counts) >> keep <- rowSums(cpms>1)>=4 ### at least 4 replicates >> counts <- counts[keep,] >> >> group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3)) >> >> d <- DGEList(counts=counts, group=group) >> d <- calcNormFactors(d) >> >> design <- model.matrix(~ 0+group) >> colnames(design) <- cols >> >> contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2, >> levels=design) >> >> y <- estimateGLMCommonDisp(d,design) >> y <- estimateGLMTrendedDisp(y,design) >> y <- estimateGLMTagwiseDisp(y,design) >> fit <- glmFit(y,design) >> >> for (i in colnames(contrasts)) >> { >> print (i) >> >> outfilename <- paste(paste(outdir, i, sep="/")) >> outfilename <- paste(paste(outfilename, ".txt", sep="")) >> outfilesmearplot <- paste(paste(outdir, i, sep="/")) >> outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf", >> sep="")) >> >> lrt <- glmLRT(fit, contrast=contrasts[,i]) >> tt <- topTags(lrt, n=nrow(d), adjust.method="BH") >> rn <- rownames(tt$table) >> deg <- rn[tt$table$FDR < 0.05] >> >> pdf(outfilesmearplot) >> ###### SMEAR PLOT STARTS HERE ####### >> plotSmear(d, de.tags=deg) >> abline(h = c(-1, 1), col = "dodgerblue") >> dev.off() >> write.table(tt$table, file=outfilename, sep="\t") >> } >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> Dear Aliaksei, thank you very much for your help, it worked out. As an example, I have adapted the example given in the documentation in plotSmear: ################## y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4) d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y)) rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".") d <- estimateCommonDisp(d) # find differential expression de <- exactTest(d) # highlighting the top 500 most DE tags de.tags <- rownames(topTags(de, n=500)$table) plotSmear(d, de.tags=de.tags) de$genes <- rownames(d$counts) ids <- c("tag.2323", "tag.9") gene.labels <- de$table[de$genes %in% ids,] text(x=gene.labels$logCPM, y=gene.labels$logFC, labels=rownames(gene.labels), cex=0.7, pos=1) ################### I'd also like to add a line pointing at these specific genes (or highlight them a bit better, e.g. with a specific color) - otherwise the plot is not very informative, since there are many dots in close neighborhood to my genes of interest. Do you also know how that could be done? Thank you very much for your help, I greatly appreciate it! Kind regards, Christine Gl??er

ADD REPLY • link 11.0 years ago Christine Gläßer ▴ 60

0

Entering edit mode

Hi Christine, I'm glad it worked. You should be able to print your genes in any colour you can think of by adding 'col' argument to your 'text' function. You could also plot the points of any colour on top of your smear plot. Analogous to the way you use the 'text' function: points(x=gene.labels$logCPM, y=gene.labels$logFC, cex=1, col="red") Hope it helps, Aliaksei. On 16/12/13 11:53 PM, Christine Gl??er wrote: > Dear Aliaksei, > > thank you very much for your help, it worked out. As an example, I have > adapted the example given in the documentation in plotSmear: > > ################## > > y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4) > d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y)) > rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".") > d <- estimateCommonDisp(d) > > # find differential expression > de <- exactTest(d) > > # highlighting the top 500 most DE tags > de.tags <- rownames(topTags(de, n=500)$table) > > plotSmear(d, de.tags=de.tags) > > de$genes <- rownames(d$counts) > > ids <- c("tag.2323", "tag.9") > gene.labels <- de$table[de$genes %in% ids,] > text(x=gene.labels$logCPM, y=gene.labels$logFC, > labels=rownames(gene.labels), cex=0.7, pos=1) > > ################### > > I'd also like to add a line pointing at these specific genes (or > highlight them a bit better, e.g. with a specific color) - otherwise the > plot is not very informative, since there are many dots in close > neighborhood to my genes of interest. Do you also know how that could be > done? > > Thank you very much for your help, I greatly appreciate it! > > Kind regards, > > Christine Gl??er > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 11.0 years ago Aliaksei Holik ▴ 350

0

Entering edit mode

Am 16.12.2013 14:12, schrieb Aliaksei Holik: > Hi Christine, > > I'm glad it worked. You should be able to print your genes in any > colour you can think of by adding 'col' argument to your 'text' > function. You could also plot the points of any colour on top of your > smear plot. Analogous to the way you use the 'text' function: > points(x=gene.labels$logCPM, > y=gene.labels$logFC, > cex=1, col="red") > > Hope it helps, > > Aliaksei. > > On 16/12/13 11:53 PM, Christine Gl??er wrote: > >> Dear Aliaksei, >> >> thank you very much for your help, it worked out. As an example, I have >> adapted the example given in the documentation in plotSmear: >> >> ################## >> >> y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4) >> d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y)) >> rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".") >> d <- estimateCommonDisp(d) >> >> # find differential expression >> de <- exactTest(d) >> >> # highlighting the top 500 most DE tags >> de.tags <- rownames(topTags(de, n=500)$table) >> >> plotSmear(d, de.tags=de.tags) >> >> de$genes <- rownames(d$counts) >> >> ids <- c("tag.2323", "tag.9") >> gene.labels <- de$table[de$genes %in% ids,] >> text(x=gene.labels$logCPM, y=gene.labels$logFC, >> labels=rownames(gene.labels), cex=0.7, pos=1) >> >> ################### >> >> I'd also like to add a line pointing at these specific genes (or >> highlight them a bit better, e.g. with a specific color) - otherwise the >> plot is not very informative, since there are many dots in close >> neighborhood to my genes of interest. Do you also know how that could be >> done? >> >> Thank you very much for your help, I greatly appreciate it! >> >> Kind regards, >> >> Christine Gl??er >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> Hi Aliaksei, thank you for your fast and kind reply. Thank you for helping me out - I didn't think of the "points" or col function... That's indeed what I've been looking for. Kind regards, Christine

ADD REPLY • link 11.0 years ago Christine Gläßer ▴ 60

0

Entering edit mode

Just spotted a typo. It should have said: gene.labels <- x.top$table[x.top$table$Symbols %in% ids,] On 16/12/13 9:26 PM, Aliaksei Holik wrote: > Hi Christine, > > There's probably a more elegant way, but that's how I do it. > > You should be able to use R's 'text' function by taking logCPM values of > the genes you're interested in as 'x' argument, logFC values as 'y' > argument, and the gene names or ID's as 'labels' argument. > > Say top.x is your resulting "TopTags" object and it contains dataframe > called 'table', which contains among others the following columns: > Symbols - logFC - logCPM > > You should be able to subset it to the genes you're interested in by: > > ids <- c("...", "...") > gene.labels <- x$table[x$table$Symbols %in% ids,] > > Where 'ids' is a vector of gene symbols for genes of interest, but you > can use Entrez Gene IDs or any other thing you can think of, for > instance all genes with logFC > 1 etc. > > After you've plotted your smear plot from DGELRT or DGEExact object you > can mark the genes of interest with the following: > > text(x=gene.labels$logCPM, > y=gene.labels$logFC, > labels=gene.labels$Symbols, cex=0.7, pos=1) > > Note that this will not work if you're plotting the SmearPlot from > DGEList as your TopTags object will contain squeezed logCPM values > different from those in DGEList. > > Hope it helps, > > Aliaksei. > > On 16/12/13 7:07 PM, Christine Gl??er wrote: >> Dear all, >> >> I have used edgeR for analyzing differential expression. I also created >> a smear plot visualizing DGE results (code see below). I'd now like to >> highlight some genes of interest (they are significantly diff. >> expressed) in this plot by labeling them; do you know if there's a way >> to do so? >> >> Kind regards, and many thanks in advance, >> >> Christine Gl??er >> >> >> ###### Code ######## >> >> counts <- as.matrix(read.table(infile, header = TRUE, sep = "\t", >> row.names = 1, as.is = TRUE)) >> cpms<-cpm(counts) >> keep <- rowSums(cpms>1)>=4 ### at least 4 replicates >> counts <- counts[keep,] >> >> group <- factor(c(1,1,1,1,1,2,2,2,2,3,3,3,3)) >> >> d <- DGEList(counts=counts, group=group) >> d <- calcNormFactors(d) >> >> design <- model.matrix(~ 0+group) >> colnames(design) <- cols >> >> contrasts <- makeContrasts(con1 = CONTRAST1, con2 = CONTRAST2, >> levels=design) >> >> y <- estimateGLMCommonDisp(d,design) >> y <- estimateGLMTrendedDisp(y,design) >> y <- estimateGLMTagwiseDisp(y,design) >> fit <- glmFit(y,design) >> >> for (i in colnames(contrasts)) >> { >> print (i) >> >> outfilename <- paste(paste(outdir, i, sep="/")) >> outfilename <- paste(paste(outfilename, ".txt", sep="")) >> outfilesmearplot <- paste(paste(outdir, i, sep="/")) >> outfilesmearplot <- paste(paste(outfilesmearplot, "_smearplot.pdf", >> sep="")) >> >> lrt <- glmLRT(fit, contrast=contrasts[,i]) >> tt <- topTags(lrt, n=nrow(d), adjust.method="BH") >> rn <- rownames(tt$table) >> deg <- rn[tt$table$FDR < 0.05] >> >> pdf(outfilesmearplot) >> ###### SMEAR PLOT STARTS HERE ####### >> plotSmear(d, de.tags=deg) >> abline(h = c(-1, 1), col = "dodgerblue") >> dev.off() >> write.table(tt$table, file=outfilename, sep="\t") >> } >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 11.0 years ago Aliaksei Holik ▴ 350

Login before adding your answer.