"bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote:
Send Bioconductor mailing list submissions to
bioconductor at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at r-project.org
You can reach the person managing the list at
bioconductor-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
   1. Heatmaps for Agilent Single Color 4x44k using limma and vsn. (Cornish, Joseph (NIH/NIAID) [F])
   2. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (James W. MacDonald)
   3. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (Cornish, Joseph (NIH/NIAID) [F])
   4. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (Sean Davis)
   5. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (James W. MacDonald)
   6. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (Cornish, Joseph (NIH/NIAID) [F])
   7. analysis of HuGene2.0-st array affymetrix - aroma package (Andreia Fonseca)
   8. Re: Heatmaps for Agilent Single Color 4x44k using limma and vsn. (Sean Davis)
   9. Interpreting DESeq2 results (Michael Muratet)
  10. identifying drosophila miRNA targets (Fiona Ingleby)
  11. Re: identifying drosophila miRNA targets (James W. MacDonald)
  12. Re: Limit on number of sequence files for forging a BSgenome (Blanchette, Marco)
  13. Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package (Shields, Rusty (IMS))
  14. Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package (Martin Morgan)
  15. Re: [R] Error in setMethod("combine"... was - Error when installing globaltest package (Shields, Rusty (IMS))
  16. Re: Interpreting DESeq2 results (Michael Love)
  17. Re: Interpreting DESeq2 results (somnath bandyopadhyay)
----------------------------------------------------------------------
Message: 1
Date: Thu, 28 Mar 2013 13:27:23 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] Heatmaps for Agilent Single Color 4x44k using limma
and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7bc3 at mlbxv01.nih.gov>
Content-Type: text/plain
I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and that this may be leading to the abnormal heatmaps I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states and two replicates per state.
The heatmaps concern me because of the uniformity between them: the
values are all completely green or completely red; there are
differences between genes within a state according to the heatmap.
In the examples I have seen, people have suggested that I use the
coefficients from the eBayes fitting of the data; however, in my
example the eBayes MArrayLM object does not have two columns in the
results, only one (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients, since there is a column for
each group.
To further confuse someone who is new to R, the simplified example and
the full code generate different heatmaps for the same data and the
same methods (normexp for background correction, cyclic loess for
normalization). The differences: in the simplified version the gene
IDs show up on the right side, but in the full version only row
numbers appear. The color for some genes also changes between states.
My questions are:
Why does the bayes model only have one column?
Am I approaching the processing correctly?
Why are there differences between the resulting heatmaps of each
version?
I know that the idea is to keep the amount of posted code to a minimum
but I don't see any other way here.
Here is the simplified example I have made:
library(limma)
library(vsn)
library(gplots)
set.seed(0xabcd) #from vsn manual
#constants for reading agi outs from our imager
COL__ <- list(G="Raw intensity (tr.mean) {A}", Gb="Background (tr.mean) {A}")
ANO__ <- c("Position", "Name", "ID", "Description", "GeneName")
#parse data
targets <- readTargets("test.csv", sep = ",")
gal <- readGAL("026806_D_20110524.gal")
layout <- getLayout(gal)
agi <- read.maimages(targets, columns = COL__,
annotation = ANO__,
green.only = TRUE)
design <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(design) <- c("s1", "s2")
#process
bg <- backgroundCorrect(agi, method = "normexp")
norm <- normalizeBetweenArrays(bg, method = "cyclicloess")
avg <- avereps(norm, ID=norm$genes$ProbeName)
#fit
contmat <- makeContrasts(s2-s1, levels = design)
fit_lm <- lmFit(avg$E, design)
fit_be <- eBayes(contrasts.fit(fit_lm, contmat))
#results
difexp <- topTable(fit_be, coef = 1, adjust = "BH")
res <- decideTests(fit_be)
comp <- vennDiagram(res)
#plot
pdf("example-heatmap.pdf")
genes <- as.numeric(rownames(topTable(fit_be, n = 100)))
heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
dev.off()
The full version:
library(vsn)
library(limma)
library(gplots)
set.seed(0xabcd) #from the vsn manual
#constants
A_SUB__ <- 1
A_NXP__ <- 2
A_MIN__ <- 3
A_VSN__ <- 1
A_QNT__ <- 2
A_LWS__ <- 3
M_SUB__ <- "subtract"
M_NXP__ <- "normexp"
M_MIN__ <- "minimum"
M_VSN__ <- "vsn"
M_LWS__ <- "cyclicloess"
M_QNT__ <- "quantile"
COL__ <- list(G="Raw intensity (tr.mean) {A}", Gb="Background (tr.mean) {A}")
ANO__ <- c("Position", "Name", "ID", "Description", "GeneName")
N_BGM__ <- 3 #number of background correction methods
N_NRM__ <- 3 #number of normalization methods
A_BGM__ <- c(A_SUB__, A_NXP__, A_MIN__)
A_NRM__ <- c(A_VSN__, A_QNT__, A_LWS__)
#utility methods
#applies subtraction background correction with limma
bg_subtract <- function(x){
return(list(norm = x$norm,
bg = M_SUB__,
agi = backgroundCorrect(x$agi, method = M_SUB__)))
}
#applies normexp background correction with limma
bg_normexp <- function(x){
return(list(norm = x$norm,
bg = M_NXP__,
agi = backgroundCorrect(x$agi, method = M_NXP__)))
}
#applies minimum background correction with limma
bg_minimum <- function(x){
return(list(norm = x$norm,
bg = M_MIN__,
agi = backgroundCorrect(x$agi, method = M_MIN__)))
}
#applies VSN normalization with VSN package
nb_vsn <- function(x){
return(list(norm = M_VSN__,
bg = x$bg,
agi = normalizeVSN(x$agi)))
}
#applies VSN normalization with VSN package
nb_vsn <- function(x){
return(list(norm = M_VSN__,
bg = x$bg,
agi = normalizeVSN(x$agi)))
}
#applies cyclic lowess normalization with limma
nb_lowess <- function(x){
return(list(norm = M_LWS__,
bg = x$bg,
agi = normalizeBetweenArrays(x$agi, method = M_LWS__)))
}
#applies quantile normalization with limma
nb_quantile <- function(x){
return(list(norm = M_QNT__,
bg = x$bg,
agi = normalizeBetweenArrays(x$agi, method = M_QNT__)))
}
#generates a plot of MA values with limma
pl_ma <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"), collapse="")
pdf(name)
plotMA(x[[1]]$agi$E)
dev.off()
}
#generates a plot of mean versus stdev with VSN
pl_meansd <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
pdf(name)
meanSdPlot(x[[1]]$agi$E)
dev.off()
}
#generate volcanoplot of log-fold versus log-odds
pl_volcano <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
pdf(name)
volcanoplot(x[[1]]$bayes)
dev.off()
}
#generate plot of expression data over time
pl_lines <- function(x){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
pdf(name)
plotlines(x[[1]]$agi$E)
dev.off()
}
#generate heatmap of final results
pl_heatmap <- function(x, ntop){
name <- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
pdf(name)
genes <- as.numeric(rownames(topTable(x[[1]]$bayes, n = ntop)))
heatmap(x[[1]]$lm$coefficients[genes,],
col = redgreen(75),
key = TRUE)
dev.off()
}
#averages replicates of probes
m_repavg <- function(x){
return(list(agi = x[[1]]$agi,
avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
bg = x[[1]]$bg,
norm = x[[1]]$norm))
}
#generate contrasts
m_fit <- function(x, design){
contmat <- makeContrasts(s2-s1,
levels = design)
fit1 <- lmFit(x[[1]]$agi$E, design)
fit2 <- eBayes(contrasts.fit(fit1, contmat))
diff <- topTable(fit2, coef = 1, adjust = "BH")
res <- decideTests(fit2)
comp <- vennDiagram(res)
return(list(agi = x[[1]]$agi,
bg = x[[1]]$bg,
norm = x[[1]]$norm,
ave = x[[1]]$ave,
cont = contmat,
lm = fit1,
bayes = fit2,
diff = diff,
res = res,
comp = comp))
}
######################################################################
#########
# Function to parse array data into limma objects
# Args:
# tfile: target file (see limma documentation)
# tsep: limma defaults TSV, have to set manually for CSV and so on
# galfile: GAL file for printer/layout
# Returns data object with targets, genes and printer layout
######################################################################
#########
parse <- function(tfile, tsep, galfile){
targets <- readTargets(tfile, sep=tsep)
gal <- readGAL(galfile)
layout <- getLayout(gal)
agi <- read.maimages(targets, columns = COL__,
annotation = ANO__,
green.only = TRUE)
bg <- "none"
norm <- "none"
desg <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(desg) <- c("s1","s2")
d <- list(agi = agi,
gal = gal,
layout = layout,
targets = targets,
bg = bg,
desg = desg,
norm = norm)
processed <- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
processed <- apply(processed, 1:2, function(x){return(d)})
return(processed)
}
#parse command args
args <- commandArgs(trailingOnly = TRUE)
#load files into matrix
a <- parse(args[1], args[2], args[3])
#keep a copy of the raw dataset
raw <- a[1,1]
print(raw[[1]]$desg)
#perform background corrections
a[,A_SUB__] <- lapply(a[,A_SUB__], bg_subtract)
a[,A_NXP__] <- lapply(a[,A_NXP__], bg_normexp)
a[,A_MIN__] <- lapply(a[,A_MIN__], bg_minimum)
#perform normalization
a[A_VSN__,] <- lapply(a[A_VSN__,], nb_vsn)
a[A_LWS__,] <- lapply(a[A_LWS__,], nb_lowess)
a[A_QNT__,] <- lapply(a[A_QNT__,], nb_quantile)
#fit model and differentiate expression
a <- apply(a, 1:2, m_repavg)
a <- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
#generate plots
apply(a, 1:2, pl_ma)
apply(a, 1:2, pl_meansd)
apply(a, 1:2, pl_volcano)
apply(a, 1:2, pl_lines)
apply(a, 1:2, pl_heatmap, args[4])
[[alternative HTML version deleted]]
------------------------------
Message: 2
Date: Thu, 28 Mar 2013 10:07:32 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <51544EA4.5080805 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Joseph,
On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk benchmarking of various methods for normalization and background correction. I am concerned that I am not processing the data correctly and that this may be leading to the abnormal heatmaps I am seeing. For a test case, I am using an experiment where I have two different conditions with two different states and two replicates per state.
>
> The heatmaps concern me because of the uniformity between them: the values are all completely green or completely red; there are differences between genes within a state according to the heatmap.
>
> In the examples I have seen, people have suggested that I use the coefficients from the eBayes fitting of the data; however, in my example the eBayes MArrayLM object does not have two columns in the results, only one (not sure if this is a sign of errors on my part). I have instead used the lmFit coefficients, since there is a column for each group.
>
> To further confuse someone who is new to R, the simplified example and the full code generate different heatmaps for the same data and the same methods (normexp for background correction, cyclic loess for normalization). The differences: in the simplified version the gene IDs show up on the right side, but in the full version only row numbers appear. The color for some genes also changes between states.
>
> My questions are:
> Why does the bayes model only have one column?
Because you are computing one contrast (which is all you can do). One
contrast means one test which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step
is the difference in means between the two groups, which is one value
per gene.
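A minimal sketch of what I mean, using only the design from your own script, shows the dimensions involved:

library(limma)
design <- model.matrix(~0+factor(c(1,1,2,2)))
colnames(design) <- c("s1", "s2")
contmat <- makeContrasts(s2-s1, levels = design)
ncol(design)   # 2: one fitted coefficient per group from lmFit()
ncol(contmat)  # 1: a single contrast, hence one column after contrasts.fit() and eBayes()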
> Am I approaching the processing correctly?
I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
> Why are there differences between the resulting heatmaps of each version?
I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in a
heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
Instead you might consider getting the list of genes, and then creating
a heatmap using the normalized expression values for each sample
instead. You then would want to scale in some way. We often sweep the
mean of the controls out of the matrix (by row), so the colors can then
be interpreted as differences between the treated samples and controls.
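As a rough sketch of the simplest variant of that (row-scaling rather than sweeping out the controls), assuming the objects and libraries from your simplified example are still in the workspace:

top <- topTable(fit_be, coef = 1, number = 100)
genes <- as.numeric(rownames(top))   # same indexing your script already uses
mat <- avg$E[genes, ]                # normalized expression values rather than coefficients
heatmap(mat, col = redgreen(75), scale = "row")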
Best,
Jim
> [original message and full code listing quoted above; snipped]
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 3
Date: Thu, 28 Mar 2013 14:22:32 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "James W. MacDonald" <jmacdon at="" uw.edu="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7bdf at mlbxv01.nih.gov>
Content-Type: text/plain; charset="us-ascii"
Hello James,
Are you referring to function names or constants? The constants are an
old habit I picked up long ago in my CS classes; there are a ton, but
it makes them stand out for me in vim more readily. I've come to avoid
camel case since switching to Linux, as it makes searching for things a
nightmare; it also helps if I group functions with a prefix (e.g. plot
functions start with "pl_"). If you want, I can find+replace all of the
constants with their values...
The differences came from using the unaveraged replicates in the
larger script versus the averaged replicates in the example.
I'm not sure what you mean by "mean of the controls"; would this
simply be filtering out the control spots?
I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
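Something along these lines is what I have in mind (the thresholds are just placeholders), using the p.value and lfc arguments of topTable:

keep <- topTable(fit_be, coef = 1, number = Inf,
                 adjust.method = "BH", p.value = 0.05, lfc = 1)
genes <- rownames(keep)   # probes passing both the adjusted p-value and fold-change cutoffs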
_________thanks_____________
_______Joe______
-----Original Message-----
From: James W. MacDonald [mailto:jmacdon@uw.edu]
Sent: Thursday, March 28, 2013 10:08 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
[quoted message snipped; see Message 2 above]
------------------------------
Message: 4
Date: Thu, 28 Mar 2013 10:33:51 -0400
From: Sean Davis <sdavis2@mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID:
<caneavbkyv+mb61jvne0aklicxovxyvelusbhgowwr61_q9s-dw at mail.gmail.com>
Content-Type: text/plain
On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] <joseph.cornish at nih.gov> wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are an
> old habit I picked up long ago in my CS classes, there are a ton but it
> makes them stand out for me in vim more readily. I've come to avoid camel
> case since switching to linux, it makes searching for things a nightmare,
> it can also help if I group functions with a prefix (eg plot functions
> start with "pl_"). If you want I can find+replace all of the constants
> with their values...
>
>
Hi, Joe.
I think I could help to translate part of what James said. If you could
boil your example down to a few lines of reproducible code that show
the problem, that would be simpler to interpret and easier to comment
on directly. Sometimes in the process of producing such a simplified
example, I have found my own misunderstandings. It also sometimes helps
to get someone local to give a second pair of eyes.
> The differences came from using the unaveraged replicates in the
> larger script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this
> simply be filtering out the control spots?
>
>
No. I think he means for you to subtract the mean of the control
samples (or one group of samples) for each gene from all the normalized
expression values.
> I'll generate the heatmaps using the normalized expression values. I
> suppose in that case it would be worthwhile to filter ahead of time by
> p-value and fold change (the coef in topTable).
>
>
Yes.
Sean
> [remainder of the quoted thread, including the full code listing, snipped]
[[alternative HTML version deleted]]
------------------------------
Message: 5
Date: Thu, 28 Mar 2013 10:45:21 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <51545781.7000100 at uw.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Joe,
I don't mean to be critical. If you like to use gobtons of underscores,
then rock on. I would just find it very distracting.
Anyway, by 'mean of the controls', I mean the average value of the
control samples. But this could be generalized in any way. For example,
your contrast was s2 - s1, so you could do
mat <- avg$E[<index of interesting genes>,]
mn <- fit_lm[<index of interesting genes>,"s1"]
swept <- sweep(mat, 1, mn, "-")
and then do the heatmap. In this case the colors would be interpreted
as the log fold difference between any sample and the mean of the s1
samples.
But the general take home message here is that you can't just use the
raw expression values, as you will then have a solid color to your
heatmap. Other possibilities are to use the scale argument to
heatmap.2() to scale by row (convert to z-scores), in which case the
interpretation of the colors is less obvious.
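Filled in with the object names from your simplified example (the top-100 cutoff and the choice of s1 as the reference group are arbitrary), that sketch would look roughly like:

genes <- as.numeric(rownames(topTable(fit_be, coef = 1, number = 100)))
mat <- avg$E[genes, ]                      # normalized expression values
mn <- fit_lm$coefficients[genes, "s1"]     # per-gene mean of the s1 samples
swept <- sweep(mat, 1, mn, "-")            # log fold differences relative to the s1 mean
heatmap(swept, col = redgreen(75))
# or, with gplots: heatmap.2(mat, scale = "row", col = redgreen(75), trace = "none")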
Best,
Jim
On 3/28/2013 10:22 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> Hello James,
>
> Are you referring to function names or constants? The constants are
an old habit I picked up long ago in my CS classes, there are a ton
but it makes them stand out for me in vim more readily. I've come to
avoid camel case since switching to linux, it makes searching for
things a nightmare, it can also help if I group functions with a
prefix (eg plot functions start with "pl_"). If you want I can
find+replace all of the constants with their values...
>
> The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
>
> I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
>
> I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
>
>
> _________thanks_____________
> _______Joe______
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, March 28, 2013 10:08 AM
> To: Cornish, Joseph (NIH/NIAID) [F]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
>
> Hi Joseph,
>
>
> On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
>> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>>
>> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>>
>> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only on (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are column for
each matrix.
>>
>> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>>
>> My questions are:
>> Why does the bayes model only have one column?
> Because you are computing one contrast (which is all you can do).
One contrast means one test which means one column of results. In
other words, the coefficient you are estimating in the contrasts.fit()
step is the difference in means between the two groups, which is one
value per gene.
>
>> Am I approaching the processing correctly?
> I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
>
>> Why are there differences between the resulting heatmaps of each
version?
> I don't know, as I worried my brain might explode if I read any
more.
> But you should note that it is fairly unusual to plot coefficients
in a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
>
> Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
>
> Best,
>
> Jim
>
>
>> I know that the idea is to keep the amount of posted code to a
minimum but I don't see any other way here.
>>
>> Here is the simplified example I have made:
>> library(limma)
>> library(vsn)
>> library(gplots)
>>
>> set.seed(0xabcd) #from vsn manual
>>
>> #constants for reading agi outs from our imager
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>>
>> #parse data
>> targets<- readTargets("test.csv", sep = ",")
>> gal<- readGAL("026806_D_20110524.gal")
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> design<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(design)<- c("s1", "s2")
>>
>> #process
>> bg<- backgroundCorrect(agi, method = "normexp")
>> norm<- normalizeBetweenArrays(bg, method = "cyclicloess")
>> avg<- avereps(norm, ID=norm$genes$ProbeName)
>>
>> #fit
>> contmat<- makeContrasts(s2-s1, levels = design)
>> fit_lm<- lmFit(avg$E, design)
>> fit_be<- eBayes(contrasts.fit(fit_lm, contmat))
>>
>> #results
>> difexp<- topTable(fit_be, coef = 1, adjust = "BH")
>> res<- decideTests(fit_be)
>> comp<- vennDiagram(res)
>>
>> #plot
>> pdf("example-heatmap.pdf")
>> genes<- as.numeric(rownames(topTable(fit_be, n = 100)))
>> heatmap(fit_lm$coefficients[genes,], col=redgreen(75), key = TRUE)
>> dev.off()
>>
>>
>> The full version:
>> library(vsn)
>> library(limma)
>> library(gplots)
>> set.seed(0xabcd) #from the vsn manual
>>
>> #constants
>> A_SUB__<- 1
>> A_NXP__<- 2
>> A_MIN__<- 3
>> A_VSN__<- 1
>> A_QNT__<- 2
>> A_LWS__<- 3
>> M_SUB__<- "subtract"
>> M_NXP__<- "normexp"
>> M_MIN__<- "minimum"
>> M_VSN__<- "vsn"
>> M_LWS__<- "cyclicloess"
>> M_QNT__<- "quantile"
>> COL__<- list(G="Raw intensity (tr.mean) {A}",Gb="Background
(tr.mean)
>> {A}")
>> ANO__<- c("Position", "Name", "ID", "Description", "GeneName")
>> N_BGM__<- 3 #number of background correction methods
>> N_NRM__<- 3 #number of normalization methods
>> A_BGM__<- c(A_SUB__, A_NXP__, A_MIN__)
>> A_NRM__<- c(A_VSN__, A_QNT__, A_LWS__)
>>
>>
>> #utility methods
>> #applies subtraction background correction with limma
>> bg_subtract<- function(x){
>> return(list(norm = x$norm,
>> bg = M_SUB__,
>> agi = backgroundCorrect(x$agi, method =
M_SUB__))) }
>>
>> #applies normexp backgroud correction with limma
>> bg_normexp<- function(x){
>> return(list(norm = x$norm,
>> bg = M_NXP__,
>> agi = backgroundCorrect(x$agi, method =
M_NXP__))) }
>>
>> #applies minimum background correction with limma
>> bg_minimum<- function(x){
>> return(list(norm = x$norm,
>> bg = M_MIN__,
>> agi = backgroundCorrect(x$agi, method =
M_MIN__))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies VSN normalization with VSN package
>> nb_vsn<- function(x){
>> return(list(norm = M_VSN__,
>> bg = x$bg,
>> agi = normalizeVSN(x$agi))) }
>>
>> #applies cyclic lowess normalization with limma
>> nb_lowess<- function(x){
>> return(list(norm = M_LWS__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_LWS__))) }
>>
>> #applies quantile normalization with limma
>> nb_quantile<- function(x){
>> return(list(norm = M_QNT__,
>> bg = x$bg,
>> agi = normalizeBetweenArrays(x$agi, method =
>> M_QNT__))) }
>>
>> #generates a plot of MA values with limma
>> pl_ma<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-ma.pdf"),
collapse="")
>> pdf(name)
>> plotMA(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generates a plot of mean versus stdev with VS
>> pl_meansd<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-meansd.pdf"),
collapse="")
>> pdf(name)
>> meanSdPlot(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate volcanoplot of log-fold versus log-odds
>> pl_volcano<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-volcano.pdf"),
collapse="")
>> pdf(name)
>> volcanoplot(x[[1]]$bayes)
>> dev.off()
>> }
>>
>> #generate plot of expression data over time
>> pl_lines<- function(x){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-lines.pdf"),
collapse="")
>> pdf(name)
>> plotlines(x[[1]]$agi$E)
>> dev.off()
>> }
>>
>> #generate heatmat of final results
>> pl_heatmap<- function(x, ntop){
>> name<- paste(c(x[[1]]$bg, x[[1]]$norm, "-heatmap.pdf"),
collapse="")
>> pdf(name)
>> genes<- as.numeric(rownames(topTable(x[[1]]$bayes, n =
ntop)))
>> heatmap(x[[1]]$lm$coefficients[genes,],
>> col = redgreen(75),
>> key = TRUE)
>> dev.off()
>> }
>>
>> #averages replicates of probes
>> m_repavg<- function(x){
>> return(list(agi = x[[1]]$agi,
>> avg = avereps(x[[1]]$agi,
ID=x[[1]]$agi$genes$ProbeName),
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm))
>> }
>>
>> #generate contrasts
>> m_fit<- function(x, design){
>> contmat<- makeContrasts(s2-s1,
>> levels = design)
>> fit1<- lmFit(x[[1]]$agi$E, design)
>> fit2<- eBayes(contrasts.fit(fit1, contmat))
>> diff<- topTable(fit2, coef = 1, adjust = "BH")
>> res<- decideTests(fit2)
>> comp<- vennDiagram(res)
>> return(list(agi = x[[1]]$agi,
>> bg = x[[1]]$bg,
>> norm = x[[1]]$norm,
>> ave = x[[1]]$ave,
>> cont = contmat,
>> lm = fit1,
>> bayes = fit2,
>> diff = diff,
>> res = res,
>> comp = comp))
>> }
>>
>>
######################################################################
>> ######### # Function to parse array data into limma objects # Args:
>> # tfile: target file (see limma documentation)
>> # tsep: limma defaults TSV, have to set manually for CSV and
so on
>> # galfile: GAL file for printer/layout
>> # Returns data object with targets, genes and printer layout
>>
######################################################################
>> #########
>> parse<- function(tfile, tsep, galfile){
>> targets<- readTargets(tfile, sep=tsep)
>> gal<- readGAL(galfile)
>> layout<- getLayout(gal)
>> agi<- read.maimages(targets, columns = COL__,
>> annotation = ANO__,
>> green.only = TRUE)
>> bg<- "none"
>> norm<- "none"
>> desg<- model.matrix(~0+factor(c(1,1,2,2)))
>> colnames(desg)<- c("s1","s2")
>> d<- list(agi = agi,
>> gal = gal,
>> layout = layout,
>> targets = targets,
>> bg = bg,
>> desg = desg,
>> norm = norm)
>>
>> processed<- matrix(NA, nrow = N_BGM__, ncol = N_NRM__)
>> processed<- apply(processed, 1:2, function(x){return(d)})
>> return(processed)
>> }
>>
>> #parse command args
>> args<- commandArgs(trailingOnly = TRUE)
>>
>> #load files into matrix
>> a<- parse(args[1], args[2], args[3])
>>
>> #keep a copy of a raw dataset
>> raw<- a[1,1]
>> print(raw[[1]]$desg)
>>
>> #perform background corrections
>> a[,A_SUB__]<- lapply(a[,A_SUB__], bg_subtract)
>> a[,A_NXP__]<- lapply(a[,A_NXP__], bg_normexp)
>> a[,A_MIN__]<- lapply(a[,A_MIN__], bg_minimum)
>>
>> #perform normalization
>> a[A_VSN__,]<- lapply(a[A_VSN__,], nb_vsn)
>> a[A_LWS__,]<- lapply(a[A_LWS__,], nb_lowess)
>> a[A_QNT__,]<- lapply(a[A_QNT__,], nb_quantile)
>>
>> #fit model and differentiate expression
>> a<- apply(a, 1:2, m_repavg)
>> a<- apply(a, 1:2, m_fit, design = raw[[1]]$desg)
>>
>> #generate plots
>> apply(a, 1:2, pl_ma)
>> apply(a, 1:2, pl_meansd)
>> apply(a, 1:2, pl_volcano)
>> apply(a, 1:2, pl_lines)
>> apply(a, 1:2, pl_heatmap, args[4])
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 6
Date: Thu, 28 Mar 2013 15:00:49 +0000
From: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish@nih.gov>
To: "Davis, Sean (NIH/NCI) [E]" <sdavis2 at="" mail.nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID: <e68ea43fbf80174c85a92bc24efaa3b84b7c02 at="" mlbxv01.nih.gov="">
Content-Type: text/plain
Hello Davis,
I posted a simple example, which after fixing a problem in the
longer example now produces the same results. I only posted the longer
one because I wasn't able to reproduce the results of the longer
version initially. I've been sort of banging my head on this for a
while, so I missed the difference between 'agi' and 'avg'.
I'm thankful anyone bothered to look after posting the code, so
thanks. If I sounded offended by the underscores, I was only admitting
that I know they're terrible. c:
Thanks for clarifying the subtraction of the mean aspect. That has
been one aspect that has been troubling me when it comes to learning
this analysis: there seem to be plenty of ways to do things but,
without experience, no way to tell which is better.
I will give the mean subtraction and the z-score scaling a try. Am I
reading too much into the usefulness of these heatmaps? In most papers
I've seen they seem to be there because they look nice or the
attached clustering might be of some use.
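(For the z-score option, a one-line sketch -- mat is assumed to hold the
normalized expression values for the selected probes, rows = probes and
columns = arrays; the object name is an assumption:)
mat.z <- t(scale(t(mat)))  # scale() works column-wise, so transpose to centre/scale each probe (row)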
Thanks again!
From: Davis, Sean (NCI) On Behalf Of Davis, Sean (NIH/NCI) [E]
Sent: Thursday, March 28, 2013 10:34 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: James W. MacDonald; bioconductor at r-project.org
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
On Thu, Mar 28, 2013 at 10:22 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at="" nih.gov<mailto:joseph.cornish="" at="" nih.gov="">> wrote:
Hello James,
Are you referring to function names or constants? The constants are an
old habit I picked up long ago in my CS classes, there are a ton but
it makes them stand out for me in vim more readily. I've come to avoid
camel case since switching to linux, it makes searching for things a
nightmare, it can also help if I group functions with a prefix (eg
plot functions start with "pl_"). If you want I can find+replace all
of the constants with their values...
Hi, Joe.
I think I could help to translate part of what James said. If you
could boil your example down to a few lines of reproducible code that
show the problem, that would be simpler to interpret and easier to
comment on directly. Sometimes in the process of producing such a
simplified example, I have found my own misunderstandings. It also
sometimes helps to get someone local to give a second pair of eyes.
The differences came from using the unaveraged replicates in the
larger script versus the average in the example.
I'm not sure what you mean by "mean of the controls", would this
simply be filtering out the control spots?
No. I think he means for you to subtract the mean of the control
samples (or one group of samples) for each gene from all the
normalized expression values.
I'll generate the heatmaps using the normalized expression values. I
suppose in that case it would be worthwhile to filter ahead of time by
p-value and fold change (the coef in topTable).
Yes.
Sean
_________thanks_____________
_______Joe______
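(For that filtering step, a minimal sketch using topTable()'s built-in
cutoffs -- fit_be is the eBayes fit from the simplified example, and the
0.05 / 1 thresholds are illustrative assumptions:)
sig <- topTable(fit_be, coef = 1, number = Inf,
                adjust.method = "BH", p.value = 0.05, lfc = 1)
genes <- rownames(sig)  # probes passing both the adjusted p-value and fold-change cutoffs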
-----Original Message-----
From: James W. MacDonald
[mailto:jmacdon@uw.edu<mailto:jmacdon@uw.edu>]
Sent: Thursday, March 28, 2013 10:08 AM
To: Cornish, Joseph (NIH/NIAID) [F]
Cc: bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Hi Joseph,
On 3/28/2013 9:27 AM, Cornish, Joseph (NIH/NIAID) [F] wrote:
> I am attempting to write a script that will allow me to do bulk
benchmarking of various methods for normalization and background
correction. I am concerned that I am not processing the data correctly
and this may be leading to the abnormal heatmaps that I am seeing. For
a test case, I am using an experiment where I have two different
conditions with two different states with two replicates per state.
>
> The heatmaps concern me because of the uniformity between them, the
values are all completely green or completely red, there are
differences between genes within a state according to the heatmap.
>
> In the examples I have seen people have suggested that I use the
coefficients from the eBayes fitting of the data, however in my
example the eBayes LMarray object does not have two columns in the
results, only one (not sure if this is a sign of errors on my part). I
have instead used the lmFit coefficients since there are columns for
each matrix.
>
> To further confuse someone who is new to R, the simplified example
and full code generate different heatmaps for the same data and same
methods (normexp for background, cyclic loess for norm). The
differences being in the simplified version the gene IDs show up on
the right side but in the full version, only row number appears. The
color for some genes changes between states as well.
>
> My questions are:
> Why does the bayes model only have one column?
Because you are computing one contrast (which is all you can do). One
contrast means one test which means one column of results. In other
words, the coefficient you are estimating in the contrasts.fit() step
is the difference in means between the two groups, which is one value
per gene.
> Am I approaching the processing correctly?
I tried to read through that business below, but got some weird
affliction due to an apparent overdose of underscores and had to stop.
> Why are there differences between the resulting heatmaps of each
version?
I don't know, as I worried my brain might explode if I read any more.
But you should note that it is fairly unusual to plot coefficients in
a heatmap, as what you are then presenting are log ratios, and most
people can't understand that.
Instead you might consider getting the list of genes, and then
creating a heatmap using the normalized expression values for each
sample instead. You then would want to scale in some way. We often
sweep the mean of the controls out of the matrix (by row), so the
colors can then be interpreted as differences between the treated
samples and controls.
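(A minimal sketch of one way to do that, with assumed object names: avg
and genes are taken from the simplified example earlier in the thread,
and arrays 1 and 2 (group s1) are assumed to be the controls:)
library(gplots)                        # for heatmap.2() and redgreen()
mat <- avg$E[genes, ]                  # normalized log2 values for the selected probes
ctrl.mean <- rowMeans(mat[, 1:2])      # per-probe mean of the (assumed) control arrays
centered <- sweep(mat, 1, ctrl.mean)   # subtract it from every array, row by row
heatmap.2(centered, col = redgreen(75),
          trace = "none", scale = "none")  # colours now read as log2 difference from the controls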
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org="">
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]
------------------------------
Message: 7
Date: Thu, 28 Mar 2013 15:18:59 +0000
From: Andreia Fonseca <andreia.fonseca@gmail.com>
To: bioconductor <bioconductor at="" stat.math.ethz.ch="">
Subject: [BioC] analysis of HuGene2.0-st array affymetrix -aroma
package
Message-ID:
<cag43sel74j9eypo8izygsy7=c0jye5bjw5egcohkug9mmyp_fq at="" mail.gmail.com="">
Content-Type: text/plain
Dear all,
I am analysing the HuGene2.0-st Affymetrix array using the aroma and
limma packages.
I have 2 treatments and 3 arrays per condition. I am getting 11 genes
that are differentially expressed, but these lose significance after
applying FDR correction. I want to filter out genes with similar
distributions between treatments before applying FDR correction, to see
whether I am losing significance due to overcorrection. Can someone
advise me on a function or package to do this? I have explored
genefilter and I could not find a function for this.
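(One common option is unspecific, variance-based filtering before
testing, e.g. with genefilter's varFilter(); a minimal sketch assuming an
ExpressionSet called eset -- the object name and the 50% cutoff are
illustrative assumptions:)
library(genefilter)
## keep the 50% of probes with the largest IQR across all arrays, without
## using the treatment labels, so the multiple-testing correction stays valid
eset.filt <- varFilter(eset, var.func = IQR, var.cutoff = 0.5,
                       filterByQuantile = TRUE)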
Thanks for the help.
Kind regards,
Andreia
----------------------------------------------------------------------
-------------------------
Andreia J. Amaral, PhD
BioFIG - Center for Biodiversity, Functional and Integrative Genomics
Instituto de Medicina Molecular
University of Lisbon
Tel: +352 217500000 (ext. office: 28253)
email:andreiaamaral at fm.ul.pt ; andreiaamaral at fc.ul.pt
[[alternative HTML version deleted]]
------------------------------
Message: 8
Date: Thu, 28 Mar 2013 11:47:12 -0400
From: Sean Davis <sdavis2@mail.nih.gov>
To: "Cornish, Joseph (NIH/NIAID) [F]" <joseph.cornish at="" nih.gov="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">,
"James
W. MacDonald" <jmacdon at="" uw.edu="">
Subject: Re: [BioC] Heatmaps for Agilent Single Color 4x44k using
limma and vsn.
Message-ID:
<caneavb=xn9c2bjqk6gf5dptghbltgrcov+oh0t05qsmd1s92dg at="" mail.gmail.com="">
Content-Type: text/plain; charset=UTF-8
On Thu, Mar 28, 2013 at 11:00 AM, Cornish, Joseph (NIH/NIAID) [F]
<joseph.cornish at="" nih.gov=""> wrote:
> Hello Davis,
>
> I posted a simple example, which after fixing a problem in the
longer example now produces the same results. I only posted the longer
one because I wasn't able to reproduce the results of the longer
version initially. I've been sort of banging my head on this for a
while so I missed the difference between 'agi' and 'avg'. I'm thankful
anyone bothered to look after posting the code, so thanks. If I
sounded offended by the underscores, I was only admitting that I know
they're terrible. c:
>
> Thanks for clarifying the subtraction of the mean aspect. That has
been one aspect that has been troubling me when it comes to learning this
analysis: there seem to be plenty of ways to do things but, without
experience, no way to tell which is better.
>
There is not a "right" way to do this since the only metric I have
seen for the evaluation of the color scale in a heatmap is subjective
and qualitative.
> I will give the mean subtraction and the z-score scaling a try. Am I
reading too much into the usefulness of these heatmaps? In most papers
I've seen they seem to be there because they look nice or the attached
clustering might be of some use.
>
Heatmaps are very good visualization tools. I would not try to attach
more utility to them than that.
Sean
------------------------------
Message: 9
Date: Thu, 28 Mar 2013 11:00:17 -0500
From: Michael Muratet <mmuratet@hudsonalpha.org>
To: bioconductor at r-project.org
Subject: [BioC] Interpreting DESeq2 results
Message-ID: <14DFA994-3719-4FDF-BD71-4290D4149457 at hudsonalpha.org>
Content-Type: text/plain; charset=us-ascii
Greetings
I have an experiment:
> design(dse)
~ factor1 + factor2 + factor3
where factor1 has two levels, factor2 has three levels and factor3 has
three levels. I extract a gene of interest from the results for each
term (I've changed the indices to reflect the condition):
> lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
[["Intercept"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216
[["factor1_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.3386776 0.1307309 0.3587438
[["factor2_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 -0.6882543 0.0613569 0.1007896
[["factor2_level3"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.2393368 0.513216 0.6589575
[["factor3_level2"]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 0.1584153 0.6423634 0.8503163
[["factor3_level3]]
baseMean log2FoldChange pvalue FDR
gene_A 1596.548 -1.627898 1.823141e-06 0.001409384
I want to be sure I understand the output format. Is it true that the
coefficients (the vector beta) from the fit are the baseMean value
scaled by the log2FoldChange? Is the true intercept value
1596.548*2^10.77485=2797274.13?
mcols() tells me that the baseMean term is calculated over "all rows".
The baseMean is different for different genes although it is the same
for each gene across all the conditions, so I'm not seeing how the rows
are selected.
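(A quick way to check what baseMean is, directly from the object --
assuming dse has already been run through DESeq(); as far as I know it is
simply the mean of the normalized counts across all samples, on the count
scale, and is not scaled by log2FoldChange:)
head(results(dse)$baseMean)                     # reported baseMean
head(rowMeans(counts(dse, normalized = TRUE)))  # mean of normalized counts over all samples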
Thanks
Mike
Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)
Room 4005
601 Genome Way
Huntsville, Alabama 35806
------------------------------
Message: 10
Date: Thu, 28 Mar 2013 16:30:12 +0000
From: Fiona Ingleby <fiona.ingleby@gmail.com>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: [BioC] identifying drosophila miRNA targets
Message-ID: <04CBBC25-9E0F-4E2B-8008-D3EB093E2A10 at gmail.com>
Content-Type: text/plain
Hi everyone,
I am working with mRNA data from Affy 'drosophila2' arrays and miRNA
data from Affy 'mirna3' arrays. I have identified a list of
differentially expressed mRNAs and miRNAs. I'm having a bit of trouble
with some downstream analyses and I'm hoping someone might be able to
offer some help.
I would like to use my list of differentially expressed miRNAs to
access online databases (e.g. miRBase, microRNA.org?) and extract the
names of all the potential target mRNAs. Then I'd like to use this
list of mRNAs to look through my mRNA expression data. I'm aware of
packages like 'RmiR' and 'microRNA' which have built-in functions for
finding miRNA targets, but as far as I can tell, 'RmiR' uses miRNA
databases for humans only and 'microRNA' works with human and mouse
data only. So is there a package I am unaware of (or another
application of 'RmiR'/'microRNA' that I am unaware of) for looking at
drosophila data?
So far I have also considered the 'biomaRt' package to see if the
database query function on there can help me, but I haven't had much
luck. For instance, if I try an example list of miRNAs:
mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
library(biomaRt)
ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
getBM(attributes="mirbase_accession",filters="mirbase_id",values=mirna
,mart=ensembl)
then 'logical(0)' is returned, as if there are no records for those
miRNAs - but by searching the database manually I know the records are
there.
Alternatively I can try:
miRNA <- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id",
"start_position", "chromosome_name"), filters = c("with_mirbase"),
values = list(T), mart = ensembl)
which returns a table of various bits of information on miRNAs, but I
cannot adapt this command to just look at my list of miRNAs of
interest (ie. the 'mirna' vector above). I've included the
sessionInfo() output for these at the bottom of the email, but I
suspect my problem is more to do with the fact I'm not going about
this the right way (as opposed to a problem with package versions and
coding etc.). I'm not even sure that using 'biomaRt' will give me the
information I eventually want (the target mRNAs of these miRNAs), I
was just trying it out, to see what it was capable of in terms of
querying these databases. So I apologise for the vagueness. Since I
haven't managed to get very far by myself then it's difficult to be
more specific, but I'd really appreciate it if anyone could offer some
advice, even just to point me in the direction of a useful package
which might have gone unnoticed by me.
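(One quick sanity check for the getBM() calls above -- purely a debugging
suggestion, not from the original post -- is to list the exact filter and
attribute names that the mart exposes:)
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "dmelanogaster_gene_ensembl")
grep("mirbase", listFilters(ensembl)$name, value = TRUE)     # miRBase-related filter names
grep("mirbase", listAttributes(ensembl)$name, value = TRUE)  # miRBase-related attribute names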
Many thanks,
Fiona
Dr Fiona C Ingleby
Postdoctoral Research Fellow
University of Sussex
Email: F.Ingleby at sussex.ac.uk
Website: fionaingleby.weebly.com
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.14.0 affy_1.36.1 Biobase_2.18.0
BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] affyio_1.26.0 BiocInstaller_1.8.3 grid_2.15.2
lattice_0.20-14 Matrix_1.0-11 MCMCglmm_2.17
[7] preprocessCore_1.20.0 RCurl_1.95-4.1 tools_2.15.2
XML_3.95-0.2 zlibbioc_1.4.0
[[alternative HTML version deleted]]
------------------------------
Message: 11
Date: Thu, 28 Mar 2013 12:43:14 -0400
From: "James W. MacDonald" <jmacdon@uw.edu>
To: Fiona Ingleby <fiona.ingleby at="" gmail.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] identifying drosophila miRNA targets
Message-ID: <51547322.6020803 at uw.edu>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Fiona,
I have a function called mirna2mrna (yeah, I know, lame function
name...) in my affycoretools package that does this, based on the
sanger
microcosm targets data that you can download here:
http://www.ebi.ac.uk/enright-srv/microcosm/cgi-
bin/targets/v5/download.pl
there is also a function makeHmap() that will create a heatmap with
the
miRNA/mRNA pairs, where the color of the cells is based on the
correlation between the two RNA species (with the intent to show
negative correlations, indicating that the miRNA is hypothetically
causing premature degradation of the mRNA).
I think the help pages for these two functions are reasonable, but
please let me know if you have any questions.
Best,
Jim
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
------------------------------
Message: 12
Date: Thu, 28 Mar 2013 16:58:21 +0000
From: "Blanchette, Marco" <mab@stowers.org>
To: Kasper Daniel Hansen <kasperdanielhansen at="" gmail.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Limit on number of sequence files for forging a
BSgenome
Message-ID: <1CE7C98DF956594C98E0956F4747178136D97D at
MBSRV02.sgc.loc>
Content-Type: text/plain; charset="us-ascii"
Kasper,
I see your line of thought, is there a particular fasta file causing
forgeBSgenomeDataPkg() to break?
The answer is no. Once I reach a certain number of fasta files, adding
one
more contig breaks the function. For instance, taking the first 454
contigs of C. brenneri breaks while removing the last or the first
fasta
file from the list (keeping only 453) compile without a problem
(neither
the last or the first fasta files are responsible for breaking the
function, the number of file is the trigger)
What's even more puzzling is that the number that breaks is not a
fixed
number. Selecting a random selection of contigs or changing genome
will
change the number that triggers the function to break... However it's
always around 440 files, which might be due to the size of the fasta
files
being all of very similar sizes.
Any clues?
-- Marco Blanchette, Ph.D.
Stowers Institute for Medical Research
1000 East 50th Street
Kansas City MO 64110
www.stowers.org
Tel: 816-926-4071
Cell: 816-726-8419
Fax: 816-926-2018
On 3/27/13 8:22 PM, "Kasper Daniel Hansen" <kasperdanielhansen at="" gmail.com="">
wrote:
>Marco,
>
>You are probably right in diagnosing the problem, but sometimes I
>think I have seen FASTA files with the entire sequence on a single
>line, instead of (say) 80 nucleotides and then a newline. I could
>believe that a really long contig on a single line without a newline,
>could cause an error like this. You could quickly check if there is a
>suspicious file by
> wc -l *
>and look for files with #lines like 2-3. Somehow 460 seems a weird
>number to fail at.
>
>This may not be your problem, and I am sure Herve will respond in due
>time.
>
>Best,
>Kasper
>
>On Wed, Mar 27, 2013 at 4:28 PM, Blanchette, Marco <mab at="" stowers.org="">
>wrote:
>> Hi,
>>
>> Is there a maximum number of sequence files (chromosomes or contigs
in
>>my case) that can be fed to the forgeBSgenomeDataPkg() function? I
am
>>trying to build a BSgenome for C. brenneri and C. japonica available
>>from EnsemblGenomes. These genomes are made from thousands of
contigs
>>with genes annotated to them. Currently, I get the following error
when
>>running "Error: Line longer than buffer size" when running on the
full
>>set of contigs. However, it works fine on a seed file containing a
>>subset of the contigs (I can forge a genome with 450 contigs but not
>>with 460!)
>>
>> Any suggestions will be appreciated (I can provide a toy example
but I
>>am not sure what would be the merit of it at this point)
>>
>> Thanks
>>
>> -- Marco Blanchette, Ph.D.
>> Stowers Institute for Medical Research
>> 1000 East 50th Street
>> Kansas City MO 64110
>> www.stowers.org
>>
>> Tel: 816-926-4071
>> Cell: 816-726-8419
>> Fax: 816-926-2018
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Message: 13
Date: Thu, 28 Mar 2013 19:07:19 +0000
From: "Shields, Rusty (IMS)" <shieldsr@imsweb.com>
To: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID:
<e2e839b5bfa711438b481ad9906b0c340a6a70a8 at="" varuna.omni.imsweb.com="">
Content-Type: text/plain; charset="iso-8859-1"
Thanks Martin,
No apologies necessary, just happy to have some help.
Here's the info on Biobase:
> packageDescription("Biobase")
Package: Biobase
Title: Biobase: Base functions for Bioconductor
Version: 2.18.0
Author: R. Gentleman, V. Carey, M. Morgan, S. Falcon
Description: Functions that are needed by many other packages or which
replace R functions.
Suggests: tools, tkWidgets, ALL
Depends: R (>= 2.10), BiocGenerics (>= 0.3.2), utils
Imports: methods, BiocGenerics
Maintainer: Bioconductor Package Maintainer
<maintainer at="" bioconductor.org="">
License: Artistic-2.0
Collate: tools.R strings.R environment.R vignettes.R packages.R
AllGenerics.R VersionsClass.R VersionedClasses.R
methods-VersionsNull.R methods-VersionedClass.R DataClasses.R
methods-aggregator.R methods-container.R methods-MIAxE.R
methods-MIAME.R methods-AssayData.R
methods-AnnotatedDataFrame.R methods-eSet.R
methods-ExpressionSet.R methods-MultiSet.R methods-SnpSet.R
methods-NChannelSet.R anyMissing.R rowOp-methods.R
updateObjectTo.R methods-ScalarObject.R zzz.R
LazyLoad: yes
biocViews: Infrastructure, Bioinformatics
Packaged: 2012-10-02 02:41:50 UTC; biocbuild
Built: R 2.14.0; x86_64-unknown-linux-gnu; 2013-03-22 15:01:20 UTC;
unix
-- File: /usr/local/R-2.14.0/lib64/R/library/Biobase/Meta/package.rds
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan@fhcrc.org]
Sent: Thursday, March 28, 2013 2:52 PM
To: Shields, Rusty (IMS)
Cc: r-help at r-project.org
Subject: Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package
On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
> Hi All,
>
> I posted this on the bioconductor list and didn't get a response
there, so I'm hoping someone here can help.
>
> I don't know a heck of a lot about R, so I apologize if this seems
like a trivial issue. This error comes up when trying to install the
bioconductor globaltest package.
>
Sorry that you didn't get a response on the Bioc mailing list; I'd
actually
suggest returning to the original thread there and I'll see that it
gets answered.
My guess is that you have a version of Biobase that is too new
compared to the
version (2.14.0) expected in the version of Bioconductor you are
using. What
does packageDescription("Biobase") say?
Martin
> Any clues?
>
> Thanks!
> Rusty
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
> Sent: Tuesday, March 26, 2013 12:34 PM
> To: bioconductor at r-project.org
> Subject: [BioC] Error when installing globaltest package
>
> Hi all,
>
> I've run into a problem when attempting to install the globaltest
package from Bioconductor. I'm using R 2.14.0 on 64bit SLES 11. Let
me know what other information you might need about my system to
troubleshoot this.
>
> Using the method described for this installation on the Bioconductor
website:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("globaltest")
>
> I get the following result, which I can't find any reference to
on the list archives:
>
>> biocLite("globaltest")
> BioC_mirror: 'http://www.bioconductor.org'
> Using R version 2.14, BiocInstaller version 1.2.1.
> Installing package(s) 'globaltest'
> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contri
b/globaltest_5.8.1.tar.gz'
> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
> opened URL
> ==================================================
> downloaded 933 Kb
>
> * installing *source* package 'globaltest' ...
> ** R
> ** data
> ** inst
> ** preparing package for lazy loading
> Warning in .simpleDuplicateClass(def, prev) :
> A specification for class 'data.frameOrNULL'
> Creating a generic
function for 'sort' from package 'base' in package 'globaltest'
> Creating a generic function for 'model.matrix' from package 'stats'
in package 'globaltest'
> Creating a generic function for 'coefficients' from package 'stats'
in package 'globaltest'
> Creating a generic function for 'fitted.values' from package 'stats'
in package 'globaltest'
> Creating a generic function for 'residuals' from package 'stats' in
package 'globaltest'
> Error in setMethod("combine", signature(x = "gt.result", y =
"gt.result"), :
> no existing definition for function 'combine'
> Error : unable to load R code in package 'globaltest'
> ERROR: lazy loading failed for package 'globaltest'
> * removing '/usr/local/R-2.14.0/lib64/R/library/globaltest'
>
> The downloaded packages are in
> '/tmp/RtmpdCwjNB/downloaded_packages'
> Updating HTML index of packages in '.Library'
> Making packages.html ... done
> Old packages: 'caret'
> Update all/some/none? [a/s/n]:
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> [[alternative HTML version deleted]]
>
>
> ________________________________
>
> Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
________________________________
Information in this e-mail may be confidential. It is intended only
for the addressee(s) identified above. If you are not the
addressee(s), or an employee or agent of the addressee(s), please note
that any dissemination, distribution, or copying of this communication
is strictly prohibited. If you have received this e-mail in error,
please notify the sender of the error.
------------------------------
Message: 14
Date: Thu, 28 Mar 2013 12:26:11 -0700
From: Martin Morgan <mtmorgan@fhcrc.org>
To: "Shields, Rusty (IMS)" <shieldsr at="" imsweb.com="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID: <51549953.6080003 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 03/28/2013 12:07 PM, Shields, Rusty (IMS) wrote:
> Thanks Martin,
>
> No apologies necessary, just happy to have some help.
>
> Here's the info on Biobase:
>
>> packageDescription("Biobase")
> Package: Biobase
> Title: Biobase: Base functions for Bioconductor
> Version: 2.18.0
yes, this is the version of Biobase from the current version of R /
Bioconductor, whereas you're using a relatively old R / Bioconductor
(R 2.14,
Bioc 2.9, from the 'Previous versions' box at
http://bioconductor.org/help/)
where Biobase is at version 2.14. This doesn't bode well, as other
packages are
also likely at incorrect versions.
The easiest solution is to upgrade to a recent R, and start again with
biocLite('globaltest').
You could also try biocLite("Biobase") to get the correct version,
then
biocLite("globaltest"), but if other packages are also installed
incorrectly...
Something like
source("http://bioconductor.org/biocLite.R")
avail =
available.packages(contriburl=contrib.url(biocinstallRepos()))
inst = installed.packages(priority="NA")
idx = rownames(avail) %in% rownames(inst)
vers = avail[idx, "Version"]
vers[vers < inst[names(vers), "Version"]]
will give you the too-new packages that need to be re-installed.
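(A small aside, not part of the snippet above: that last comparison is on
character strings; a version-aware variant would be
vers[package_version(vers) < package_version(inst[names(vers), "Version"])]
which avoids any surprises from lexicographic ordering.)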
It's hard to know how your system got to its current state; one
candidate is
that .libPaths() includes a path shared by several versions of R, but
using
biocLite() to install packages should help avoid that in the
future.
Martin
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 28, 2013 2:52 PM
> To: Shields, Rusty (IMS)
> Cc: r-help at r-project.org
> Subject: Re: [R] Error in setMethod("combine"... was - Error when
installing globaltest package
>
> On 3/28/2013 11:05 AM, Shields, Rusty (IMS) wrote:
>> Hi All,
>>
>> I posted this on the bioconductor list and didn't get a response
there, so I'm hoping someone here can help.
>>
>> I don't know a heck of a lot about R, so I apologize if this seems
like a trivial issue. This error comes up when trying to install the
bioconductor globaltest package.
>>
>
> Sorry that you didn't get a response on the Bioc mailing list; I'd
> actually suggest returning to the original thread there and I'll see
> that it gets answered.
>
> My guess is that you have a version of Biobase that is too new
> compared to the version (2.14.0) expected in the version of
> Bioconductor you are using. What does packageDescription("Biobase")
> say?
>
> Martin
>
>
>
>
>> Any clues?
>>
>> Thanks!
>> Rusty
>>
>> -----Original Message-----
>> From: bioconductor-bounces at r-project.org [mailto:bioconductor-
bounces at r-project.org] On Behalf Of Shields, Rusty (IMS)
>> Sent: Tuesday, March 26, 2013 12:34 PM
>> To: bioconductor at r-project.org
>> Subject: [BioC] Error when installing globaltest package
>>
>> Hi all,
>>
>> I've run into a problem when attempting to install the globaltest
package from Bioconductor. I'm using R 2.14.0 on 64bit SLES 11. Let
me know what other information you might need about my system to
troubleshoot this.
>>
>> Using the method described for this installation on the
Bioconductor website:
>>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite("globaltest")
>>
>> I get the following result, which I can't find a reference to in
>> the list archives:
>>
>>> biocLite("globaltest")
>> BioC_mirror: 'http://www.bioconductor.org'
>> Using R version 2.14, BiocInstaller version 1.2.1.
>> Installing package(s) 'globaltest'
>> trying URL 'http://www.bioconductor.org/packages/2.9/bioc/src/contrib/globaltest_5.8.1.tar.gz'
>> Content type 'application/x-gzip' length 956376 bytes (933 Kb)
>> opened URL
>> ==================================================
>> downloaded 933 Kb
>>
>> * installing *source* package 'globaltest' ...
>> ** R
>> ** data
>> ** inst
>> ** preparing package for lazy loading
>> Warning in .simpleDuplicateClass(def, prev) :
>>   A specification for class 'data.frameOrNULL'
>> Creating a generic function for 'sort' from package 'base' in package 'globaltest'
>> Creating a generic function for 'model.matrix' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'coefficients' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'fitted.values' from package 'stats' in package 'globaltest'
>> Creating a generic function for 'residuals' from package 'stats' in package 'globaltest'
>> Error in setMethod("combine", signature(x = "gt.result", y = "gt.result"), :
>>   no existing definition for function 'combine'
>> Error : unable to load R code in package 'globaltest'
>> ERROR: lazy loading failed for package 'globaltest'
>> * removing '/usr/local/R-2.14.0/lib64/R/library/globaltest'
>>
>> The downloaded packages are in
>>   '/tmp/RtmpdCwjNB/downloaded_packages'
>> Updating HTML index of packages in '.Library'
>> Making packages.html ... done
>> Old packages: 'caret'
>> Update all/some/none? [a/s/n]:
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
> --
> Dr. Martin Morgan, PhD
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
------------------------------
Message: 15
Date: Thu, 28 Mar 2013 20:00:24 +0000
From: "Shields, Rusty (IMS)" <shieldsr@imsweb.com>
To: Martin Morgan <mtmorgan at fhcrc.org>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] [R] Error in setMethod("combine"... was - Error
when installing globaltest package
Message-ID:
<e2e839b5bfa711438b481ad9906b0c340a6a78ad at varuna.omni.imsweb.com>
Content-Type: text/plain; charset="iso-8859-1"
Correcting the version of Biobase seems to have fixed it, and after
that it doesn't appear that anything else is of the wrong version,
based on the code that you provided.
globaltest now installs successfully through biocLite().
As for how the wrong version of Biobase was installed, I have to take
the blame for that. Rather than following the instructions on the
Bioconductor website to install using biocLite(), I made the mistake of
downloading the package source manually and installing via 'R CMD
INSTALL ...'. I believe I used biocLite() to install all of the other
packages. It has been a very educational few days.
Thanks again.
------------------------------
Message: 16
Date: Thu, 28 Mar 2013 22:19:57 +0100
From: Michael Love <michaelisaiahlove@gmail.com>
To: Michael Muratet <mmuratet at hudsonalpha.org>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID:
<cadqzidv-qp+=nm_goxj2aso+2maevq_i_y0mfgu1vv3o8fzy1g at mail.gmail.com>
Content-Type: text/plain
Hi Michael,
The baseMean column is not on the log scale; it is the mean of the
normalized counts for that gene, taken across all samples. The
intercept from the GLM (on the log2 scale) is the column labelled
"Intercept" in mcols(dse).
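As a quick check (a sketch; it assumes 'dse' has already been run
through DESeq() and that "gene_A" is one of the row names):

library(DESeq2)
normCounts <- counts(dse, normalized = TRUE)   # counts divided by size factors
rowMeans(normCounts)["gene_A"]                 # reproduces the baseMean column
results(dse)["gene_A", "baseMean"]
mcols(dse)[, "Intercept"]                      # per-gene log2-scale intercepts;
                                               # coefficient columns are named as in resultsNames(dse)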
Mike
On Mar 28, 2013 5:00 PM, "Michael Muratet" <mmuratet at hudsonalpha.org> wrote:
> Greetings
>
> I have an experiment:
>
> > design(dse)
> ~ factor1 + factor2 + factor3
>
> where factor1 has two levels, factor2 has three levels and factor3
has
> three levels. I extract a gene of interest from the results for each
term
> (I've changed the indices to reflect the condition):
>
> > lapply(resultsNames(dse),function(u) results(dse,u)["gene_A",])
> [["Intercept"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 10.77485 3.309439e-216 7.025442e-216
> [["factor1_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.3386776 0.1307309 0.3587438
> [["factor2_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 -0.6882543 0.0613569 0.1007896
> [["factor2_level3"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.2393368 0.513216 0.6589575
> [["factor3_level2"]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 0.1584153 0.6423634 0.8503163
> [["factor3_level3]]
> baseMean log2FoldChange pvalue FDR
> gene_A 1596.548 -1.627898 1.823141e-06 0.001409384
>
> I want to be sure I understand the output format. Is it true that the
> coefficients (the vector beta) from the fit are the baseMean value
> scaled by the log2FoldChange? Is the true intercept value
> 1596.548*2^10.77485=2797274.13?
>
> mcols() tells me that the baseMean term is calculated over "all
> rows". The baseMean is different for different genes although it is
> the same for each gene across all the conditions; I'm not seeing how
> the rows are selected.
>
> Thanks
>
> Mike
>
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
>
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 17
Date: Thu, 28 Mar 2013 17:50:04 -0400
From: somnath bandyopadhyay <genome1976@hotmail.com>
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Interpreting DESeq2 results
Message-ID: <bay175-w24d794c2988d81c509ef9fcdd20 at phx.gbl>
Content-Type: text/plain
Will the limma package work on sparse matrices?
Thanks,
Som.
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 121, Issue 29