ReportingTools - trouble incorporating annotations

0

Entering edit mode

Sam McInturf ▴ 300

@sam-mcinturf-5291

Last seen 9.5 years ago

United States

Bioconductors, I am working on a RNA seq analysis project and am having trouble publishing an HTML report for it. I am unsure of how to make my DE genes have the same ID as what publish() will accept when passing an argument to 'annotation'. I mapped the reads using tophat and passed the TAIR 10 gtf file to the -G option. When i counted my reads I used the summarizeOverlaps function from GenomicRanges and again used this same file. I called differential expression in edgeR using the GLM methods. So the rownames of my DE table are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db annotations and passed it to HTMLReport in: publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, conditions = group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc =2, n = 1000, name = "RootsLRT") Error: More than half of your IDs could not be mapped. In addition: Warning message: In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion which makes sense, because publish() is looking for Entrez IDs (right?) How do I proceed? Thanks in advance! -- Sam McInturf [[alternative HTML version deleted]]

Annotation edgeR GenomicRanges Annotation edgeR GenomicRanges • 1.7k views

ADD COMMENT • link updated 11.8 years ago by James W. MacDonald 68k • written 11.8 years ago by Sam McInturf ▴ 300

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 9 hours ago

United States

Hi Sam, First, please always give us the results of sessionInfo(). This is especially critical in the case of ReportingTools, which has been fundamentally altered between the previous and current versions of BioC. On 6/19/2013 11:12 AM, Sam McInturf wrote: > Bioconductors, > I am working on a RNA seq analysis project and am having trouble > publishing an HTML report for it. I am unsure of how to make my DE genes > have the same ID as what publish() will accept when passing an argument to > 'annotation'. > I mapped the reads using tophat and passed the TAIR 10 gtf file to the > -G option. When i counted my reads I used the summarizeOverlaps function > from GenomicRanges and again used this same file. I called differential > expression in edgeR using the GLM methods. So the rownames of my DE table > are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db > annotations and passed it to HTMLReport in: > > publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, conditions = > group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc =2, n = 1000, > name = "RootsLRT") > Error: More than half of your IDs could not be mapped. > In addition: Warning message: > In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion > > which makes sense, because publish() is looking for Entrez IDs (right?) > > How do I proceed? Here I assume you are running R-3.0.x and the current release of BioC. When you run publish() on anything but a data.frame, the first step is to coerce to a data.frame using a set of assumptions that might not hold in your case (or there may be defaults that you don't like). Because of this, I tend to just coerce to a data.frame myself and then publish() that directly. This also allows you to pass in arguments to .modifyDF which is kind of sweet. In the case of a DGELRT or DEGExact object, there is a 'genes' slot that will be used to annotate the output of topTags(). Ideally you would just add the annotation you want to that slot first. So you could do something like annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<tair column="" goes="" here="">], c("SYMBOL","GENENAME","OTHERSTUFF")) and then put that in your DGEobjects. Now you can do something like outlst <- lapply(DGELists, topTags, otherargsgohere) htmlst <- lapply(seq_len(length(DGELists)) function(x) HTMLReport(namevector[x], titlevector[x], otherargs)) and you can define a function similar to this function I use for Entrez Gene IDs: entrezLinks <- function (df, ...){ df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID), link = paste0("http://www.ncbi.nlm.nih.gov/gene/", as.character(df$ENTREZID)), table = FALSE) return(df) } but modified for the Tair equivalent and then lapply(seq_len(length(htmlst)), function(x) publish(outlst[[x]], htmlst[[x]], .modifyDF = samsTairLinkFun))) lapply(htmlst, finish) et voila! You can also then use htmlst to make a bunch of links in an index.html page. indx <- HTMLReport("index", "A bunch of links for this expt", reportDirectory=".", baseUrl = "") publish(hwriter::hwrite("Here are links", page(indx), header=2, br=TRUE), indx) publish(Link(htmlst, report=indx), indx) finish(indx) Best, Jim > > Thanks in advance! -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD COMMENT • link 11.8 years ago James W. MacDonald 68k

0

Entering edit mode

Jim, When I look at my DGELRT objects I don't have a $genes field: (to be verbose, DGELists is a list of DGELRT objects (list(glmLRT(), glmLRT(),...)) > DGELists$Roots$genes NULL Looking at what fields I do have: coefficients, fitted.values, deviance, df.residual, design, offset, dispersion, method, samples, AveLogCPM, table, comparison, df.test Looking at the DGEGLM which my DGELists were made, I have no gene slot either. I added the rownames of the counts table in the DGEGLM to a new $genes: >fit$genes <- rownames(fit$counts) but the the $genes slot did not get passed onto the DGELRTs. Is it best just to add the $genes field to each DGELRT and then follow what you stated before? Thanks, > sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 [10] edgeR_3.2.3 limma_3.16.5 BiocInstaller_1.10.2 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationForge_1.2.1 biomaRt_2.16.0 [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 [7] BSgenome_1.28.0 Category_2.26.0 cluster_1.14.4 [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 [16] ggplot2_0.9.3.1 GO.db_2.9.0 GOstats_2.26.0 [19] graph_1.38.2 grid_3.0.1 gridExtra_0.9.1 [22] GSEABase_1.22.0 gtable_0.1.2 Hmisc_3.10-1.1 [25] hwriter_1.3 labeling_0.1 lattice_0.20-15 [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0 [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 [37] R.methodsS3_1.4.2 R.oo_1.13.0 Rsamtools_1.12.3 [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2 [46] survival_2.37-4 tools_3.0.1 VariantAnnotation_1.6.6 [49] XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0 On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Sam, > > First, please always give us the results of sessionInfo(). This is > especially critical in the case of ReportingTools, which has been > fundamentally altered between the previous and current versions of BioC. > > > On 6/19/2013 11:12 AM, Sam McInturf wrote: > >> Bioconductors, >> I am working on a RNA seq analysis project and am having trouble >> publishing an HTML report for it. I am unsure of how to make my DE genes >> have the same ID as what publish() will accept when passing an argument to >> 'annotation'. >> I mapped the reads using tophat and passed the TAIR 10 gtf file to >> the >> -G option. When i counted my reads I used the summarizeOverlaps function >> from GenomicRanges and again used this same file. I called differential >> expression in edgeR using the GLM methods. So the rownames of my DE >> table >> are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db >> annotations and passed it to HTMLReport in: >> >> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, conditions = >> group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc =2, n = >> 1000, >> name = "RootsLRT") >> Error: More than half of your IDs could not be mapped. >> In addition: Warning message: >> In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion >> >> which makes sense, because publish() is looking for Entrez IDs (right?) >> >> How do I proceed? >> > > Here I assume you are running R-3.0.x and the current release of BioC. > > When you run publish() on anything but a data.frame, the first step is to > coerce to a data.frame using a set of assumptions that might not hold in > your case (or there may be defaults that you don't like). Because of this, > I tend to just coerce to a data.frame myself and then publish() that > directly. This also allows you to pass in arguments to .modifyDF which is > kind of sweet. > > In the case of a DGELRT or DEGExact object, there is a 'genes' slot that > will be used to annotate the output of topTags(). Ideally you would just > add the annotation you want to that slot first. So you could do something > like > > annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<**Tair column > goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF")) > > and then put that in your DGEobjects. Now you can do something like > > outlst <- lapply(DGELists, topTags, otherargsgohere) > > htmlst <- lapply(seq_len(length(**DGELists)) function(x) > HTMLReport(namevector[x], titlevector[x], otherargs)) > > and you can define a function similar to this function I use for Entrez > Gene IDs: > > entrezLinks <- function (df, ...){ > df$ENTREZID <- hwriter::hwrite(as.character(**df$ENTREZID), > link = paste0("http://www.ncbi.nlm.**nih.gov/gene/<http: ww="" w.ncbi.nlm.nih.gov="" gene=""/>", > as.character(df$ENTREZID)), > table = FALSE) > return(df) > } > > but modified for the Tair equivalent and then > > lapply(seq_len(length(htmlst))**, function(x) publish(outlst[[x]], > htmlst[[x]], .modifyDF = samsTairLinkFun))) > lapply(htmlst, finish) > > et voila! > > You can also then use htmlst to make a bunch of links in an index.html > page. > > indx <- HTMLReport("index", "A bunch of links for this expt", > reportDirectory=".", baseUrl = "") > publish(hwriter::hwrite("Here are links", page(indx), header=2, br=TRUE), > indx) > publish(Link(htmlst, report=indx), indx) > finish(indx) > > Best, > > Jim > > > >> Thanks in advance! >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- Sam McInturf [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Sam McInturf ▴ 300

0

Entering edit mode

Hi Sam, Sorry, my bad. One of the arguments for DGEList is 'genes', which is genes: data frame containing annotation information for the tags/transcripts/genes. So yeah, when you are creating your DGELists, add in the annotation data for each gene/transcript/whatever, and then go from there. Best, Jim On 6/19/2013 1:37 PM, Sam McInturf wrote: > Jim, > When I look at my DGELRT objects I don't have a $genes field: (to > be verbose, DGELists is a list of DGELRT objects (list(glmLRT(), > glmLRT(),...)) > > > DGELists$Roots$genes > NULL > > Looking at what fields I do have: coefficients, fitted.values, > deviance, df.residual, design, offset, dispersion, method, samples, > AveLogCPM, table, comparison, df.test > > Looking at the DGEGLM which my DGELists were made, I have no gene slot > either. I added the rownames of the counts table in the DGEGLM to a > new $genes: > > >fit$genes <- rownames(fit$counts) > > but the the $genes slot did not get passed onto the DGELRTs. Is it > best just to add the $genes field to each DGELRT and then follow what > you stated before? > > Thanks, > > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 > [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 > [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 > [10] edgeR_3.2.3 limma_3.16.5 BiocInstaller_1.10.2 > > loaded via a namespace (and not attached): > [1] annotate_1.38.0 AnnotationForge_1.2.1 biomaRt_2.16.0 > [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 > [7] BSgenome_1.28.0 Category_2.26.0 cluster_1.14.4 > [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 > [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 > [16] ggplot2_0.9.3.1 GO.db_2.9.0 GOstats_2.26.0 > [19] graph_1.38.2 grid_3.0.1 gridExtra_0.9.1 > [22] GSEABase_1.22.0 gtable_0.1.2 Hmisc_3.10-1.1 > [25] hwriter_1.3 labeling_0.1 lattice_0.20-15 > [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0 > [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 > [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 > [37] R.methodsS3_1.4.2 R.oo_1.13.0 Rsamtools_1.12.3 > [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 > [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2 > [46] survival_2.37-4 tools_3.0.1 > VariantAnnotation_1.6.6 > [49] XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0 > > > > > On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Sam, > > First, please always give us the results of sessionInfo(). This is > especially critical in the case of ReportingTools, which has been > fundamentally altered between the previous and current versions of > BioC. > > > On 6/19/2013 11:12 AM, Sam McInturf wrote: > > Bioconductors, > I am working on a RNA seq analysis project and am having > trouble > publishing an HTML report for it. I am unsure of how to make > my DE genes > have the same ID as what publish() will accept when passing an > argument to > 'annotation'. > I mapped the reads using tophat and passed the TAIR 10 > gtf file to the > -G option. When i counted my reads I used the > summarizeOverlaps function > from GenomicRanges and again used this same file. I called > differential > expression in edgeR using the GLM methods. So the rownames > of my DE table > are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db > annotations and passed it to HTMLReport in: > > publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, > conditions = > group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc > =2, n = 1000, > name = "RootsLRT") > Error: More than half of your IDs could not be mapped. > In addition: Warning message: > In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion > > which makes sense, because publish() is looking for Entrez IDs > (right?) > > How do I proceed? > > > Here I assume you are running R-3.0.x and the current release of BioC. > > When you run publish() on anything but a data.frame, the first > step is to coerce to a data.frame using a set of assumptions that > might not hold in your case (or there may be defaults that you > don't like). Because of this, I tend to just coerce to a > data.frame myself and then publish() that directly. This also > allows you to pass in arguments to .modifyDF which is kind of sweet. > > In the case of a DGELRT or DEGExact object, there is a 'genes' > slot that will be used to annotate the output of topTags(). > Ideally you would just add the annotation you want to that slot > first. So you could do something like > > annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<tair> column goes here>], c("SYMBOL","GENENAME","OTHERSTUFF")) > > and then put that in your DGEobjects. Now you can do something like > > outlst <- lapply(DGELists, topTags, otherargsgohere) > > htmlst <- lapply(seq_len(length(DGELists)) function(x) > HTMLReport(namevector[x], titlevector[x], otherargs)) > > and you can define a function similar to this function I use for > Entrez Gene IDs: > > entrezLinks <- function (df, ...){ > df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID), > link = paste0("http://www.ncbi.nlm.nih.gov/gene/", > as.character(df$ENTREZID)), > table = FALSE) > return(df) > } > > but modified for the Tair equivalent and then > > lapply(seq_len(length(htmlst)), function(x) publish(outlst[[x]], > htmlst[[x]], .modifyDF = samsTairLinkFun))) > lapply(htmlst, finish) > > et voila! > > You can also then use htmlst to make a bunch of links in an > index.html page. > > indx <- HTMLReport("index", "A bunch of links for this expt", > reportDirectory=".", baseUrl = "") > publish(hwriter::hwrite("Here are links", page(indx), header=2, > br=TRUE), indx) > publish(Link(htmlst, report=indx), indx) > finish(indx) > > Best, > > Jim > > > > Thanks in advance! > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > > > -- > Sam McInturf -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.8 years ago James W. MacDonald 68k

0

Entering edit mode

James, Thanks for responding to Sam's question. One small point, you can control the coercion of the object to a data.frame within the publish call via the .toDF argument, which is applied directly before .modifyDF. The defaults of these steps (coercion and modification of data.frames) can also be extended on an S4 class basis, by creating new methods for toReportDF and modifyReportDF if you find yourself always using the same customization across multiple scripts (this approaches creating a package which depends on ReportingTools, however, and is likely overkill in most situations). There is a flowchart for the full publish mechanism in inst/examples/publishFlowchart.svg, which I am attaching here as well for convenience. ~G On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > Hi Sam, > > Sorry, my bad. One of the arguments for DGEList is 'genes', which is > > genes: data frame containing annotation information for the > tags/transcripts/genes. > > So yeah, when you are creating your DGELists, add in the annotation data > for each gene/transcript/whatever, and then go from there. > > Best, > > Jim > > > > On 6/19/2013 1:37 PM, Sam McInturf wrote: > >> Jim, >> When I look at my DGELRT objects I don't have a $genes field: (to be >> verbose, DGELists is a list of DGELRT objects (list(glmLRT(), glmLRT(),...)) >> >> > DGELists$Roots$genes >> NULL >> >> Looking at what fields I do have: coefficients, fitted.values, deviance, >> df.residual, design, offset, dispersion, method, samples, AveLogCPM, table, >> comparison, df.test >> >> Looking at the DGEGLM which my DGELists were made, I have no gene slot >> either. I added the rownames of the counts table in the DGEGLM to a new >> $genes: >> >> >fit$genes <- rownames(fit$counts) >> >> but the the $genes slot did not get passed onto the DGELRTs. Is it best >> just to add the $genes field to each DGELRT and then follow what you stated >> before? >> >> Thanks, >> >> > sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 >> [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 >> [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 >> [10] edgeR_3.2.3 limma_3.16.5 BiocInstaller_1.10.2 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.38.0 AnnotationForge_1.2.1 biomaRt_2.16.0 >> [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 >> [7] BSgenome_1.28.0 Category_2.26.0 cluster_1.14.4 >> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 >> [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 >> [16] ggplot2_0.9.3.1 GO.db_2.9.0 GOstats_2.26.0 >> [19] graph_1.38.2 grid_3.0.1 gridExtra_0.9.1 >> [22] GSEABase_1.22.0 gtable_0.1.2 Hmisc_3.10-1.1 >> [25] hwriter_1.3 labeling_0.1 lattice_0.20-15 >> [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0 >> [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 >> [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 >> [37] R.methodsS3_1.4.2 R.oo_1.13.0 Rsamtools_1.12.3 >> [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 >> [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2 >> [46] survival_2.37-4 tools_3.0.1 >> VariantAnnotation_1.6.6 >> [49] XML_3.96-1.1 xtable_1.7-1 zlibbioc_1.6.0 >> >> >> >> >> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at="" uw.edu<mailto:="">> jmacdon at uw.edu>> wrote: >> >> Hi Sam, >> >> First, please always give us the results of sessionInfo(). This is >> especially critical in the case of ReportingTools, which has been >> fundamentally altered between the previous and current versions of >> BioC. >> >> >> On 6/19/2013 11:12 AM, Sam McInturf wrote: >> >> Bioconductors, >> I am working on a RNA seq analysis project and am having >> trouble >> publishing an HTML report for it. I am unsure of how to make >> my DE genes >> have the same ID as what publish() will accept when passing an >> argument to >> 'annotation'. >> I mapped the reads using tophat and passed the TAIR 10 >> gtf file to the >> -G option. When i counted my reads I used the >> summarizeOverlaps function >> from GenomicRanges and again used this same file. I called >> differential >> expression in edgeR using the GLM methods. So the rownames >> of my DE table >> are the AGI identifiers (AT#G#####). I loaded the org.At.tair.db >> annotations and passed it to HTMLReport in: >> >> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, >> conditions = >> group, annotation = "org.At.tair.db", pvaueCutoff = 0.01, lfc >> =2, n = 1000, >> name = "RootsLRT") >> Error: More than half of your IDs could not be mapped. >> In addition: Warning message: >> In .DGELRT.to.data.frame(object, ...) : NAs introduced by coercion >> >> which makes sense, because publish() is looking for Entrez IDs >> (right?) >> >> How do I proceed? >> >> >> Here I assume you are running R-3.0.x and the current release of BioC. >> >> When you run publish() on anything but a data.frame, the first >> step is to coerce to a data.frame using a set of assumptions that >> might not hold in your case (or there may be defaults that you >> don't like). Because of this, I tend to just coerce to a >> data.frame myself and then publish() that directly. This also >> allows you to pass in arguments to .modifyDF which is kind of sweet. >> >> In the case of a DGELRT or DEGExact object, there is a 'genes' >> slot that will be used to annotate the output of topTags(). >> Ideally you would just add the annotation you want to that slot >> first. So you could do something like >> >> annot <- select(org.At.tair.db, DGELists[["Roots"]]$genes[,<**Tair >> column goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF")) >> >> >> and then put that in your DGEobjects. Now you can do something like >> >> outlst <- lapply(DGELists, topTags, otherargsgohere) >> >> htmlst <- lapply(seq_len(length(**DGELists)) function(x) >> >> HTMLReport(namevector[x], titlevector[x], otherargs)) >> >> and you can define a function similar to this function I use for >> Entrez Gene IDs: >> >> entrezLinks <- function (df, ...){ >> df$ENTREZID <- hwriter::hwrite(as.character(**df$ENTREZID), >> link = paste0("http://www.ncbi.nlm.**nih.gov/gene/<http :="" www.ncbi.nlm.nih.gov="" gene=""/> >> ", >> >> as.character(df$ENTREZID)), >> table = FALSE) >> return(df) >> } >> >> but modified for the Tair equivalent and then >> >> lapply(seq_len(length(htmlst))**, function(x) publish(outlst[[x]], >> >> htmlst[[x]], .modifyDF = samsTairLinkFun))) >> lapply(htmlst, finish) >> >> et voila! >> >> You can also then use htmlst to make a bunch of links in an >> index.html page. >> >> indx <- HTMLReport("index", "A bunch of links for this expt", >> reportDirectory=".", baseUrl = "") >> publish(hwriter::hwrite("Here are links", page(indx), header=2, >> br=TRUE), indx) >> publish(Link(htmlst, report=indx), indx) >> finish(indx) >> >> Best, >> >> Jim >> >> >> >> Thanks in advance! >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> >> -- >> Sam McInturf >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > -- Gabriel Becker Graduate Student Statistics Department University of California, Davis

ADD REPLY • link 11.8 years ago Gabriel Becker ▴ 70

0

Entering edit mode

Hi Gabriel, On 6/19/2013 3:22 PM, Gabriel Becker wrote: > James, > > Thanks for responding to Sam's question. > > One small point, you can control the coercion of the object to a > data.frame within the publish call via the .toDF argument, which is > applied directly before .modifyDF. Nice! I think I missed that one. I'll take a closer look. > > The defaults of these steps (coercion and modification of data.frames) > can also be extended on an S4 class basis, by creating new methods for > toReportDF and modifyReportDF if you find yourself always using the > same customization across multiple scripts (this approaches creating a > package which depends on ReportingTools, however, and is likely > overkill in most situations). Well, I have a package that I am converting from using annaffy to using ReportingTools, so it already depends on ReportingTools. I certainly have need to modify things. As an example, it is not uncommon for me to do a GO hypergeometric test for multiple different contrasts, and want to output the results. However the default in ReportingTools for these objects is to create a table with a bunch of glyphs, and then automatically make tables for all the significant GO terms that contain the underlying genes. This is OK as long as you don't try to A.) Do a bunch of contrasts and then go GO analyses on all of them (thereby creating literally thousands of little files) and then B.) zip up and send to a collaborator who is on Windows. Turns out the Windows zip utility will silently fail when unzipping a file with thousands of files within it. No error, nothing. Just truncated output. And a PI who wants to know where the rest of it is... So having my own personal way of outputting GO results will be quite useful. Thanks for the heads-up. Best, Jim > > There is a flowchart for the full publish mechanism in > inst/examples/publishFlowchart.svg, which I am attaching here as well > for convenience. > > ~G > > > On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">> wrote: > > Hi Sam, > > Sorry, my bad. One of the arguments for DGEList is 'genes', which is > > genes: data frame containing annotation information for the > tags/transcripts/genes. > > So yeah, when you are creating your DGELists, add in the > annotation data for each gene/transcript/whatever, and then go > from there. > > Best, > > Jim > > > > On 6/19/2013 1:37 PM, Sam McInturf wrote: > > Jim, > When I look at my DGELRT objects I don't have a $genes > field: (to be verbose, DGELists is a list of DGELRT objects > (list(glmLRT(), glmLRT(),...)) > > > DGELists$Roots$genes > NULL > > Looking at what fields I do have: coefficients, fitted.values, > deviance, df.residual, design, offset, dispersion, method, > samples, AveLogCPM, table, comparison, df.test > > Looking at the DGEGLM which my DGELists were made, I have no > gene slot either. I added the rownames of the counts table in > the DGEGLM to a new $genes: > > >fit$genes <- rownames(fit$counts) > > but the the $genes slot did not get passed onto the DGELRTs. > Is it best just to add the $genes field to each DGELRT and > then follow what you stated before? > > Thanks, > > > sessionInfo() > R version 3.0.1 (2013-05-16) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods > [8] base > > other attached packages: > [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 > [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 > [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 > [10] edgeR_3.2.3 limma_3.16.5 > BiocInstaller_1.10.2 > > loaded via a namespace (and not attached): > [1] annotate_1.38.0 AnnotationForge_1.2.1 > biomaRt_2.16.0 > [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 > [7] BSgenome_1.28.0 Category_2.26.0 > cluster_1.14.4 > [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 > [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 > [16] ggplot2_0.9.3.1 GO.db_2.9.0 > GOstats_2.26.0 > [19] graph_1.38.2 grid_3.0.1 > gridExtra_0.9.1 > [22] GSEABase_1.22.0 gtable_0.1.2 > Hmisc_3.10-1.1 > [25] hwriter_1.3 labeling_0.1 > lattice_0.20-15 > [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0 > [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 > [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 > reshape2_1.2.2 > [37] R.methodsS3_1.4.2 R.oo_1.13.0 > Rsamtools_1.12.3 > [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 > [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2 > [46] survival_2.37-4 tools_3.0.1 > VariantAnnotation_1.6.6 > [49] XML_3.96-1.1 xtable_1.7-1 > zlibbioc_1.6.0 > > > > > On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald > <jmacdon at="" uw.edu="" <mailto:jmacdon="" at="" uw.edu=""> <mailto:jmacdon at="" uw.edu=""> <mailto:jmacdon at="" uw.edu="">>> wrote: > > Hi Sam, > > First, please always give us the results of sessionInfo(). > This is > especially critical in the case of ReportingTools, which > has been > fundamentally altered between the previous and current > versions of > BioC. > > > On 6/19/2013 11:12 AM, Sam McInturf wrote: > > Bioconductors, > I am working on a RNA seq analysis project and am > having > trouble > publishing an HTML report for it. I am unsure of how > to make > my DE genes > have the same ID as what publish() will accept when > passing an > argument to > 'annotation'. > I mapped the reads using tophat and passed the > TAIR 10 > gtf file to the > -G option. When i counted my reads I used the > summarizeOverlaps function > from GenomicRanges and again used this same file. I > called > differential > expression in edgeR using the GLM methods. So the > rownames > of my DE table > are the AGI identifiers (AT#G#####). I loaded the > org.At.tair.db > annotations and passed it to HTMLReport in: > > publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, > conditions = > group, annotation = "org.At.tair.db", pvaueCutoff = > 0.01, lfc > =2, n = 1000, > name = "RootsLRT") > Error: More than half of your IDs could not be mapped. > In addition: Warning message: > In .DGELRT.to.data.frame(object, ...) : NAs introduced > by coercion > > which makes sense, because publish() is looking for > Entrez IDs > (right?) > > How do I proceed? > > > Here I assume you are running R-3.0.x and the current > release of BioC. > > When you run publish() on anything but a data.frame, the first > step is to coerce to a data.frame using a set of > assumptions that > might not hold in your case (or there may be defaults that you > don't like). Because of this, I tend to just coerce to a > data.frame myself and then publish() that directly. This also > allows you to pass in arguments to .modifyDF which is kind > of sweet. > > In the case of a DGELRT or DEGExact object, there is a 'genes' > slot that will be used to annotate the output of topTags(). > Ideally you would just add the annotation you want to that > slot > first. So you could do something like > > annot <- select(org.At.tair.db, > DGELists[["Roots"]]$genes[,<tair> column goes here>], c("SYMBOL","GENENAME","OTHERSTUFF")) > > > and then put that in your DGEobjects. Now you can do > something like > > outlst <- lapply(DGELists, topTags, otherargsgohere) > > htmlst <- lapply(seq_len(length(DGELists)) function(x) > > HTMLReport(namevector[x], titlevector[x], otherargs)) > > and you can define a function similar to this function I > use for > Entrez Gene IDs: > > entrezLinks <- function (df, ...){ > df$ENTREZID <- hwriter::hwrite(as.character(df$ENTREZID), > link = paste0("http://www.ncbi.nlm.nih.gov/gene/", > > as.character(df$ENTREZID)), > table = FALSE) > return(df) > } > > but modified for the Tair equivalent and then > > lapply(seq_len(length(htmlst)), function(x) > publish(outlst[[x]], > > htmlst[[x]], .modifyDF = samsTairLinkFun))) > lapply(htmlst, finish) > > et voila! > > You can also then use htmlst to make a bunch of links in an > index.html page. > > indx <- HTMLReport("index", "A bunch of links for this expt", > reportDirectory=".", baseUrl = "") > publish(hwriter::hwrite("Here are links", page(indx), > header=2, > br=TRUE), indx) > publish(Link(htmlst, report=indx), indx) > finish(indx) > > Best, > > Jim > > > > Thanks in advance! > > > -- James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > > > > -- > Sam McInturf > > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > -- > Gabriel Becker > Graduate Student > Statistics Department > University of California, Davis -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.8 years ago James W. MacDonald 68k

0

Entering edit mode

James, On Wed, Jun 19, 2013 at 1:50 PM, James W. MacDonald <jmacdon@uw.edu> wrote: > Hi Gabriel, > > > On 6/19/2013 3:22 PM, Gabriel Becker wrote: > >> James, >> >> Thanks for responding to Sam's question. >> >> One small point, you can control the coercion of the object to a >> data.frame within the publish call via the .toDF argument, which is applied >> directly before .modifyDF. >> > > Nice! I think I missed that one. I'll take a closer look. > > > >> The defaults of these steps (coercion and modification of data.frames) >> can also be extended on an S4 class basis, by creating new methods for >> toReportDF and modifyReportDF if you find yourself always using the same >> customization across multiple scripts (this approaches creating a package >> which depends on ReportingTools, however, and is likely overkill in most >> situations). >> > > Well, I have a package that I am converting from using annaffy to using > ReportingTools, so it already depends on ReportingTools. I certainly have > need to modify things. > > As an example, it is not uncommon for me to do a GO hypergeometric test > for multiple different contrasts, and want to output the results. However > the default in ReportingTools for these objects is to create a table with a > bunch of glyphs, and then automatically make tables for all the significant > GO terms that contain the underlying genes. > This is OK as long as you don't try to A.) Do a bunch of contrasts and > then go GO analyses on all of them (thereby creating literally thousands of > little files) and then B.) zip up and send to a collaborator who is on > Windows. > > Turns out the Windows zip utility will silently fail when unzipping a file > with thousands of files within it. No error, nothing. Just truncated > output. And a PI who wants to know where the rest of it is... > > It is actually possible to include images directly in the HTML without creating external files (or rather, with creating them, but then not requiring they continue to exist) using data URIs. See http://css-tricks.com/data-uris/ Since we give you full control over what comes out, you can add logic similar to the following: df$Image = sapply(rows, function(r) { fname = glyphFileName(r) png(file = fname, ...) makeGlyph(r) dev.off() img(fname) #from base64 package, seems to work but I haven't tested it extensively } in your .modifyDF function, with appropriate definitions for glyphFileName and makeGlyph. We don't do this by default because it can make the html files extremely large without much benefit when hosting the files on a server, but when you are transporting them manually to individual collaborators instead of serving them over the web, this may not be an issue. A couple of notes about this approach: You'll want to remove the files if they aren't needed The base64 package is listed as having no maintainer. If that proves to be the case and people want to use it I'm sure a solution can be found. ~G > So having my own personal way of outputting GO results will be quite > useful. Thanks for the heads-up. > > Best, > > Jim > > > >> There is a flowchart for the full publish mechanism in inst/examples/**publishFlowchart.svg, >> which I am attaching here as well for convenience. >> >> ~G >> >> >> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon@uw.edu<mailto:>> jmacdon@uw.edu>> wrote: >> >> Hi Sam, >> >> Sorry, my bad. One of the arguments for DGEList is 'genes', which is >> >> genes: data frame containing annotation information for the >> tags/transcripts/genes. >> >> So yeah, when you are creating your DGELists, add in the >> annotation data for each gene/transcript/whatever, and then go >> from there. >> >> Best, >> >> Jim >> >> >> >> On 6/19/2013 1:37 PM, Sam McInturf wrote: >> >> Jim, >> When I look at my DGELRT objects I don't have a $genes >> field: (to be verbose, DGELists is a list of DGELRT objects >> (list(glmLRT(), glmLRT(),...)) >> >> > DGELists$Roots$genes >> NULL >> >> Looking at what fields I do have: coefficients, fitted.values, >> deviance, df.residual, design, offset, dispersion, method, >> samples, AveLogCPM, table, comparison, df.test >> >> Looking at the DGEGLM which my DGELists were made, I have no >> gene slot either. I added the rownames of the counts table in >> the DGEGLM to a new $genes: >> >> >fit$genes <- rownames(fit$counts) >> >> but the the $genes slot did not get passed onto the DGELRTs. >> Is it best just to add the $genes field to each DGELRT and >> then follow what you stated before? >> >> Thanks, >> >> > sessionInfo() >> R version 3.0.1 (2013-05-16) >> Platform: x86_64-pc-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets >> methods >> [8] base >> >> other attached packages: >> [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 >> [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 >> [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 >> [10] edgeR_3.2.3 limma_3.16.5 >> BiocInstaller_1.10.2 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.38.0 AnnotationForge_1.2.1 >> biomaRt_2.16.0 >> [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 >> [7] BSgenome_1.28.0 Category_2.26.0 >> cluster_1.14.4 >> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 >> [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 >> [16] ggplot2_0.9.3.1 GO.db_2.9.0 >> GOstats_2.26.0 >> [19] graph_1.38.2 grid_3.0.1 >> gridExtra_0.9.1 >> [22] GSEABase_1.22.0 gtable_0.1.2 >> Hmisc_3.10-1.1 >> [25] hwriter_1.3 labeling_0.1 >> lattice_0.20-15 >> [28] MASS_7.3-26 munsell_0.4 PFAM.db_2.9.0 >> [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 >> [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 >> reshape2_1.2.2 >> [37] R.methodsS3_1.4.2 R.oo_1.13.0 >> Rsamtools_1.12.3 >> [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 >> [43] splines_3.0.1 stats4_3.0.1 stringr_0.6.2 >> [46] survival_2.37-4 tools_3.0.1 >> VariantAnnotation_1.6.6 >> [49] XML_3.96-1.1 xtable_1.7-1 >> zlibbioc_1.6.0 >> >> >> >> >> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald >> <jmacdon@uw.edu <mailto:jmacdon@uw.edu=""> <mailto:jmacdon@uw.edu>> >> <mailto:jmacdon@uw.edu>>> wrote: >> >> Hi Sam, >> >> First, please always give us the results of sessionInfo(). >> This is >> especially critical in the case of ReportingTools, which >> has been >> fundamentally altered between the previous and current >> versions of >> BioC. >> >> >> On 6/19/2013 11:12 AM, Sam McInturf wrote: >> >> Bioconductors, >> I am working on a RNA seq analysis project and am >> having >> trouble >> publishing an HTML report for it. I am unsure of how >> to make >> my DE genes >> have the same ID as what publish() will accept when >> passing an >> argument to >> 'annotation'. >> I mapped the reads using tophat and passed the >> TAIR 10 >> gtf file to the >> -G option. When i counted my reads I used the >> summarizeOverlaps function >> from GenomicRanges and again used this same file. I >> called >> differential >> expression in edgeR using the GLM methods. So the >> rownames >> of my DE table >> are the AGI identifiers (AT#G#####). I loaded the >> org.At.tair.db >> annotations and passed it to HTMLReport in: >> >> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, >> conditions = >> group, annotation = "org.At.tair.db", pvaueCutoff = >> 0.01, lfc >> =2, n = 1000, >> name = "RootsLRT") >> Error: More than half of your IDs could not be mapped. >> In addition: Warning message: >> In .DGELRT.to.data.frame(object, ...) : NAs introduced >> by coercion >> >> which makes sense, because publish() is looking for >> Entrez IDs >> (right?) >> >> How do I proceed? >> >> >> Here I assume you are running R-3.0.x and the current >> release of BioC. >> >> When you run publish() on anything but a data.frame, the first >> step is to coerce to a data.frame using a set of >> assumptions that >> might not hold in your case (or there may be defaults that you >> don't like). Because of this, I tend to just coerce to a >> data.frame myself and then publish() that directly. This also >> allows you to pass in arguments to .modifyDF which is kind >> of sweet. >> >> In the case of a DGELRT or DEGExact object, there is a 'genes' >> slot that will be used to annotate the output of topTags(). >> Ideally you would just add the annotation you want to that >> slot >> first. So you could do something like >> >> annot <- select(org.At.tair.db, >> DGELists[["Roots"]]$genes[,<**Tair >> column goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF")) >> >> >> and then put that in your DGEobjects. Now you can do >> something like >> >> outlst <- lapply(DGELists, topTags, otherargsgohere) >> >> htmlst <- lapply(seq_len(length(**DGELists)) function(x) >> >> HTMLReport(namevector[x], titlevector[x], otherargs)) >> >> and you can define a function similar to this function I >> use for >> Entrez Gene IDs: >> >> entrezLinks <- function (df, ...){ >> df$ENTREZID <- hwriter::hwrite(as.character(** >> df$ENTREZID), >> link = paste0("http://www.ncbi.nlm.**nih.gov/ge ne/<http: www.ncbi.nlm.nih.gov="" gene=""/> >> ", >> >> as.character(df$ENTREZID)), >> table = FALSE) >> return(df) >> } >> >> but modified for the Tair equivalent and then >> >> lapply(seq_len(length(htmlst))**, function(x) >> publish(outlst[[x]], >> >> htmlst[[x]], .modifyDF = samsTairLinkFun))) >> lapply(htmlst, finish) >> >> et voila! >> >> You can also then use htmlst to make a bunch of links in an >> index.html page. >> >> indx <- HTMLReport("index", "A bunch of links for this expt", >> reportDirectory=".", baseUrl = "") >> publish(hwriter::hwrite("Here are links", page(indx), >> header=2, >> br=TRUE), indx) >> publish(Link(htmlst, report=indx), indx) >> finish(indx) >> >> Best, >> >> Jim >> >> >> >> Thanks in advance! >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> >> >> >> -- Sam McInturf >> >> >> -- James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: st="" at.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: >> http://news.gmane.org/gmane.**science.biology.informatics.**con ductor<http: news.gmane.org="" gmane.science.biology.informatics.conduct="" or=""> >> >> >> >> >> -- >> Gabriel Becker >> Graduate Student >> Statistics Department >> University of California, Davis >> > > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > > -- Gabriel Becker Graduate Student Statistics Department University of California, Davis [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Gabriel Becker ▴ 70

0

Entering edit mode

Still having some troubles : / My intention was to load annotation data into a $genes slot for publish() to pick up by not including the annotations argument in the publish call. tempDE <- DGELists$Roots tempDE$genes <- select(org.At.tair.db, keys= rownames(tempDE$genes), col = c("SYMBOL", "PMID", "GENENAME"), keytype="TAIR") myHTML <- HTMLReport(shortName = "RootsDE.html", title = "File made from edgeR output",basePath = basePath, reportDirectory = "./reports") publish(tempDE, myHTML, countTable = cpmMat, conditions = group, pvaueCutoff = 0.01, lfc =2, n = 100, name = "RootsLRT") finish(myHTML) Error: More than half of your IDs could not be mapped. and no output in my directory. so, after adding the annotation information to the $genes slot, I ran it through topTags() and then converted it to a dataframe (as.data.frame(topTags())). I was able to use publish() on this! tempDE <- DGELists$Roots tempDE$genes <- select(org.At.tair.db, keys= rownames(tempDE$genes), col = c("SYMBOL", "PMID", "GENENAME"), keytype="TAIR") otDE <- as.data.frame(topTags(tempDE)) myHTML <- HTMLReport(shortName = "RootsDE.html", title = "File made from edgeR output",basePath = basePath, reportDirectory = "./reports") publish(otDE, myHTML, countTable = cpmMat, conditions = group, name = "RootsLRT") finish(myHTML) But I am left wonder, if I pass a DGELRT object to publish(), I understand that it should call topTags() on it and then do its thing from there. But this is essentially what I did as well, so why is it working when I do it 'manually' or what am I missing? Additionally, I am getting this to work because I am taking annotations and loading them into the $genes slot, which is only looked at by publish() when nothing is passed to the annotation argument. What do I have to do to my DGELRT object for the annotations argument to work / what am I missing? Finally, some of my annotations have a one to many mapping which results in one gene being displayed on several lines with the only change being in one field (say, PMIDs). Is there a 'fast' way to put all of these many to one mappings into a a single deliminated field (18614708 / 18614708 / 18614708) or is it something that I should just make a quick function for? Thanks On Wed, Jun 19, 2013 at 4:25 PM, Gabriel Becker <gmbecker@ucdavis.edu>wrote: > James, > > > > > On Wed, Jun 19, 2013 at 1:50 PM, James W. MacDonald <jmacdon@uw.edu>wrote: > >> Hi Gabriel, >> >> >> On 6/19/2013 3:22 PM, Gabriel Becker wrote: >> >>> James, >>> >>> Thanks for responding to Sam's question. >>> >>> One small point, you can control the coercion of the object to a >>> data.frame within the publish call via the .toDF argument, which is applied >>> directly before .modifyDF. >>> >> >> Nice! I think I missed that one. I'll take a closer look. >> >> >> >>> The defaults of these steps (coercion and modification of data.frames) >>> can also be extended on an S4 class basis, by creating new methods for >>> toReportDF and modifyReportDF if you find yourself always using the same >>> customization across multiple scripts (this approaches creating a package >>> which depends on ReportingTools, however, and is likely overkill in most >>> situations). >>> >> >> Well, I have a package that I am converting from using annaffy to using >> ReportingTools, so it already depends on ReportingTools. I certainly have >> need to modify things. >> >> As an example, it is not uncommon for me to do a GO hypergeometric test >> for multiple different contrasts, and want to output the results. However >> the default in ReportingTools for these objects is to create a table with a >> bunch of glyphs, and then automatically make tables for all the significant >> GO terms that contain the underlying genes. > > >> This is OK as long as you don't try to A.) Do a bunch of contrasts and >> then go GO analyses on all of them (thereby creating literally thousands of >> little files) and then B.) zip up and send to a collaborator who is on >> Windows. >> >> Turns out the Windows zip utility will silently fail when unzipping a >> file with thousands of files within it. No error, nothing. Just truncated >> output. And a PI who wants to know where the rest of it is... >> >> > It is actually possible to include images directly in the HTML without > creating external files (or rather, with creating them, but then not > requiring they continue to exist) using data URIs. See > http://css-tricks.com/data-uris/ > > Since we give you full control over what comes out, you can add logic > similar to the following: > > df$Image = sapply(rows, function(r) > { > fname = glyphFileName(r) > png(file = fname, ...) > makeGlyph(r) > dev.off() > img(fname) #from base64 package, seems to work but I haven't tested it > extensively > } > > in your .modifyDF function, with appropriate definitions for glyphFileName > and makeGlyph. > > We don't do this by default because it can make the html files extremely > large without much benefit when hosting the files on a server, but when > you are transporting them manually to individual collaborators instead of > serving them over the web, this may not be an issue. > > A couple of notes about this approach: > > You'll want to remove the files if they aren't needed > > The base64 package is listed as having no maintainer. If that proves to be > the case and people want to use it I'm sure a solution can be found. > > ~G > > >> So having my own personal way of outputting GO results will be quite >> useful. Thanks for the heads-up. >> >> Best, >> >> Jim >> >> >> >>> There is a flowchart for the full publish mechanism in inst/examples/**publishFlowchart.svg, >>> which I am attaching here as well for convenience. >>> >>> ~G >>> >>> >>> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald <jmacdon@uw.edu<mailto:>>> jmacdon@uw.edu>> wrote: >>> >>> Hi Sam, >>> >>> Sorry, my bad. One of the arguments for DGEList is 'genes', which is >>> >>> genes: data frame containing annotation information for the >>> tags/transcripts/genes. >>> >>> So yeah, when you are creating your DGELists, add in the >>> annotation data for each gene/transcript/whatever, and then go >>> from there. >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> On 6/19/2013 1:37 PM, Sam McInturf wrote: >>> >>> Jim, >>> When I look at my DGELRT objects I don't have a $genes >>> field: (to be verbose, DGELists is a list of DGELRT objects >>> (list(glmLRT(), glmLRT(),...)) >>> >>> > DGELists$Roots$genes >>> NULL >>> >>> Looking at what fields I do have: coefficients, fitted.values, >>> deviance, df.residual, design, offset, dispersion, method, >>> samples, AveLogCPM, table, comparison, df.test >>> >>> Looking at the DGEGLM which my DGELists were made, I have no >>> gene slot either. I added the rownames of the counts table in >>> the DGEGLM to a new $genes: >>> >>> >fit$genes <- rownames(fit$counts) >>> >>> but the the $genes slot did not get passed onto the DGELRTs. >>> Is it best just to add the $genes field to each DGELRT and >>> then follow what you stated before? >>> >>> Thanks, >>> >>> > sessionInfo() >>> R version 3.0.1 (2013-05-16) >>> Platform: x86_64-pc-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=C LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets >>> methods >>> [8] base >>> >>> other attached packages: >>> [1] ReportingTools_2.0.1 org.At.tair.db_2.9.0 RSQLite_0.11.4 >>> [4] DBI_0.2-7 AnnotationDbi_1.22.6 Biobase_2.20.0 >>> [7] GenomicRanges_1.12.4 IRanges_1.18.1 BiocGenerics_0.6.0 >>> [10] edgeR_3.2.3 limma_3.16.5 >>> BiocInstaller_1.10.2 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.38.0 AnnotationForge_1.2.1 >>> biomaRt_2.16.0 >>> [4] Biostrings_2.28.0 biovizBase_1.8.1 bitops_1.0-5 >>> [7] BSgenome_1.28.0 Category_2.26.0 >>> cluster_1.14.4 >>> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 >>> [13] genefilter_1.42.0 GenomicFeatures_1.12.2 ggbio_1.8.3 >>> [16] ggplot2_0.9.3.1 GO.db_2.9.0 >>> GOstats_2.26.0 >>> [19] graph_1.38.2 grid_3.0.1 >>> gridExtra_0.9.1 >>> [22] GSEABase_1.22.0 gtable_0.1.2 >>> Hmisc_3.10-1.1 >>> [25] hwriter_1.3 labeling_0.1 >>> lattice_0.20-15 >>> [28] MASS_7.3-26 munsell_0.4 >>> PFAM.db_2.9.0 >>> [31] plyr_1.8 proto_0.3-10 RBGL_1.36.2 >>> [34] RColorBrewer_1.0-5 RCurl_1.95-4.1 >>> reshape2_1.2.2 >>> [37] R.methodsS3_1.4.2 R.oo_1.13.0 >>> Rsamtools_1.12.3 >>> [40] rtracklayer_1.20.2 R.utils_1.23.2 scales_0.2.3 >>> [43] splines_3.0.1 stats4_3.0.1 >>> stringr_0.6.2 >>> [46] survival_2.37-4 tools_3.0.1 >>> VariantAnnotation_1.6.6 >>> [49] XML_3.96-1.1 xtable_1.7-1 >>> zlibbioc_1.6.0 >>> >>> >>> >>> >>> On Wed, Jun 19, 2013 at 10:55 AM, James W. MacDonald >>> <jmacdon@uw.edu <mailto:jmacdon@uw.edu=""> <mailto:jmacdon@uw.edu>>> >>> <mailto:jmacdon@uw.edu>>> wrote: >>> >>> Hi Sam, >>> >>> First, please always give us the results of sessionInfo(). >>> This is >>> especially critical in the case of ReportingTools, which >>> has been >>> fundamentally altered between the previous and current >>> versions of >>> BioC. >>> >>> >>> On 6/19/2013 11:12 AM, Sam McInturf wrote: >>> >>> Bioconductors, >>> I am working on a RNA seq analysis project and am >>> having >>> trouble >>> publishing an HTML report for it. I am unsure of how >>> to make >>> my DE genes >>> have the same ID as what publish() will accept when >>> passing an >>> argument to >>> 'annotation'. >>> I mapped the reads using tophat and passed the >>> TAIR 10 >>> gtf file to the >>> -G option. When i counted my reads I used the >>> summarizeOverlaps function >>> from GenomicRanges and again used this same file. I >>> called >>> differential >>> expression in edgeR using the GLM methods. So the >>> rownames >>> of my DE table >>> are the AGI identifiers (AT#G#####). I loaded the >>> org.At.tair.db >>> annotations and passed it to HTMLReport in: >>> >>> publish(DGELists[["Roots"]], myHTML, countTable = cpmMat, >>> conditions = >>> group, annotation = "org.At.tair.db", pvaueCutoff = >>> 0.01, lfc >>> =2, n = 1000, >>> name = "RootsLRT") >>> Error: More than half of your IDs could not be mapped. >>> In addition: Warning message: >>> In .DGELRT.to.data.frame(object, ...) : NAs introduced >>> by coercion >>> >>> which makes sense, because publish() is looking for >>> Entrez IDs >>> (right?) >>> >>> How do I proceed? >>> >>> >>> Here I assume you are running R-3.0.x and the current >>> release of BioC. >>> >>> When you run publish() on anything but a data.frame, the >>> first >>> step is to coerce to a data.frame using a set of >>> assumptions that >>> might not hold in your case (or there may be defaults that >>> you >>> don't like). Because of this, I tend to just coerce to a >>> data.frame myself and then publish() that directly. This also >>> allows you to pass in arguments to .modifyDF which is kind >>> of sweet. >>> >>> In the case of a DGELRT or DEGExact object, there is a >>> 'genes' >>> slot that will be used to annotate the output of topTags(). >>> Ideally you would just add the annotation you want to that >>> slot >>> first. So you could do something like >>> >>> annot <- select(org.At.tair.db, >>> DGELists[["Roots"]]$genes[,<**Tair >>> column goes here>], c("SYMBOL","GENENAME","**OTHERSTUFF")) >>> >>> >>> and then put that in your DGEobjects. Now you can do >>> something like >>> >>> outlst <- lapply(DGELists, topTags, otherargsgohere) >>> >>> htmlst <- lapply(seq_len(length(**DGELists)) function(x) >>> >>> HTMLReport(namevector[x], titlevector[x], otherargs)) >>> >>> and you can define a function similar to this function I >>> use for >>> Entrez Gene IDs: >>> >>> entrezLinks <- function (df, ...){ >>> df$ENTREZID <- hwriter::hwrite(as.character(** >>> df$ENTREZID), >>> link = paste0("http://www.ncbi.nlm.**nih.gov/g ene/<http: www.ncbi.nlm.nih.gov="" gene=""/> >>> ", >>> >>> as.character(df$ENTREZID)), >>> table = FALSE) >>> return(df) >>> } >>> >>> but modified for the Tair equivalent and then >>> >>> lapply(seq_len(length(htmlst))**, function(x) >>> publish(outlst[[x]], >>> >>> htmlst[[x]], .modifyDF = samsTairLinkFun))) >>> lapply(htmlst, finish) >>> >>> et voila! >>> >>> You can also then use htmlst to make a bunch of links in an >>> index.html page. >>> >>> indx <- HTMLReport("index", "A bunch of links for this expt", >>> reportDirectory=".", baseUrl = "") >>> publish(hwriter::hwrite("Here are links", page(indx), >>> header=2, >>> br=TRUE), indx) >>> publish(Link(htmlst, report=indx), indx) >>> finish(indx) >>> >>> Best, >>> >>> Jim >>> >>> >>> >>> Thanks in advance! >>> >>> >>> -- James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >>> >>> >>> >>> -- Sam McInturf >>> >>> >>> -- James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >>> > >>> >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: s="" tat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.** >>> conductor<http: news.gmane.org="" gmane.science.biology.informatics.="" conductor=""> >>> >>> >>> >>> >>> -- >>> Gabriel Becker >>> Graduate Student >>> Statistics Department >>> University of California, Davis >>> >> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> > > > -- > Gabriel Becker > Graduate Student > Statistics Department > University of California, Davis > -- Sam McInturf [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Sam McInturf ▴ 300

0

Entering edit mode

Hi Sam, Sorry for missing so much of the discussion! On Wed, Jun 19, 2013 at 3:17 PM, Sam McInturf <smcinturf@gmail.com> wrote: > Still having some troubles : / > > My intention was to load annotation data into a $genes slot for publish() > to pick up by not including the annotations argument in the publish call. > > tempDE <- DGELists$Roots > tempDE$genes <- select(org.At.tair.db, keys= rownames(tempDE$genes), col = > c("SYMBOL", "PMID", "GENENAME"), keytype="TAIR") > myHTML <- HTMLReport(shortName = "RootsDE.html", title = "File made from > edgeR output",basePath = basePath, reportDirectory = "./reports") > publish(tempDE, myHTML, countTable = cpmMat, conditions = group, > pvaueCutoff = 0.01, lfc =2, n = 100, name = "RootsLRT") > finish(myHTML) > > Error: More than half of your IDs could not be mapped. > > I believe that you can get around this by adding annotation.db=NULL to you publish call. This will force the use of the genes frame from the DGELRT object. > tempDE <- DGELists$Roots > tempDE$genes <- select(org.At.tair.db, keys= rownames(tempDE$genes), col = > c("SYMBOL", "PMID", "GENENAME"), keytype="TAIR") > otDE <- as.data.frame(topTags(tempDE)) > myHTML <- HTMLReport(shortName = "RootsDE.html", title = "File made from > edgeR output",basePath = basePath, reportDirectory = "./reports") > publish(otDE, myHTML, countTable = cpmMat, conditions = group, name = > "RootsLRT") > finish(myHTML) > > But I am left wonder, if I pass a DGELRT object to publish(), I understand > that it should call topTags() on it and then do its thing from there. But > this is essentially what I did as well, so why is it working when I do it > 'manually' or what am I missing? > We generally try to produce default annotations based on the Bioc org.Xx.db packages. I'm really not sure why this would fail for the org.At.tair.db package, especially since the select you're doing there is doing the same thing as us. > Additionally, I am getting this to work because I am taking annotations and > loading them into the $genes slot, which is only looked at by publish() > when nothing is passed to the annotation argument. What do I have to do to > my DGELRT object for the annotations argument to work / what am I missing? > Aha! I just found the answer to this, and it results from an assumption that holds true for mouse and human but probably not other species. We expect that most genes will have a symbol in the org.Xx.db package. This turns out not to be the case for Arabidopsis. I'll downgrade this from an error to a warning. Thanks for finding this! Unfortunately, most of our work is on mouse and human, so we just don't see the breadth of what's out there for annotation packages. I'll let you know when we update the package, and can send you a new version with the change, if you'd like. > Finally, some of my annotations have a one to many mapping which results in > one gene being displayed on several lines with the only change being in one > field (say, PMIDs). Is there a 'fast' way to put all of these many to one > mappings into a a single deliminated field (18614708 / 18614708 / 18614708) > or is it something that I should just make a quick function for? > > Being relatively unfamiliar with the select method for annotation db's, I'll leave this for someone else to answer. My general way of doing something like this is to use mget and something like this: pmidL <- mget(ids, org.At.tairPMID) pmids <- sapply(pmidL, function(x) { if(all(!is.na(x))) {paste(x, collapse=" / ")} else {""} } > > Thanks > > On Wed, Jun 19, 2013 at 4:25 PM, Gabriel Becker <gmbecker@ucdavis.edu> >wrote: > > Cheers, Jason -- Jason A. Hackney, Ph.D. Bioinformatics and Computational Biology Genentech hackney.jason@gene.com 650-467-5084 [[alternative HTML version deleted]]

ADD REPLY • link 11.8 years ago Jason Hackney ▴ 160

0

Entering edit mode

Hi Sam, On 6/19/2013 6:17 PM, Sam McInturf wrote: > Finally, some of my annotations have a one to many mapping which > results in one gene being displayed on several lines with the only > change being in one field (say, PMIDs). Is there a 'fast' way to put > all of these many to one mappings into a a single deliminated field > (18614708 / 18614708 / 18614708) or is it something that I should just > make a quick function for? I usually do this sort of thing with a combination of the Xapply family of functions. annot <- select(org.At.tair.db, keys= rownames(tempDE$genes), col = c("SYMBOL", "PMID", "GENENAME"), keytype="TAIR") annotlst <- tapply(seq_len(nrow(annot)), annot[,1], function(x) annot[x,]) annotlst <- do.call("rbind", lapply(annotlst, function(x) t(apply(x, 2, function(y) paste(unique(y), collapse = "/"))))) Best, Jim -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099

ADD REPLY • link 11.8 years ago James W. MacDonald 68k

Login before adding your answer.