Question

FlowCore/FlowViz issues

0

Entering edit mode

Roger Leigh ▴ 60

@roger-leigh-4255

Last seen 10.6 years ago

Hi, [Apologies if this is delivered twice; my initial mail didn't appear to be accepted for several hours, and I didn't see any failure message; does the list object to GPG signatures?] I've just started to use FlowCore/FlowViz to analyse some of my flow cytometry data, and ran into a few problems. I'm hoping that you might be able to point me in the right direction! I've been very pleased with it so far, and have got some nice plots and stats out of it, but I'm sure I'm doing some things very inefficiently and/or incorrectly! [I'm using R version 2.11.1 (2010-05-31) on x86_64-pc-linux-gnu (Debian GNU/Linux) with current Bioconductor packages)] 1) Gating with an ellipsoidGate cov <- matrix(c(400000000, 0, 0, 0.08), ncol=2, dimnames=list(c("FS.Lin", "SS.Log"), c("FS.Lin", "SS.Log"))) mean <- c("FS.Lin"=32000, "SS.Log"=2.8) cells <- ellipsoidGate(filterId="CellGate", .gate=cov, mean=mean) I want to select my cells using an ellipsoid gate on forward- and side- scatter plots. In the above situation, they lie in a region where FS=32000?10000 and log??(SS)=2.8?0.5. However, the values in the covariance matrix don't match the dimensions; is there any explanation regarding how to construct a covariance matrix from the actual dimension I want (I got the above by trial and error until it fitted nicely--I'm afraid I know little about these matrices). In some plots I'd also like to rotate the ellipse, but I'm not sure how to put this into the matrix, if that's the way to do things. Is this possible? Is there an alternative constructor to create a gate from real dimensions? 2) Plotting on a log scale Currently I'm transforming my "Log" data with a log?? transform: flowset <- transform(flowset, `SS.Log` = log10(`SS.Log`)) flowset <- transform(flowset, `FL.1.Log` = log10(`FL.1.Log`)) When plotting, I have the log values on the axes. However, is it possible to either: a) plot on logarithmic axes with the appropriate log binning, or b) change the axis labels to be a log scale (even though the axes are linear)? I can do this with regular R plots, but I'm not sure if it's possible with lattice graphics. 3) Plotting with lattice/cairo_pdf is broken cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none") xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS", ylab=expression(log[10]~(SS))) dev.off() This is a weird one. If I type or paste the above into an interactive R shell, the plot is written as a PDF file. It works just fine. If I source it with 'source("script.R")', it does not. The PDF file is never created/replaced, and from the speed of it and putting print() around each call, it looks like the plotting is entirely skipped. I can't reproduce this with regular plots, so maybe it's the lattice graphics rather than cairo_pdf? 4) Overlaying density plots By default, density plots are stacked down the y-axis with a user- specified overlap. Is is possible to directly overlay one- dimensional histograms on top of each other to allow direct comparison? 5) Updating phenoData I need to add certain information to my flowSet, so I'm currently doing this by getting the phenoData, adding the missing bits and setting it back: sampleNames(flowset) <- c("unstained", "isotype", "ng2") pd <- pData(phenoData(flowset)) pd["isotype"] <- c(FALSE, TRUE, FALSE) pd["description"] <- c("Unstained", "Isotype", "NG2") pData(phenoData(flowset)) <- pd varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype", "Description") This is quite long-winded. Is there a convenience function present (or planned?) to add a single parameter which could update the phenodata and varmetadata at the same time? It would ensure they never get out of sync. 6) fsApply for a single column There's a convenience each_col to apply a function to each column in the flowSet, but is it possible to just access a single column, or set of specified columns? I wrote this little helper: capply <- function (col, func) { function(x) { func(x at exprs[,col]) } } which can then be used like so: fsApply(flowset, capply("FL.1.Log", median)) fsApply(flowset, capply("FL.1.Log", sd)) to get the median and sd for a single channel. However, it doesn't work for mean, and I'm not sure why: fsApply(set2, capply("FL.1.Log", mean)) Error in FUN(if (use.exprs) exprs(y) else y, ...) : could not find function "func" If I call it directly, it works just fine: mean(flowset[[1]]@exprs[,"FL.1.Log"]) [1] 0.4648052 It's not clear do me what the difference is here between the two forms. Is there a built-in or better way to achieve the same thing? 7) Getting at filter summary data > positive <- filter(flowset, pos) > summary(positive) filter summary for frame 'isotype' positive+: 111 of 9186 events (1.21%) filter summary for frame 'pos' positive+: 12070 of 13246 events (91.12%) Is it possible to actually get at the raw data here? i.e. the raw event number and/or the percentages. I'm currently doing it the "hard way", by: newset <- Subset(flowset, pos) fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset, capply("FL.1.Log", length)) * 100 But there must be an easier and more efficient way of extracting this information! Looking at the filter object, I can only extract the strings as printed. Many thanks for all your help, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `- GPG Public Key: 0x25BFB848 Please GPG sign your mail.

• 2.3k views

ADD COMMENT • link updated 14.6 years ago by Nishant Gopalakrishnan ▴ 340 • written 14.6 years ago by Roger Leigh ▴ 60

score 0 · Answer 1 · 2010-09-14

Hi Roger, Answers to your questions are posted below. Nishant 1) Gating with an ellipsoidGate Is there any explanation regarding how to construct a covariance matrix from the actual dimension I want ?. Is there an alternative constructor to create a gate from real dimensions? Ans: As mentioned by Josef, the decision to represent gates using covariance matrix was made because the standards for definition of gates for flow cytometry (gatingML specification) decided to make use of covariance matrices as input. I do agree that an alternative constructor that takes in the half axes, center point etc would be more intuitive to use. I will try to add a method for that. Until then please make use of the spreadsheet provided by Josef. 2) Plotting on a log scale When plotting, I have the log values on the axes. However, is it possible to either: a) plot on logarithmic axes with the appropriate log binning, o b) change the axis labels to be a log scale (even though the axes are linear)? I can do this with regular R plots, but I'm not sure if it's possible with lattice graphics. Ans: Yes it is possible with lattice graphics as well. Please have a look at figure 8.4 and 8.5 and the accompanying code for generating the plots from Deepayans book at http://lmdvr.r-forge.r-project.org/ That should probably get close to what you are looking for. 3) Plotting with lattice/cairo_pdf is broken cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none") xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS", ylab=expression(log[10]~(SS))) dev.off() This is a weird one. If I type or paste the above into an interactive R shell, the plot is written as a PDF file. It works just fine. If I source it with 'source("script.R")', it does not. The PDF file is never created/replaced, and from the speed of it and putting print() around each call, it looks like the plotting is entirely skipped. I can't reproduce this with regular plots, so maybe it's the lattice graphics rather than cairo_pdf? Ans: try the above statements, but include print(xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS", ylab=expression(log[10]~(SS)))) instead and see if it works. You need to call print on trellis objects to get it to work. 4) Overlaying density plots By default, density plots are stacked down the y-axis with a user- specified overlap. Is is possible to directly overlay one- dimensional histograms on top of each other to allow direct comparison? Ans: I do not think that there is an existing function available in flowViz that let you stack them up. If you want to use lattice, you will have to reshape your data a bit to get it to work. nms <- sampleNames(dat) k <- lapply(nms, function(x){ nr <- nrow(exprs(dat[[x]])) data.frame("FSC-H" = exprs(dat[[x]])[,"FSC-H"], samp =rep(x, nr), check.names =FALSE) }) df <- do.call(rbind, k) densityplot( ~ `FSC-H`, df, groups = samp) Here is a simple way of doing that without lattice data(GvHD) dat <- GvHD[1:3] plot(density(exprs(dat[[1]])[,"FSC-H"], n =512), col= col[1]) col <- c("#FF0000", "#00FF00", "#0000FF") for (i in 1:3) polygon(density(exprs(dat[[i]])[,"FSC-H"], n =512), col = col[i] ) 5) Updating phenoData I need to add certain information to my flowSet, so I'm currently doing this by getting the phenoData, adding the missing bits and setting it back: sampleNames(flowset) <- c("unstained", "isotype", "ng2") pd <- pData(phenoData(flowset)) pd["isotype"] <- c(FALSE, TRUE, FALSE) pd["description"] <- c("Unstained", "Isotype", "NG2") pData(phenoData(flowset)) <- pd varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype", "Description") This is quite long-winded. Is there a convenience function present (or planned?) to add a single parameter which could update the phenodata and varmetadata at the same time? It would ensure they never get out of sync. Ans: I am not quite sure I understand your questions here . There are update method for pData, phenoData, varLabels, varMetadata etc that directly work with flowSets. Please see if these methods or their update methods (<-) are what you are looking for . data(GvHD) pData(GvHD) varLabels(GvHD) varMetadata(GvHD) Update methods varLabels(GvHD) <- c("a", "b", "c", "d", "e") There's a convenience each_col to apply a function to each column in the flowSet, but is it possible to just access a single column, or set of specified columns? I wrote this little helper: capply <- function (col, func) { function(x) { func(x at exprs[,col]) } } which can then be used like so: fsApply(flowset, capply("FL.1.Log", median)) fsApply(flowset, capply("FL.1.Log", sd)) to get the median and sd for a single channel. However, it doesn't work for mean, and I'm not sure why: fsApply(set2, capply("FL.1.Log", mean)) Error in FUN(if (use.exprs) exprs(y) else y, ...) : could not find function "func" If I call it directly, it works just fine: mean(flowset[[1]]@exprs[,"FL.1.Log"]) [1] 0.4648052 It's not clear do me what the difference is here between the two forms. Is there a built-in or better way to achieve the same thing? ANS: If you are looking for median and median statistics for a flowSet wont the summary method give you that ?? data(GvHD) summary(GvHD) If you want to modify your code to get it to work, try this fsApply(GvHD, function(x){ median(exprs(x)[,"FSC-H"]) }) 7) Getting at filter summary data > > positive <- filter(flowset, pos) > > summary(positive) filter summary for frame 'isotype' positive+: 111 of 9186 events (1.21%) filter summary for frame 'pos' positive+: 12070 of 13246 events (91.12%) Is it possible to actually get at the raw data here? i.e. the raw event number and/or the percentages. I'm currently doing it the "hard way", by: newset <- Subset(flowset, pos) fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset, capply("FL.1.Log", length)) * 100 But there must be an easier and more efficient way of extracting this information! Looking at the filter object, I can only extract the strings as printed. Ans: Perhaps this example gives you what you are looking for. dat <- read.FCS(system.file("extdata","0877408774.B08", package="flowCore")) rg <- rectangleGate(filterId="myRectGate", list("FSC-H"=c(200, 600), "SSC-H"=c(0, 400))) res <- filter(dat, rg) ##------------- tmp <- as(res, "logical") inside <- sum(tmp) totalCount <- length(tmp) pr <- inside*100/totalCount On 09/13/2010 07:51 AM, Roger Leigh wrote: > Hi, > > [Apologies if this is delivered twice; my initial mail didn't > appear to be accepted for several hours, and I didn't see any > failure message; does the list object to GPG signatures?] > > I've just started to use FlowCore/FlowViz to analyse some of my > flow cytometry data, and ran into a few problems. I'm hoping > that you might be able to point me in the right direction! > > I've been very pleased with it so far, and have got some nice > plots and stats out of it, but I'm sure I'm doing some things > very inefficiently and/or incorrectly! > > [I'm using R version 2.11.1 (2010-05-31) on x86_64-pc-linux-gnu > (Debian GNU/Linux) with current Bioconductor packages)] > > > 1) Gating with an ellipsoidGate > > cov <- matrix(c(400000000, 0, 0, 0.08), ncol=2, > dimnames=list(c("FS.Lin", "SS.Log"), c("FS.Lin", "SS.Log"))) > mean <- c("FS.Lin"=32000, "SS.Log"=2.8) > cells <- ellipsoidGate(filterId="CellGate", .gate=cov, mean=mean) > > I want to select my cells using an ellipsoid gate on forward- and side- > scatter plots. In the above situation, they lie in a region where > FS=32000?10000 and log??(SS)=2.8?0.5. However, the values in the > covariance matrix don't match the dimensions; is there any explanation > regarding how to construct a covariance matrix from the actual > dimension I want (I got the above by trial and error until it fitted > nicely--I'm afraid I know little about these matrices). > > In some plots I'd also like to rotate the ellipse, but I'm not sure > how to put this into the matrix, if that's the way to do things. > Is this possible? > > Is there an alternative constructor to create a gate from real > dimensions? > > > 2) Plotting on a log scale > > Currently I'm transforming my "Log" data with a log?? transform: > > flowset <- transform(flowset, `SS.Log` = log10(`SS.Log`)) > flowset <- transform(flowset, `FL.1.Log` = log10(`FL.1.Log`)) > > When plotting, I have the log values on the axes. However, is > it possible to either: > > a) plot on logarithmic axes with the appropriate log binning, or > b) change the axis labels to be a log scale (even though the > axes are linear)? > > I can do this with regular R plots, but I'm not sure if it's possible > with lattice graphics. > > > 3) Plotting with lattice/cairo_pdf is broken > > cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none") > xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS", ylab=expression(log[10]~(SS))) > dev.off() > > This is a weird one. If I type or paste the above into an interactive R > shell, the plot is written as a PDF file. It works just fine. > If I source it with 'source("script.R")', it does not. The PDF file is > never created/replaced, and from the speed of it and putting print() > around each call, it looks like the plotting is entirely skipped. > > I can't reproduce this with regular plots, so maybe it's the lattice > graphics rather than cairo_pdf? > > > 4) Overlaying density plots > > By default, density plots are stacked down the y-axis with a user- > specified overlap. Is is possible to directly overlay one- > dimensional histograms on top of each other to allow direct > comparison? > > > 5) Updating phenoData > > I need to add certain information to my flowSet, so I'm currently > doing this by getting the phenoData, adding the missing bits and > setting it back: > > sampleNames(flowset) <- c("unstained", "isotype", "ng2") > > pd <- pData(phenoData(flowset)) > pd["isotype"] <- c(FALSE, TRUE, FALSE) > pd["description"] <- c("Unstained", "Isotype", "NG2") > > pData(phenoData(flowset)) <- pd > varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype", "Description") > > This is quite long-winded. Is there a convenience function present > (or planned?) to add a single parameter which could update the > phenodata and varmetadata at the same time? It would ensure they > never get out of sync. > > > 6) fsApply for a single column > > There's a convenience each_col to apply a function to each column in the > flowSet, but is it possible to just access a single column, or set of > specified columns? I wrote this little helper: > > capply <- function (col, func) { > function(x) { func(x at exprs[,col]) } > } > > which can then be used like so: > > fsApply(flowset, capply("FL.1.Log", median)) > fsApply(flowset, capply("FL.1.Log", sd)) > > to get the median and sd for a single channel. However, it > doesn't work for mean, and I'm not sure why: > > fsApply(set2, capply("FL.1.Log", mean)) > Error in FUN(if (use.exprs) exprs(y) else y, ...) : > could not find function "func" > > If I call it directly, it works just fine: > > mean(flowset[[1]]@exprs[,"FL.1.Log"]) > [1] 0.4648052 > > It's not clear do me what the difference is here between the > two forms. Is there a built-in or better way to achieve the > same thing? > > > 7) Getting at filter summary data > > >> positive <- filter(flowset, pos) >> summary(positive) >> > filter summary for frame 'isotype' > positive+: 111 of 9186 events (1.21%) > > filter summary for frame 'pos' > positive+: 12070 of 13246 events (91.12%) > > Is it possible to actually get at the raw data here? i.e. > the raw event number and/or the percentages. I'm currently > doing it the "hard way", by: > > newset <- Subset(flowset, pos) > fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset, capply("FL.1.Log", length)) * 100 > > But there must be an easier and more efficient way of extracting this > information! Looking at the filter object, I can only extract the > strings as printed. > > > Many thanks for all your help, > Roger > >