[flowCore] problems with read.flowSet

0

Entering edit mode

Stephan Fuhrmann ▴ 20

@stephan-fuhrmann-4872

Last seen 10.6 years ago

dear all, i tried to load 17 fcs-files using the read.flowSet-command: flowData <- read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE-Studie/R_pp65_A", phenoData = (pData), transformation = FALSE) this returned the following error: Error in dim(colNames) <- c(ncol(from[[frameList[[1]]]]), length(frameList)) : Dimensionen [Produkt 170] don?t match the length of the object [17] i can?t work out whats going wrong. loading the example data works fine. is there anything wrong with the fcs-files? help is much appreciated! stephan > sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] de_DE.UTF-8/de_DE.UTF-8/C/C/de_DE.UTF-8/de_DE.UTF-8 attached base packages: [1] tools splines grid stats graphics grDevices utils datasets methods [10] base other attached packages: [1] flowUtils_1.12.0 flowQ_1.12.0 latticeExtra_0.6-18 RColorBrewer_1.0-5 [5] parody_1.10.0 bioDist_1.24.1 KernSmooth_2.23-6 outliers_0.14 [9] flowMerge_2.0.3 multicore_0.1-7 foreach_1.3.2 codetools_0.2-8 [13] iterators_1.0.5 snow_0.3-7 flowMeans_1.4.0 flowFP_1.8.0 [17] flowClust_2.10.0 ellipse_0.3-5 mnormt_1.4-3 flowViz_1.16.0 [21] lattice_0.19-33 xtable_1.5-6 sfsmisc_1.0-14 cluster_1.14.0 [25] mvoutlier_1.9.3 robCompositions_1.5.0 car_2.0-11 survival_2.36-9 [29] nnet_7.3-1 compositions_1.10-1 energy_1.3-0 MASS_7.3-14 [33] boot_1.3-2 tensorA_0.36 rgl_0.92.798 fda_2.2.7 [37] zoo_1.7-4 flowCore_1.18.0 rrcov_1.3-01 pcaPP_1.9-44 [41] mvtnorm_0.9-9991 robustbase_0.7-6 Biobase_2.12.2 flowType_0.99.4 [45] Rgraphviz_1.30.1 graph_1.30.0 loaded via a namespace (and not attached): [1] annotate_1.30.1 AnnotationDbi_1.14.1 DBI_0.2-5 feature_1.2.8 [5] geneplotter_1.30.0 ks_1.8.2 RSQLite_0.9-4 RUnit_0.4.26 [9] stats4_2.13.1 XML_3.4-3 > traceback() 5: asMethod(object) 4: as(list2env(from, new.env(hash = T, parent = emptyenv())), "flowSet") 3: asMethod(object) 2: as(flowSet, "flowSet") 1: read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE- Studie/R_pp65_A", phenoData = (pData), transformation = FALSE)

• 2.3k views

ADD COMMENT • link updated 13.5 years ago by Jiang, Mike ★ 1.3k • written 13.5 years ago by Stephan Fuhrmann ▴ 20

0

Entering edit mode

Greg Finak ▴ 240

@greg-finak-4299

Last seen 8.2 years ago

United States

Hi, Stephan This error means that your FCS files don't all have the same channel names on each dimension, or don't have the same number of channels. A flowSet must consist of FCS files that have identical channels. You can use read.FCShearer() to read in the keywords of individual files to verify which ones don't match. Greg Finak, PhD Post-doctoral fellow Fred Hutchinson Cancer Research Center "bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote: Send Bioconductor mailing list submissions to bioconductor at r-project.org To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/bioconductor or, via email, send a message with subject or body 'help' to bioconductor-request at r-project.org You can reach the person managing the list at bioconductor-owner at r-project.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Bioconductor digest..." Today's Topics: 1. [flowCore] problems with read.flowSet (Stephan Fuhrmann) 2. Re: GEO and Limma (Ovokeraye Achinike-Oduaran) 3. Re: Chip-seq quality control (Lucia Peixoto) 4. Re: [question] integration of microarray data from different sources in limma package (???) 5. Re: limma data frame subsetting problem (Jabez Wilson) 6. Re: limma data frame subsetting problem (axel.klenk at actelion.com) 7. Re: heatmap.2 function and the scale option (Ale? Maver) 8. RankProd related question (Wendy Qiao) 9. how to deal with a 30G fastq file (wang peter) 10. Re: how to deal with a 30G fastq file (Martin Morgan) 11. Re: poe.mcmc - phenotypic label - coding (mjonczyk) ---------------------------------------------------------------------- Message: 1 Date: Wed, 5 Oct 2011 12:32:32 +0200 From: "Stephan Fuhrmann" <stephan.fuhrmann@charite.de> To: bioconductor at r-project.org Subject: [BioC] [flowCore] problems with read.flowSet Message-ID: <692b9f576d4668f47e6b8256f34430f1.squirrel at webmail.charite.de> Content-Type: text/plain;charset=iso-8859-1 dear all, i tried to load 17 fcs-files using the read.flowSet-command: flowData <- read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE-Studie/R_pp65_A", phenoData = (pData), transformation = FALSE) this returned the following error: Error in dim(colNames) <- c(ncol(from[[frameList[[1]]]]), length(frameList)) : Dimensionen [Produkt 170] don?t match the length of the object [17] i can?t work out whats going wrong. loading the example data works fine. is there anything wrong with the fcs-files? help is much appreciated! stephan > sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] de_DE.UTF-8/de_DE.UTF-8/C/C/de_DE.UTF-8/de_DE.UTF-8 attached base packages: [1] tools splines grid stats graphics grDevices utils datasets methods [10] base other attached packages: [1] flowUtils_1.12.0 flowQ_1.12.0 latticeExtra_0.6-18 RColorBrewer_1.0-5 [5] parody_1.10.0 bioDist_1.24.1 KernSmooth_2.23-6 outliers_0.14 [9] flowMerge_2.0.3 multicore_0.1-7 foreach_1.3.2 codetools_0.2-8 [13] iterators_1.0.5 snow_0.3-7 flowMeans_1.4.0 flowFP_1.8.0 [17] flowClust_2.10.0 ellipse_0.3-5 mnormt_1.4-3 flowViz_1.16.0 [21] lattice_0.19-33 xtable_1.5-6 sfsmisc_1.0-14 cluster_1.14.0 [25] mvoutlier_1.9.3 robCompositions_1.5.0 car_2.0-11 survival_2.36-9 [29] nnet_7.3-1 compositions_1.10-1 energy_1.3-0 MASS_7.3-14 [33] boot_1.3-2 tensorA_0.36 rgl_0.92.798 fda_2.2.7 [37] zoo_1.7-4 flowCore_1.18.0 rrcov_1.3-01 pcaPP_1.9-44 [41] mvtnorm_0.9-9991 robustbase_0.7-6 Biobase_2.12.2 flowType_0.99.4 [45] Rgraphviz_1.30.1 graph_1.30.0 loaded via a namespace (and not attached): [1] annotate_1.30.1 AnnotationDbi_1.14.1 DBI_0.2-5 feature_1.2.8 [5] geneplotter_1.30.0 ks_1.8.2 RSQLite_0.9-4 RUnit_0.4.26 [9] stats4_2.13.1 XML_3.4-3 > traceback() 5: asMethod(object) 4: as(list2env(from, new.env(hash = T, parent = emptyenv())), "flowSet") 3: asMethod(object) 2: as(flowSet, "flowSet") 1: read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE- Studie/R_pp65_A", phenoData = (pData), transformation = FALSE) ------------------------------ Message: 2 Date: Wed, 5 Oct 2011 15:35:32 +0200 From: Ovokeraye Achinike-Oduaran <ovokeraye@gmail.com> To: Sean Davis <sdavis2 at="" mail.nih.gov="">, bioconductor at r-project.org Subject: Re: [BioC] GEO and Limma Message-ID: <camnjh+kfmwf4tu+fbkearl_ehdtmrcfj3dm9maqzcm-kisd6tw at="" mail.gmail.com=""> Content-Type: text/plain; charset=ISO-8859-1 Thanks Sean. Avoks On Tue, Oct 4, 2011 at 8:08 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > On Tue, Oct 4, 2011 at 2:04 PM, Ovokeraye Achinike-Oduaran > <ovokeraye at="" gmail.com=""> wrote: >> Thanks a bunch, Johannes and Sean. I'll try the GSE data again as soon >> as I ?get in front of a comp. About the first question on creating a >> model matrix ?for a 3x2 factorial dataset like gds3715, any ideas? > > You'll probably need a contrast matrix in there somewhere to pull out > contrasts of interest. ?I'm not sure which you are interested in. > > Sean > > >> Thanks again. >> >> Avoks >> >> On 10/4/11, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: >>> On Tue, Oct 4, 2011 at 10:28 AM, Ovokeraye Achinike-Oduaran >>> <ovokeraye at="" gmail.com=""> wrote: >>>> Hi all, >>>> >>>> I've been struggling a bit with GEO and limma. I've come up with 2 >>>> questions that I'ld appreciate some input on. >>>> >>>> Question 1: >>>> I have a design matrix with 6columns for an expression set with 3 >>>> columns(3 levels, 2 agents). How do I create a model matrix to fit my >>>> design matrix for this dataset (eg. gds3715 below)? For a straight >>>> forward single factor analysis eg. gds161, the code below seems to >>>> work fine. >>>> >>>> gds161dat = getGEO('GDS161',destdir=".") >>>> gds161eset = GDS2eSet(gds161dat, do.log2=TRUE) >>>> m = pData(gds161eset)$metabolism >>>> design_gds161 = createDesignMatrix(gds161eset) >>>> design_gds161 = model.matrix(~m) >>>> fit = lmFit(gds161eset, design_gds161) >>>> fit2 = eBayes(fit) >>>> results = topTable(fit2, adjust ="BH", number = nrow(gds161eset)) >>>> excel = write.table(results, file = file.choose(), sep = ",") >>>> >>>> gds3715dat = getGEO('GDS3715',destdir=".") >>>> gds3715eset = GDS2eSet(gds3715dat, do.log2=TRUE) >>>> DIR = paste(pData(gds3715eset)$disease.state, >>>> pData(gds3715eset)$agent, sep =".") >>>> m = data.frame("fac1" = pData(gds3715eset)$disease.state, "fac2" = >>>> pData(gds3715eset)$agent) >>>> design_gds3715 = createDesignMatrix(gds3715eset) >>>> design_gds3715 = model.matrix(~m) >>>> fit = lmFit(gds3715eset, design_gds3715) >>>> fit2 = eBayes(fit) >>>> results = topTable(fit2, adjust ="BH", number = nrow(gds3715eset)) >>>> excel = write.table(results, file = file.choose(), sep = ",") >>>> >>>> Question 2: >>>> How can I run a similar (limma) analysis with GSE files? >>>> I can hardly figure out how to get from the expression set to the >>>> analysis...I've read the Using the GEOquery Package documentation >>>> several times over. I'm just have a hard time getting it to work. >>>> >>>> gse25724dat = getGEO('GSE25724', GSEMatrix = TRUE) >>> >>> Hi, Avoks. ?The return value from this call is a LIST of >>> ExpressionSets, not a single ExpressionSet. ?You probably want the >>> first ExpressionSet from the list and can get that: >>> >>> gse25724eSet = gse25724dat[[1]] >>> >>> GEOquery works this way because many GSEs (so-called superseries) have >>> multiple series included within them, so returning a list is necessary >>> to be general. >>> >>> Sean >>> >>> >>>> This is supposed to give me an expressionset, I can't seem to figure >>>> out how to analyze it with limma. >>>> >>>> Please help. >>>> >>>> Thanks. >>>> >>>> Avoks >>>> >>>> sessionInfo() >>>> R version 2.13.1 (2011-07-08) >>>> Platform: i386-pc-mingw32/i386 (32-bit) >>>> >>>> locale: >>>> [1] LC_COLLATE=English_xxx. LC_CTYPE=English_xxx >>>> [3] LC_MONETARY=English_xxx LC_NUMERIC=C >>>> [5] LC_TIME=English_xxx >>>> >>>> attached base packages: >>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base >>>> >>>> other attached packages: >>>> [1] puma_2.4.0 ? ? ?mclust_3.4.10 ? affy_1.30.0 ? ? limma_3.8.3 >>>> [5] GEOquery_2.19.4 Biobase_2.12.2 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] affyio_1.20.0 ? ? ? ? preprocessCore_1.14.0 RCurl_1.6-10.1 >>>> [4] tools_2.13.1 ? ? ? ? ?XML_3.4-2.2 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > ------------------------------ Message: 3 Date: Wed, 5 Oct 2011 10:06:12 -0400 From: Lucia Peixoto <luciap@iscb.org> To: Martin Morgan <mtmorgan at="" fhcrc.org=""> Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> Subject: Re: [BioC] Chip-seq quality control Message-ID: <cahzs3xtso2jehxyxufkm2e+417iq_z_ohhge0hf6knu_jegyhq at="" mail.gmail.com=""> Content-Type: text/plain Thanks very much for the suggestions I will likely have more questions as I start the analysis Lucia On Tue, Oct 4, 2011 at 2:57 PM, Martin Morgan <mtmorgan at="" fhcrc.org=""> wrote: > On 10/04/2011 07:33 AM, Ivan Gregoretti wrote: > > Hello Lucia, > > > > A proper response to your post would take a lecture rather than an > > email. I can't do that but I can bullet the main points. I think that > > it will help you if you are indeed a newcomer to ChIP-seq. > > > > 1) Expect 10 million reads per sample for a genome the size of human. > > I'd run some basic QA on your lanes, via ShortRead::qa on the fastq files > (or bam if fastq are not available); use FastqSampler if memory is tight > (but in general if memory is tight the solution will be to find a larger > computer). > > See http://bioconductor.org/help/**workflows/high- throughput-**sequencing/<http: bioconductor.org="" help="" workflows="" high-="" throughput-sequencing=""/>for qa and perhaps other operations common to RNAseq / ChIPseq work flows > > > > > > 2) Stick to SAM/BAM formats so that you can use well known, publicly > > available tools. Your best friend is called Picard. > > People can and do use R / Bioconductor for Picard-like tasks. > > > > 3) Remove duplicates. Again, Picard is your best friend. > > > 4) Create WIG files for all samples, treatments and controls so that > > you can display them simultaneously on any genome browser. > > here for interactive use I would rather use basic R plotting commands, > avoiding the round-trip and allowing programmatic interaction. > > > > 5) Find peaks with a well documented peak finder. > > probably a good suggestion for a one-off or common ChIP; the chipseq > vignette > > http://bioconductor.org/**packages/release/bioc/html/**chipseq.html <http: bioconductor.org="" packages="" release="" bioc="" html="" chipseq.html=""> > > provides inspiration for more flexible analysis; packages under the ChIPseq > biocViews term (Software --> AssayTechnologies -> HighThroughputSequencing-> > **ChIPSeq) might offer a solution tailored to your ChIP. > > > > 6) Compute enrichment for all treatments relative to their controls. > > again the chipseq vignette is an alternative source. > > > > > > So, points 4 and 6 are your quality controls at this stage. Once you > > know what a good immunoprecipitation looks like compared to a bad one, > > you can start diving into the details. You can invent your own quality > > especially at getting a sense for good versus bad results the interactivity > of R / Bioconductor seem essential. > > Martin > > > > indicators. For instance, I compute the proportion of tags inside the > > 1000 strongest peaks. I do that for BOTH treatment and controls. > > > > In my workflow, Bioconductor does not get involved until I reach point 6. > > > > Happy ChIPing. > > > > Ivan > > > > > > > > > > > > On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto<luciap at="" iscb.org=""> wrote: > >> Hi, > >> I am new to Chip-seq, my experiment's sequencing has finished, and the > read > >> alignment is currently running > >> The experiment was done for histone acetylation, and I have two types > of > >> controls: input DNA and unmodified histone. > >> I have two conditions and 6 biological replicates of each condition > >> I wanted some advice on how to perform basic quality control on Chip-seq > >> data using Bioconductor > >> and also some ideas of which kinds of biases people usually observe and > I > >> should keep my eyes open for > >> any advice will be greatly appreciated! > >> thanks > >> > >> Lucia > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________**_________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org > >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> > >> Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > >> > > > > ______________________________**_________________ > > Bioconductor mailing list > > Bioconductor at r-project.org > > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> > > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > [[alternative HTML version deleted]] ------------------------------ Message: 4 Date: Wed, 5 Oct 2011 23:50:01 +0900 From: ??? <sjlee@biosoft.kaist.ac.kr> To: "Freudenberg, Johannes (NIH/NIEHS) [E]" <johannes.freudenberg at="" nih.gov="">, "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> Subject: Re: [BioC] [question] integration of microarray data from different sources in limma package Message-ID: <4E8C6E99.5020703 at biosoft.kaist.ac.kr> Content-Type: text/plain; charset="EUC-KR" Hi Johannes, I would like to explain my data in detail for your deep understanding. There are three replicate data from two-channel microarray Two of them were scanned by genepix scanner and the other was scanned by agilent scanner. All of them are originated from same microarray platform( agilent chip ). Just different scanner makes them provide different outputs, but its original information is same. However, when I want to use three microarray data for the analysis using limma package, it is difficult to use built-in functions in the package to normalize and analyze them for extracting DEGs(differentially expressed genes). For example, library(limma) setwd("~~") targets_agilent = readTargets("targets_agilent.txt") targets_genepix = readTargets("targets_genepix.txt") rg_agilent = read.maimages(targets_agilent$FileName, source="agilent") rg_genepix = read.maimages(targets_genepix$FileName, source="genepix") rg_all = cbind( rg_agilent, rg_genepix ) When I tried to use rg_all object for further analysis, it provokes some problems because cbind function just concetanates the data without consideration of interal information. Please let me know your precious opinion for this beginner in this field. Thanks in advance. Sincerely, Sunjae LEE 2011-10-01 ?? 2:29, Freudenberg, Johannes (NIH/NIEHS) [E] ? ?: > Hi Sunjae, > >> Is there any possibility to combine, normalize and analyze them without problems? > I think the order should be "normalize, combine, and analyze (hopefully) without problems." I don't really see a way (nor a good reason) for combining the different platforms first and then doing the pre-processing. > > Best regards, > --Johannes > > > > -----Original Message----- > From: ??? [mailto:sjlee at biosoft.kaist.ac.kr] > Sent: Tuesday, September 27, 2011 11:05 PM > To: bioconductor at stat.math.ethz.ch > Subject: [BioC] [question] integration of microarray data from different sources in limma package > > Dear developers, > > I want to ask some questions about microarray data pre-processing > > In the limma package, it supports reading microarray data from different source such as agilent(agilent feature extraction), genepix(genepix pro) format by fuction read.maimages(). > > And cbind fuction enables us to combine RGList obtained from micorarray data from different source > > However, when we try to normalize the combined RGList data through normalizeBetweenArrays function and analyze differentially expressed gene through lmFit, eBayes function, it produces some problems of not- compatible integrations due to diffrenence of its sources. ( I just want to integrate microarray data from agilent and genepix format as soon as possible ) > > Is there any possibility to combine, normalize and analyze them without problems? > > please let me know how to solve them. > > > > Sincerely, > > Sunjae LEE > > Ph.D. candidate, BISL, Dept. of Bio and Brain engineering, KAIST 335-Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Sunjae LEE Ph.D. candidate, BISL, Dept. of Bio and Brain engineering, KAIST 335-Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea TEL +82-42-350-4353 (cell.) +82-10-3090-9050 ------------------------------ Message: 5 Date: Wed, 5 Oct 2011 15:22:14 +0100 (BST) From: Jabez Wilson <jabezwuk@yahoo.co.uk> To: axel.klenk at actelion.com Cc: bioconductor at r-project.org, bioconductor-bounces at r-project.org Subject: Re: [BioC] limma data frame subsetting problem Message-ID: <1317824534.79133.YahooMailClassic at web28515.mail.ukl.yahoo.com> Content-Type: text/plain Yes, thanks, Axel, I think that may well do the trick - assuming you meant rowMaxs() ;-) You're right, it is an RGList not a DF. Jab --- On Wed, 5/10/11, axel.klenk at actelion.com <axel.klenk at="" actelion.com=""> wrote: From: axel.klenk@actelion.com <axel.klenk@actelion.com> Subject: Re: [BioC] limma data frame subsetting problem Cc: bioconductor at r-project.org, bioconductor-bounces at r-project.org Date: Wednesday, 5 October, 2011, 10:42 Dear Jabez, try: RG[rowMax(RG$G) > 5000, ] Is that what you want? And BTW, note that you are dealing with an RGList and not a data frame... Cheers, - axel Axel Klenk Research Informatician Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / Switzerland From: To: bioconductor at r-project.org Date: 05.10.2011 11:15 Subject: [BioC] limma data frame subsetting problem Sent by: bioconductor-bounces at r-project.org Dear Bioconductors, I want to do something simple, which I cannot find the solution to. I have a limma data frame and I want to select a subset of the data frame based on whether the values in the "G" channel are > e.g. 5000 As an example I use swirl data targets <- readTargets("SwirlSample.txt"); RG <- read.maimages(targets, source="spot") > head(RG$G) swirl.1 swirl.2 swirl.3 swirl.4 [1,] 22028.260 19278.770 2727.5600 19930.6500 [2,] 25613.200 21438.960 2787.0330 25426.5800 [3,] 22652.390 20386.470 2419.8810 16225.9500 [4,] 8929.286 6677.619 383.2381 786.9048 [5,] 8746.476 6576.292 901.0000 468.0476 [6,] 37010.080 23769.100 23377.9700 28399.0900 I can select the first 20 lines of the df by >RG[1:20,] but I really want to select those lines of the df (and keep the df format intact) where the "G" value in any column is > a certain figure (e.g. 5000) However, > RG[RG$G>5000,] Error in `[.RGList`(RG, RG$G > 5000, ) : (subscript) logical subscript too long I have no success with subset either: > subset(RG, RG$G>5000,) Error: Two subscripts required > subset(RG$G, RG$G>5000) Error in subset.matrix(RG$G, RG$G > 5000) : (subscript) logical subscript too long Do I need to write a loop to check each column of "G" seperately? Or is there a simpler solution? TIA Jabez [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor The information of this email and in any file transmitted with it is strictly confidential and may be legally privileged. It is intended solely for the addressee. If you are not the intended recipient, any copying, distribution or any other use of this email is prohibited and may be unlawful. In such case, you should please notify the sender immediately and destroy this email. The content of this email is not legally binding unless confirmed by letter. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorised to state them to be the views of the sender's company. For further information about Actelion please see our website at http://www.actelion.com [[alternative HTML version deleted]] ------------------------------ Message: 6 Date: Wed, 5 Oct 2011 17:06:24 +0200 From: axel.klenk@actelion.com Cc: bioconductor at r-project.org, bioconductor-bounces at r-project.org Subject: Re: [BioC] limma data frame subsetting problem Message-ID: <1002_1317827251_4E8C72B3_1002_4917_1_OF930BDEB2.27423C34-ONC1 257920.00511A52-C1257920.005315A3 at actelion.com> Content-Type: text/plain; charset="US-ASCII" No, I meant rowMax() from package Biobase but obviously this functionality has been implemented several times... rowMaxs() from fBasics (if that's the one you've found) is less efficient but it will hardly make a difference for your use case - as in fortune(98)... :-) - axel Axel Klenk Research Informatician Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / Switzerland From: To: axel.klenk at actelion.com Cc: bioconductor at r-project.org, bioconductor-bounces at r-project.org Date: 05.10.2011 16:22 Subject: Re: [BioC] limma data frame subsetting problem Yes, thanks, Axel, I think that may well do the trick - assuming you meant rowMaxs() ;-) You're right, it is an RGList not a DF. Jab --- On Wed, 5/10/11, axel.klenk at actelion.com <axel.klenk at="" actelion.com=""> wrote: From: axel.klenk@actelion.com <axel.klenk@actelion.com> Subject: Re: [BioC] limma data frame subsetting problem Cc: bioconductor at r-project.org, bioconductor-bounces at r-project.org Date: Wednesday, 5 October, 2011, 10:42 Dear Jabez, try: RG[rowMax(RG$G) > 5000, ] Is that what you want? And BTW, note that you are dealing with an RGList and not a data frame... Cheers, - axel Axel Klenk Research Informatician Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / Switzerland From: To: bioconductor at r-project.org Date: 05.10.2011 11:15 Subject: [BioC] limma data frame subsetting problem Sent by: bioconductor-bounces at r-project.org Dear Bioconductors, I want to do something simple, which I cannot find the solution to. I have a limma data frame and I want to select a subset of the data frame based on whether the values in the "G" channel are > e.g. 5000 As an example I use swirl data targets <- readTargets("SwirlSample.txt"); RG <- read.maimages(targets, source="spot") > head(RG$G) swirl.1 swirl.2 swirl.3 swirl.4 [1,] 22028.260 19278.770 2727.5600 19930.6500 [2,] 25613.200 21438.960 2787.0330 25426.5800 [3,] 22652.390 20386.470 2419.8810 16225.9500 [4,] 8929.286 6677.619 383.2381 786.9048 [5,] 8746.476 6576.292 901.0000 468.0476 [6,] 37010.080 23769.100 23377.9700 28399.0900 I can select the first 20 lines of the df by >RG[1:20,] but I really want to select those lines of the df (and keep the df format intact) where the "G" value in any column is > a certain figure (e.g. 5000) However, > RG[RG$G>5000,] Error in `[.RGList`(RG, RG$G > 5000, ) : (subscript) logical subscript too long I have no success with subset either: > subset(RG, RG$G>5000,) Error: Two subscripts required > subset(RG$G, RG$G>5000) Error in subset.matrix(RG$G, RG$G > 5000) : (subscript) logical subscript too long Do I need to write a loop to check each column of "G" seperately? Or is there a simpler solution? TIA Jabez [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor The information of this email and in any file transmitted with it is strictly confidential and may be legally privileged. It is intended solely for the addressee. If you are not the intended recipient, any copying, distribution or any other use of this email is prohibited and may be unlawful. In such case, you should please notify the sender immediately and destroy this email. The content of this email is not legally binding unless confirmed by letter. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorised to state them to be the views of the sender's company. For further information about Actelion please see our website at http://www.actelion.com The information of this email and in any file transmitted with it is strictly confidential and may be legally privileged. It is intended solely for the addressee. If you are not the intended recipient, any copying, distribution or any other use of this email is prohibited and may be unlawful. In such case, you should please notify the sender immediately and destroy this email. The content of this email is not legally binding unless confirmed by letter. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorised to state them to be the views of the sender's company. For further information about Actelion please see our website at http://www.actelion.com ------------------------------ Message: 7 Date: Wed, 5 Oct 2011 19:33:36 +0200 From: Ale? Maver <ales.maver@gmail.com> To: john herbert <arraystruggles at="" gmail.com=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] heatmap.2 function and the scale option Message-ID: <cahzx6mccnq6ehg=vz9hfqh=yhsqwksjtwdrabmzrfejc1mgyiw at="" mail.gmail.com=""> Content-Type: text/plain Hi! I think it depends on what information you have stored in rows and what in columns. It also depends what data you have stored in the matrix. For example, if your input values are deltaCt values (you mentioned you work on RT-PCR results), they may differ substantially across genes, which may cause your heatmap to look anomalous. If you look at an exemplar matrix with dCt values for three genes in rows (Gene1-Gene3) and ten samples in columns: mat<-matrix(c(rnorm(n=10, mean=5, sd=1), rnorm(n=10, mean=6, sd=1), rnorm(n=10, mean=15, sd=1)), ncol=10, byrow=T, dimnames=list(paste("Gene", c(1:3), sep=""), paste("Sample", c(1:10), sep=""))) Here the distributions of measured values differ substantially between genes (means 5,10 and 15). If you inspect the plot where scaling across columns is used: heatmap(mat, scale="column") you will notice Gene3 to be always more yellow in color than Gene1 or Gene2, as Gene3 values are always more than other two. However, using scaling across rows brings out the differences between samples - separately for each gene (row), as value range is calculated for each gene (row) seperately: heatmap(mat, scale="row") Hope this explains it, Ales 2011/10/3 john herbert <arraystruggles at="" gmail.com=""> > Dear Bioconductors, > I am using the the heatmap.2 function to plot out a heatmap of 130 > genes from Q-PCR expression. > > If I plot them out scaling by row, I get a heatmap that looks a bit > blotchy. > > However, if I scale by column, the heatmap looks a lot better. > > The clustering of samples is the same but I don't really know the > theory behind why column scaling looks better than row scaling. > > Please can someone advise me on any theory behind this? > > The data is in a matrix, like microarray, with columns of different > conditions and rows of genes. > > Thank you, > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Ales Maver, MD Institute of Medical Genetics, Department of Obstetrics and Gynaecology UMC Ljubljana ?lajmerjeva 3 SI-1000 Ljubljana Slovenia [[alternative HTML version deleted]] ------------------------------ Message: 8 Date: Wed, 5 Oct 2011 23:00:33 -0400 From: Wendy Qiao <wendy2.qiao@gmail.com> To: bioconductor at r-project.org Subject: [BioC] RankProd related question Message-ID: <cah=jfhygyjwswamvaptdotahwc3pcpbdhxljl+plgvcbb5m39q at="" mail.gmail.com=""> Content-Type: text/plain Hi all, I need to analyze a combined dataset of microarrays from two sources. The arrays for the first cell type is from source1 and the arrays for the other cell types are from source2. I would like to use the advanced rank product method to identify the unregulated genes. My "origin" vector and sample vector are as following. However, by using the following code, I got an error message as highlighted. I think that the arrays for cell type 1 must from two sources. Does it mean that I cannot use the advanced RankProd? Does anyone know the major difference between the normal and advanced RankProd methods? Thank you in advance, Wendy mergeD.origin # origin vector [1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [54] 2 2 2 2 2 2 2 2 mergeD.cl # sample vector [1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [54] 0 0 0 0 0 0 0 0 RP.adv.out <- RPadvance(mergeD, mergeD.cl, mergeD.origin, num.perm = 100, logged = FALSE, gene.names = rownames(mergeD), rand = 123) The data is from 2 different origins Rank Product analysis for two-class case Error in OriginxyCall(data, cl, origin) : Error: data from different origins should contain data from both classes [[alternative HTML version deleted]] ------------------------------ Message: 9 Date: Wed, 5 Oct 2011 22:04:45 -0500 From: wang peter <wng.peter@gmail.com> To: bioconductor at r-project.org Subject: [BioC] how to deal with a 30G fastq file Message-ID: <cakh7rfcat40fe9pxffnhgov0zos+si_2_d_tygvs+tpm1tme8a at="" mail.gmail.com=""> Content-Type: text/plain IT is too slow to read them in the memory. who can tell me if i need split them by other program or call some R function to split them thx [[alternative HTML version deleted]] ------------------------------ Message: 10 Date: Wed, 05 Oct 2011 20:39:12 -0700 From: Martin Morgan <mtmorgan@fhcrc.org> To: wang peter <wng.peter at="" gmail.com=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] how to deal with a 30G fastq file Message-ID: <4E8D22E0.5010607 at fhcrc.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 10/05/2011 08:04 PM, wang peter wrote: > IT is too slow to read them in the memory. > who can tell me if i need split them by other program or > call some R function to split them ShortRead::FastqSampler streams the entire file but returns a subset (often faster than reading in the whole data) ShortRead::FastqStreamer (in development) iterates over the file -- fq = FastqStreamer(<...>) while (length(res <- yield(fq))) # work, e.g., filter A cheap hack is to force R to allocate a large amount of memory and then to run the operation replicate(10, raw(1e9)) ## that's alot dna = readFastq(...) The 'withIds=FALSE' argument to readFastq can save a lot of time if ids are not necessary. If the records are all 4 lines long it is very easy to split a file (untested code; the Linux pros would use awk for efficient processing; check out StackOverflow / Biostar) fl = file("foo.fastq", "r") idx = 0; while (isIncomplete(fl)) { recs = readLines(fl, n=1000000) writeLines(sprintf("fout-%d.fastq", idx), recs) idx <- idx + 1 } once split on Linux / Mac use library(multicore) or library(parallel) (R-2.14 or later) and mclapply(seq_len(idx), function(i) { fq = readFastq(sprintf("fout-%d.fastq", idx)) ## work, then... TRUE }) to process in parallel (it doesn't make sense to try to read them in parallel and try to return them back to a 'master'). Martin > > thx > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 ------------------------------ Message: 11 Date: Thu, 06 Oct 2011 08:19:50 +0200 From: mjonczyk <mjonczyk@biol.uw.edu.pl> To: Debashis Ghosh <ghoshd at="" psu.edu=""> Cc: bioconductor at r-project.org Subject: Re: [BioC] poe.mcmc - phenotypic label - coding Message-ID: <1a80ce9d9686c922b1cca5ccaf0dd604 at biol.uw.edu.pl> Content-Type: text/plain; charset=UTF-8; format=flowed Dear Dr. Ghosh, thank you for the clarification. Best Wishes, Maciej --- Maciej Jonczyk, Department of Plant Molecular Ecophysiology Institute of Plant Experimental Biology Faculty of Biology, University of Warsaw 02-096 Warszawa, Miecznikowa 1 Poland On Wed, 5 Oct 2011 22:22:08 -0400, Debashis Ghosh wrote: > Dear Dr. Jonczyk, > > It's an error in the text. Hyungwon should be revising the vignette > in future versions of metaArray. > > Debashis > > On Tue, Oct 4, 2011 at 8:09 AM, mjonczyk <mjonczyk at="" biol.uw.edu.pl=""> > wrote: >> Dear List, >> >> I'd like to make sure I understand coding of phenotypic label in >> poe.mcmc function of metaArray correctly. >> >> Vector of my arrays' treatments could be presented as: >> >> (ctrl, ctrl, ctrl, trt1, trt1, trt1) so as NN value I used vector >> (1, >> 1, 1, 0, 0, 0). >> >> Documentation for either poe.mcmc, poe.em and fit.em say that "1" >> means >> that e=0 (baseline expression), >> and "0" is for other conditions. >> >> Although this three documentation files suggests the same coding, >> fit.em states that: >> " >> cl: A vector of 0s and 1s. Use 1 for normal phenotype and 0 for >> ? ? ? ? ? non-normal phenotype. Note that >> *this is the opposite of POE MCMC.* >> If all samples are of unknown phenotype or of the same >> ? ? ? ? ? one, give vector of zeros. When class information is >> ? ? ? ? ? provided, conditional estimation of the mixture is >> applied. >> " >> So is it simply an error in text or should I code my variants >> differently? >> >> Best Wishes! >> >> -- >> Maciej Jonczyk, >> Department of Plant Molecular Ecophysiology >> Institute of Plant Experimental Biology >> Faculty of Biology, University of Warsaw >> 02-096 Warszawa, Miecznikowa 1 >> Poland >> >> >> >> -- >> This email was Anti Virus checked by Astaro Security Gateway. >> http://www.astaro.com >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> ------------------------------ _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org https://stat.ethz.ch/mailman/listinfo/bioconductor End of Bioconductor Digest, Vol 104, Issue 6

ADD COMMENT • link 13.5 years ago Greg Finak ▴ 240

0

Entering edit mode

Jiang, Mike ★ 1.3k

@jiang-mike-4886

Last seen 3.5 years ago

(Private Address)

Have you tried loading without "phenoData = (pData)" to make sure that it is not caused by pData supplied? ~Mike -----Original Message----- From: Stephan Fuhrmann [mailto:stephan.fuhrmann@charite.de] Sent: Wed 10/5/2011 3:32 AM To: bioconductor@r-project.org Subject: [BioC] [flowCore] problems with read.flowSet dear all, i tried to load 17 fcs-files using the read.flowSet-command: flowData <- read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE-Studie/R_pp65_A", phenoData = (pData), transformation = FALSE) this returned the following error: Error in dim(colNames) <- c(ncol(from[[frameList[[1]]]]), length(frameList)) : Dimensionen [Produkt 170] don´t match the length of the object [17] i can´t work out whats going wrong. loading the example data works fine. is there anything wrong with the fcs-files? help is much appreciated! stephan > sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] de_DE.UTF-8/de_DE.UTF-8/C/C/de_DE.UTF-8/de_DE.UTF-8 attached base packages: [1] tools splines grid stats graphics grDevices utils datasets methods [10] base other attached packages: [1] flowUtils_1.12.0 flowQ_1.12.0 latticeExtra_0.6-18 RColorBrewer_1.0-5 [5] parody_1.10.0 bioDist_1.24.1 KernSmooth_2.23-6 outliers_0.14 [9] flowMerge_2.0.3 multicore_0.1-7 foreach_1.3.2 codetools_0.2-8 [13] iterators_1.0.5 snow_0.3-7 flowMeans_1.4.0 flowFP_1.8.0 [17] flowClust_2.10.0 ellipse_0.3-5 mnormt_1.4-3 flowViz_1.16.0 [21] lattice_0.19-33 xtable_1.5-6 sfsmisc_1.0-14 cluster_1.14.0 [25] mvoutlier_1.9.3 robCompositions_1.5.0 car_2.0-11 survival_2.36-9 [29] nnet_7.3-1 compositions_1.10-1 energy_1.3-0 MASS_7.3-14 [33] boot_1.3-2 tensorA_0.36 rgl_0.92.798 fda_2.2.7 [37] zoo_1.7-4 flowCore_1.18.0 rrcov_1.3-01 pcaPP_1.9-44 [41] mvtnorm_0.9-9991 robustbase_0.7-6 Biobase_2.12.2 flowType_0.99.4 [45] Rgraphviz_1.30.1 graph_1.30.0 loaded via a namespace (and not attached): [1] annotate_1.30.1 AnnotationDbi_1.14.1 DBI_0.2-5 feature_1.2.8 [5] geneplotter_1.30.0 ks_1.8.2 RSQLite_0.9-4 RUnit_0.4.26 [9] stats4_2.13.1 XML_3.4-3 > traceback() 5: asMethod(object) 4: as(list2env(from, new.env(hash = T, parent = emptyenv())), "flowSet") 3: asMethod(object) 2: as(flowSet, "flowSet") 1: read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE- Studie/R_pp65_A", phenoData = (pData), transformation = FALSE) [[alternative HTML version deleted]]

ADD COMMENT • link 13.5 years ago Jiang, Mike ★ 1.3k

Login before adding your answer.