Hi, Stephan
This error means that your FCS files don't all have the same channel
names on each dimension, or don't have the same number of channels.
A flowSet must consist of FCS files that have identical channels. You
can use read.FCShearer() to read in the keywords of individual files
to verify which ones don't match.
Greg Finak, PhD
Post-doctoral fellow
Fred Hutchinson Cancer Research Center
"bioconductor-request at r-project.org" <bioconductor-request at="" r-project.org=""> wrote:
Send Bioconductor mailing list submissions to
bioconductor at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
bioconductor-request at r-project.org
You can reach the person managing the list at
bioconductor-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."
Today's Topics:
1. [flowCore] problems with read.flowSet (Stephan Fuhrmann)
2. Re: GEO and Limma (Ovokeraye Achinike-Oduaran)
3. Re: Chip-seq quality control (Lucia Peixoto)
4. Re: [question] integration of microarray data from different
sources in limma package (???)
5. Re: limma data frame subsetting problem (Jabez Wilson)
6. Re: limma data frame subsetting problem (axel.klenk at
actelion.com)
7. Re: heatmap.2 function and the scale option (Ale? Maver)
8. RankProd related question (Wendy Qiao)
9. how to deal with a 30G fastq file (wang peter)
10. Re: how to deal with a 30G fastq file (Martin Morgan)
11. Re: poe.mcmc - phenotypic label - coding (mjonczyk)
----------------------------------------------------------------------
Message: 1
Date: Wed, 5 Oct 2011 12:32:32 +0200
From: "Stephan Fuhrmann" <stephan.fuhrmann@charite.de>
To: bioconductor at r-project.org
Subject: [BioC] [flowCore] problems with read.flowSet
Message-ID:
<692b9f576d4668f47e6b8256f34430f1.squirrel at
webmail.charite.de>
Content-Type: text/plain;charset=iso-8859-1
dear all,
i tried to load 17 fcs-files using the read.flowSet-command:
flowData <- read.flowSet(path = "/Users/stephan/Desktop/FCS
files/IE-Studie/R_pp65_A", phenoData = (pData), transformation =
FALSE)
this returned the following error:
Error in dim(colNames) <- c(ncol(from[[frameList[[1]]]]),
length(frameList)) :
Dimensionen [Produkt 170] don?t match the length of the object [17]
i can?t work out whats going wrong. loading the example data works
fine.
is there anything wrong with the fcs-files?
help is much appreciated!
stephan
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] de_DE.UTF-8/de_DE.UTF-8/C/C/de_DE.UTF-8/de_DE.UTF-8
attached base packages:
[1] tools splines grid stats graphics grDevices utils
datasets methods
[10] base
other attached packages:
[1] flowUtils_1.12.0 flowQ_1.12.0 latticeExtra_0.6-18
RColorBrewer_1.0-5
[5] parody_1.10.0 bioDist_1.24.1 KernSmooth_2.23-6
outliers_0.14
[9] flowMerge_2.0.3 multicore_0.1-7 foreach_1.3.2
codetools_0.2-8
[13] iterators_1.0.5 snow_0.3-7 flowMeans_1.4.0
flowFP_1.8.0
[17] flowClust_2.10.0 ellipse_0.3-5 mnormt_1.4-3
flowViz_1.16.0
[21] lattice_0.19-33 xtable_1.5-6 sfsmisc_1.0-14
cluster_1.14.0
[25] mvoutlier_1.9.3 robCompositions_1.5.0 car_2.0-11
survival_2.36-9
[29] nnet_7.3-1 compositions_1.10-1 energy_1.3-0
MASS_7.3-14
[33] boot_1.3-2 tensorA_0.36 rgl_0.92.798
fda_2.2.7
[37] zoo_1.7-4 flowCore_1.18.0 rrcov_1.3-01
pcaPP_1.9-44
[41] mvtnorm_0.9-9991 robustbase_0.7-6 Biobase_2.12.2
flowType_0.99.4
[45] Rgraphviz_1.30.1 graph_1.30.0
loaded via a namespace (and not attached):
[1] annotate_1.30.1 AnnotationDbi_1.14.1 DBI_0.2-5
feature_1.2.8
[5] geneplotter_1.30.0 ks_1.8.2 RSQLite_0.9-4
RUnit_0.4.26
[9] stats4_2.13.1 XML_3.4-3
> traceback()
5: asMethod(object)
4: as(list2env(from, new.env(hash = T, parent = emptyenv())),
"flowSet")
3: asMethod(object)
2: as(flowSet, "flowSet")
1: read.flowSet(path = "/Users/stephan/Desktop/FCS files/IE-
Studie/R_pp65_A",
phenoData = (pData), transformation = FALSE)
------------------------------
Message: 2
Date: Wed, 5 Oct 2011 15:35:32 +0200
From: Ovokeraye Achinike-Oduaran <ovokeraye@gmail.com>
To: Sean Davis <sdavis2 at="" mail.nih.gov="">, bioconductor at
r-project.org
Subject: Re: [BioC] GEO and Limma
Message-ID:
<camnjh+kfmwf4tu+fbkearl_ehdtmrcfj3dm9maqzcm-kisd6tw at="" mail.gmail.com="">
Content-Type: text/plain; charset=ISO-8859-1
Thanks Sean.
Avoks
On Tue, Oct 4, 2011 at 8:08 PM, Sean Davis <sdavis2 at="" mail.nih.gov="">
wrote:
> On Tue, Oct 4, 2011 at 2:04 PM, Ovokeraye Achinike-Oduaran
> <ovokeraye at="" gmail.com=""> wrote:
>> Thanks a bunch, Johannes and Sean. I'll try the GSE data again as
soon
>> as I ?get in front of a comp. About the first question on creating
a
>> model matrix ?for a 3x2 factorial dataset like gds3715, any ideas?
>
> You'll probably need a contrast matrix in there somewhere to pull
out
> contrasts of interest. ?I'm not sure which you are interested in.
>
> Sean
>
>
>> Thanks again.
>>
>> Avoks
>>
>> On 10/4/11, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote:
>>> On Tue, Oct 4, 2011 at 10:28 AM, Ovokeraye Achinike-Oduaran
>>> <ovokeraye at="" gmail.com=""> wrote:
>>>> Hi all,
>>>>
>>>> I've been struggling a bit with GEO and limma. I've come up with
2
>>>> questions that I'ld appreciate some input on.
>>>>
>>>> Question 1:
>>>> I have a design matrix with 6columns for an expression set with 3
>>>> columns(3 levels, 2 agents). How do I create a model matrix to
fit my
>>>> design matrix for this dataset (eg. gds3715 below)? For a
straight
>>>> forward single factor analysis eg. gds161, the code below seems
to
>>>> work fine.
>>>>
>>>> gds161dat = getGEO('GDS161',destdir=".")
>>>> gds161eset = GDS2eSet(gds161dat, do.log2=TRUE)
>>>> m = pData(gds161eset)$metabolism
>>>> design_gds161 = createDesignMatrix(gds161eset)
>>>> design_gds161 = model.matrix(~m)
>>>> fit = lmFit(gds161eset, design_gds161)
>>>> fit2 = eBayes(fit)
>>>> results = topTable(fit2, adjust ="BH", number = nrow(gds161eset))
>>>> excel = write.table(results, file = file.choose(), sep = ",")
>>>>
>>>> gds3715dat = getGEO('GDS3715',destdir=".")
>>>> gds3715eset = GDS2eSet(gds3715dat, do.log2=TRUE)
>>>> DIR = paste(pData(gds3715eset)$disease.state,
>>>> pData(gds3715eset)$agent, sep =".")
>>>> m = data.frame("fac1" = pData(gds3715eset)$disease.state, "fac2"
=
>>>> pData(gds3715eset)$agent)
>>>> design_gds3715 = createDesignMatrix(gds3715eset)
>>>> design_gds3715 = model.matrix(~m)
>>>> fit = lmFit(gds3715eset, design_gds3715)
>>>> fit2 = eBayes(fit)
>>>> results = topTable(fit2, adjust ="BH", number =
nrow(gds3715eset))
>>>> excel = write.table(results, file = file.choose(), sep = ",")
>>>>
>>>> Question 2:
>>>> How can I run a similar (limma) analysis with GSE files?
>>>> I can hardly figure out how to get from the expression set to the
>>>> analysis...I've read the Using the GEOquery Package documentation
>>>> several times over. I'm just have a hard time getting it to work.
>>>>
>>>> gse25724dat = getGEO('GSE25724', GSEMatrix = TRUE)
>>>
>>> Hi, Avoks. ?The return value from this call is a LIST of
>>> ExpressionSets, not a single ExpressionSet. ?You probably want the
>>> first ExpressionSet from the list and can get that:
>>>
>>> gse25724eSet = gse25724dat[[1]]
>>>
>>> GEOquery works this way because many GSEs (so-called superseries)
have
>>> multiple series included within them, so returning a list is
necessary
>>> to be general.
>>>
>>> Sean
>>>
>>>
>>>> This is supposed to give me an expressionset, I can't seem to
figure
>>>> out how to analyze it with limma.
>>>>
>>>> Please help.
>>>>
>>>> Thanks.
>>>>
>>>> Avoks
>>>>
>>>> sessionInfo()
>>>> R version 2.13.1 (2011-07-08)
>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_xxx. LC_CTYPE=English_xxx
>>>> [3] LC_MONETARY=English_xxx LC_NUMERIC=C
>>>> [5] LC_TIME=English_xxx
>>>>
>>>> attached base packages:
>>>> [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ?
base
>>>>
>>>> other attached packages:
>>>> [1] puma_2.4.0 ? ? ?mclust_3.4.10 ? affy_1.30.0 ? ? limma_3.8.3
>>>> [5] GEOquery_2.19.4 Biobase_2.12.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.20.0 ? ? ? ? preprocessCore_1.14.0 RCurl_1.6-10.1
>>>> [4] tools_2.13.1 ? ? ? ? ?XML_3.4-2.2
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
------------------------------
Message: 3
Date: Wed, 5 Oct 2011 10:06:12 -0400
From: Lucia Peixoto <luciap@iscb.org>
To: Martin Morgan <mtmorgan at="" fhcrc.org="">
Cc: "bioconductor at r-project.org" <bioconductor at="" r-project.org="">
Subject: Re: [BioC] Chip-seq quality control
Message-ID:
<cahzs3xtso2jehxyxufkm2e+417iq_z_ohhge0hf6knu_jegyhq at="" mail.gmail.com="">
Content-Type: text/plain
Thanks very much for the suggestions
I will likely have more questions as I start the analysis
Lucia
On Tue, Oct 4, 2011 at 2:57 PM, Martin Morgan <mtmorgan at="" fhcrc.org="">
wrote:
> On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
> > Hello Lucia,
> >
> > A proper response to your post would take a lecture rather than an
> > email. I can't do that but I can bullet the main points. I think
that
> > it will help you if you are indeed a newcomer to ChIP-seq.
> >
> > 1) Expect 10 million reads per sample for a genome the size of
human.
>
> I'd run some basic QA on your lanes, via ShortRead::qa on the fastq
files
> (or bam if fastq are not available); use FastqSampler if memory is
tight
> (but in general if memory is tight the solution will be to find a
larger
> computer).
>
> See
http://bioconductor.org/help/**workflows/high-
throughput-**sequencing/<http: bioconductor.org="" help="" workflows="" high-="" throughput-sequencing=""/>for qa and perhaps other operations common to
RNAseq / ChIPseq work flows
>
>
> >
> > 2) Stick to SAM/BAM formats so that you can use well known,
publicly
> > available tools. Your best friend is called Picard.
>
> People can and do use R / Bioconductor for Picard-like tasks.
>
>
> > 3) Remove duplicates. Again, Picard is your best friend.
>
> > 4) Create WIG files for all samples, treatments and controls so
that
> > you can display them simultaneously on any genome browser.
>
> here for interactive use I would rather use basic R plotting
commands,
> avoiding the round-trip and allowing programmatic interaction.
>
>
> > 5) Find peaks with a well documented peak finder.
>
> probably a good suggestion for a one-off or common ChIP; the chipseq
> vignette
>
>
http://bioconductor.org/**packages/release/bioc/html/**chipseq.html
<http: bioconductor.org="" packages="" release="" bioc="" html="" chipseq.html="">
>
> provides inspiration for more flexible analysis; packages under the
ChIPseq
> biocViews term (Software --> AssayTechnologies ->
HighThroughputSequencing->
> **ChIPSeq) might offer a solution tailored to your ChIP.
>
>
> > 6) Compute enrichment for all treatments relative to their
controls.
>
> again the chipseq vignette is an alternative source.
>
>
> >
> > So, points 4 and 6 are your quality controls at this stage. Once
you
> > know what a good immunoprecipitation looks like compared to a bad
one,
> > you can start diving into the details. You can invent your own
quality
>
> especially at getting a sense for good versus bad results the
interactivity
> of R / Bioconductor seem essential.
>
> Martin
>
>
> > indicators. For instance, I compute the proportion of tags inside
the
> > 1000 strongest peaks. I do that for BOTH treatment and controls.
> >
> > In my workflow, Bioconductor does not get involved until I reach
point 6.
> >
> > Happy ChIPing.
> >
> > Ivan
> >
> >
> >
> >
> >
> > On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto<luciap at="" iscb.org="">
wrote:
> >> Hi,
> >> I am new to Chip-seq, my experiment's sequencing has finished,
and the
> read
> >> alignment is currently running
> >> The experiment was done for histone acetylation, and I have two
types
> of
> >> controls: input DNA and unmodified histone.
> >> I have two conditions and 6 biological replicates of each
condition
> >> I wanted some advice on how to perform basic quality control on
Chip-seq
> >> data using Bioconductor
> >> and also some ideas of which kinds of biases people usually
observe and
> I
> >> should keep my eyes open for
> >> any advice will be greatly appreciated!
> >> thanks
> >>
> >> Lucia
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________**_________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >>
https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor="">
> >> Search the archives:
http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor="">
> >>
> >
> > ______________________________**_________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> >
https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor="">
> > Search the archives:
http://news.gmane.org/gmane.**
> science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor="">
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
[[alternative HTML version deleted]]
------------------------------
Message: 4
Date: Wed, 5 Oct 2011 23:50:01 +0900
From: ??? <sjlee@biosoft.kaist.ac.kr>
To: "Freudenberg, Johannes (NIH/NIEHS) [E]"
<johannes.freudenberg at="" nih.gov="">, "bioconductor at
stat.math.ethz.ch"
<bioconductor at="" stat.math.ethz.ch="">
Subject: Re: [BioC] [question] integration of microarray data from
different sources in limma package
Message-ID: <4E8C6E99.5020703 at biosoft.kaist.ac.kr>
Content-Type: text/plain; charset="EUC-KR"
Hi Johannes,
I would like to explain my data in detail for your deep understanding.
There are three replicate data from two-channel microarray
Two of them were scanned by genepix scanner and the other was scanned
by
agilent scanner.
All of them are originated from same microarray platform( agilent chip
). Just different scanner makes them provide different outputs, but
its
original information is same.
However, when I want to use three microarray data for the analysis
using
limma package, it is difficult to use built-in functions in the
package
to normalize and analyze them for extracting DEGs(differentially
expressed genes).
For example,
library(limma)
setwd("~~")
targets_agilent = readTargets("targets_agilent.txt")
targets_genepix = readTargets("targets_genepix.txt")
rg_agilent = read.maimages(targets_agilent$FileName, source="agilent")
rg_genepix = read.maimages(targets_genepix$FileName, source="genepix")
rg_all = cbind( rg_agilent, rg_genepix )
When I tried to use rg_all object for further analysis, it provokes
some
problems because cbind function just concetanates the data without
consideration of interal information.
Please let me know your precious opinion for this beginner in this
field.
Thanks in advance.
Sincerely,
Sunjae LEE
2011-10-01 ?? 2:29, Freudenberg, Johannes (NIH/NIEHS) [E] ? ?:
> Hi Sunjae,
>
>> Is there any possibility to combine, normalize and analyze them
without problems?
> I think the order should be "normalize, combine, and analyze
(hopefully) without problems." I don't really see a way (nor a good
reason) for combining the different platforms first and then doing the
pre-processing.
>
> Best regards,
> --Johannes
>
>
>
> -----Original Message-----
> From: ??? [mailto:sjlee at biosoft.kaist.ac.kr]
> Sent: Tuesday, September 27, 2011 11:05 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] [question] integration of microarray data from
different sources in limma package
>
> Dear developers,
>
> I want to ask some questions about microarray data pre-processing
>
> In the limma package, it supports reading microarray data from
different source such as agilent(agilent feature extraction),
genepix(genepix pro) format by fuction read.maimages().
>
> And cbind fuction enables us to combine RGList obtained from
micorarray data from different source
>
> However, when we try to normalize the combined RGList data through
normalizeBetweenArrays function and analyze differentially expressed
gene through lmFit, eBayes function, it produces some problems of not-
compatible integrations due to diffrenence of its sources. ( I just
want to integrate microarray data from agilent and genepix format as
soon as possible )
>
> Is there any possibility to combine, normalize and analyze them
without problems?
>
> please let me know how to solve them.
>
>
>
> Sincerely,
>
> Sunjae LEE
>
> Ph.D. candidate, BISL, Dept. of Bio and Brain engineering, KAIST
335-Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Sunjae LEE
Ph.D. candidate, BISL, Dept. of Bio and Brain engineering, KAIST
335-Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea
TEL +82-42-350-4353
(cell.) +82-10-3090-9050
------------------------------
Message: 5
Date: Wed, 5 Oct 2011 15:22:14 +0100 (BST)
From: Jabez Wilson <jabezwuk@yahoo.co.uk>
To: axel.klenk at actelion.com
Cc: bioconductor at r-project.org, bioconductor-bounces at
r-project.org
Subject: Re: [BioC] limma data frame subsetting problem
Message-ID:
<1317824534.79133.YahooMailClassic at
web28515.mail.ukl.yahoo.com>
Content-Type: text/plain
Yes, thanks, Axel, I think that may well do the trick - assuming you
meant rowMaxs() ;-)
You're right, it is an RGList not a DF.
Jab
--- On Wed, 5/10/11, axel.klenk at actelion.com <axel.klenk at="" actelion.com=""> wrote:
From: axel.klenk@actelion.com <axel.klenk@actelion.com>
Subject: Re: [BioC] limma data frame subsetting problem
Cc: bioconductor at r-project.org, bioconductor-bounces at
r-project.org
Date: Wednesday, 5 October, 2011, 10:42
Dear Jabez,
try:
RG[rowMax(RG$G) > 5000, ]
Is that what you want?
And BTW, note that you are dealing with an RGList and not a data
frame...
Cheers,
- axel
Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
Switzerland
From:
To:
bioconductor at r-project.org
Date:
05.10.2011 11:15
Subject:
[BioC] limma data frame subsetting problem
Sent by:
bioconductor-bounces at r-project.org
Dear Bioconductors, I want to do something simple, which I cannot find
the
solution to. I have a limma data frame and I want to select a subset
of
the data frame based on whether the values in the "G" channel are >
e.g.
5000
As an example I use swirl data
targets <- readTargets("SwirlSample.txt"); RG <-
read.maimages(targets,
source="spot")
> head(RG$G)
swirl.1 swirl.2 swirl.3 swirl.4
[1,] 22028.260 19278.770 2727.5600 19930.6500
[2,] 25613.200 21438.960 2787.0330 25426.5800
[3,] 22652.390 20386.470 2419.8810 16225.9500
[4,] 8929.286 6677.619 383.2381 786.9048
[5,] 8746.476 6576.292 901.0000 468.0476
[6,] 37010.080 23769.100 23377.9700 28399.0900
I can select the first 20 lines of the df by
>RG[1:20,]
but I really want to select those lines of the df (and keep the df
format
intact) where the "G" value in any column is > a certain figure (e.g.
5000)
However,
> RG[RG$G>5000,]
Error in `[.RGList`(RG, RG$G > 5000, ) :
(subscript) logical subscript too long
I have no success with subset either:
> subset(RG, RG$G>5000,)
Error: Two subscripts required
> subset(RG$G, RG$G>5000)
Error in subset.matrix(RG$G, RG$G > 5000) :
(subscript) logical subscript too long
Do I need to write a loop to check each column of "G" seperately? Or
is
there a simpler solution?
TIA
Jabez
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
The information of this email and in any file transmitted with it is
strictly confidential and may be legally privileged.
It is intended solely for the addressee. If you are not the intended
recipient, any copying, distribution or any other use of this email is
prohibited and may be unlawful. In such case, you should please notify
the sender immediately and destroy this email.
The content of this email is not legally binding unless confirmed by
letter.
Any views expressed in this message are those of the individual
sender, except where the message states otherwise and the sender is
authorised to state them to be the views of the sender's company. For
further information about Actelion please see our website at
http://www.actelion.com
[[alternative HTML version deleted]]
------------------------------
Message: 6
Date: Wed, 5 Oct 2011 17:06:24 +0200
From: axel.klenk@actelion.com
Cc: bioconductor at r-project.org, bioconductor-bounces at
r-project.org
Subject: Re: [BioC] limma data frame subsetting problem
Message-ID:
<1002_1317827251_4E8C72B3_1002_4917_1_OF930BDEB2.27423C34-ONC1
257920.00511A52-C1257920.005315A3 at actelion.com>
Content-Type: text/plain; charset="US-ASCII"
No, I meant rowMax() from package Biobase but obviously this
functionality
has been implemented several times... rowMaxs() from fBasics (if
that's
the
one you've found) is less efficient but it will hardly make a
difference
for your
use case - as in fortune(98)... :-)
- axel
Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
Switzerland
From:
To:
axel.klenk at actelion.com
Cc:
bioconductor at r-project.org, bioconductor-bounces at r-project.org
Date:
05.10.2011 16:22
Subject:
Re: [BioC] limma data frame subsetting problem
Yes, thanks, Axel, I think that may well do the trick - assuming you
meant
rowMaxs() ;-)
You're right, it is an RGList not a DF.
Jab
--- On Wed, 5/10/11, axel.klenk at actelion.com <axel.klenk at="" actelion.com="">
wrote:
From: axel.klenk@actelion.com <axel.klenk@actelion.com>
Subject: Re: [BioC] limma data frame subsetting problem
Cc: bioconductor at r-project.org, bioconductor-bounces at
r-project.org
Date: Wednesday, 5 October, 2011, 10:42
Dear Jabez,
try:
RG[rowMax(RG$G) > 5000, ]
Is that what you want?
And BTW, note that you are dealing with an RGList and not a data
frame...
Cheers,
- axel
Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil /
Switzerland
From:
To:
bioconductor at r-project.org
Date:
05.10.2011 11:15
Subject:
[BioC] limma data frame subsetting problem
Sent by:
bioconductor-bounces at r-project.org
Dear Bioconductors, I want to do something simple, which I cannot find
the
solution to. I have a limma data frame and I want to select a subset
of
the data frame based on whether the values in the "G" channel are >
e.g.
5000
As an example I use swirl data
targets <- readTargets("SwirlSample.txt"); RG <-
read.maimages(targets,
source="spot")
> head(RG$G)
swirl.1 swirl.2 swirl.3 swirl.4
[1,] 22028.260 19278.770 2727.5600 19930.6500
[2,] 25613.200 21438.960 2787.0330 25426.5800
[3,] 22652.390 20386.470 2419.8810 16225.9500
[4,] 8929.286 6677.619 383.2381 786.9048
[5,] 8746.476 6576.292 901.0000 468.0476
[6,] 37010.080 23769.100 23377.9700 28399.0900
I can select the first 20 lines of the df by
>RG[1:20,]
but I really want to select those lines of the df (and keep the df
format
intact) where the "G" value in any column is > a certain figure (e.g.
5000)
However,
> RG[RG$G>5000,]
Error in `[.RGList`(RG, RG$G > 5000, ) :
(subscript) logical subscript too long
I have no success with subset either:
> subset(RG, RG$G>5000,)
Error: Two subscripts required
> subset(RG$G, RG$G>5000)
Error in subset.matrix(RG$G, RG$G > 5000) :
(subscript) logical subscript too long
Do I need to write a loop to check each column of "G" seperately? Or
is
there a simpler solution?
TIA
Jabez
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
The information of this email and in any file transmitted with it is
strictly confidential and may be legally privileged.
It is intended solely for the addressee. If you are not the intended
recipient, any copying, distribution or any other use of this email is
prohibited and may be unlawful. In such case, you should please notify
the
sender immediately and destroy this email.
The content of this email is not legally binding unless confirmed by
letter.
Any views expressed in this message are those of the individual
sender,
except where the message states otherwise and the sender is authorised
to
state them to be the views of the sender's company. For further
information about Actelion please see our website at
http://www.actelion.com
The information of this email and in any file transmitted with it is
strictly confidential and may be legally privileged.
It is intended solely for the addressee. If you are not the intended
recipient, any copying, distribution or any other use of this email is
prohibited and may be unlawful. In such case, you should please notify
the sender immediately and destroy this email.
The content of this email is not legally binding unless confirmed by
letter.
Any views expressed in this message are those of the individual
sender, except where the message states otherwise and the sender is
authorised to state them to be the views of the sender's company. For
further information about Actelion please see our website at
http://www.actelion.com
------------------------------
Message: 7
Date: Wed, 5 Oct 2011 19:33:36 +0200
From: Ale? Maver <ales.maver@gmail.com>
To: john herbert <arraystruggles at="" gmail.com="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] heatmap.2 function and the scale option
Message-ID:
<cahzx6mccnq6ehg=vz9hfqh=yhsqwksjtwdrabmzrfejc1mgyiw at="" mail.gmail.com="">
Content-Type: text/plain
Hi!
I think it depends on what information you have stored in rows and
what in
columns. It also depends what data you have stored in the matrix.
For example, if your input values are deltaCt values (you mentioned
you work
on RT-PCR results), they may differ substantially across genes, which
may
cause your heatmap to look anomalous.
If you look at an exemplar matrix with dCt values for three genes in
rows
(Gene1-Gene3) and ten samples in columns:
mat<-matrix(c(rnorm(n=10, mean=5, sd=1),
rnorm(n=10, mean=6, sd=1),
rnorm(n=10, mean=15, sd=1)), ncol=10, byrow=T,
dimnames=list(paste("Gene", c(1:3), sep=""), paste("Sample", c(1:10),
sep="")))
Here the distributions of measured values differ substantially between
genes
(means 5,10 and 15). If you inspect the plot where scaling across
columns is
used:
heatmap(mat, scale="column")
you will notice Gene3 to be always more yellow in color than Gene1 or
Gene2,
as Gene3 values are always more than other two.
However, using scaling across rows brings out the differences between
samples - separately for each gene (row), as value range is
calculated for
each gene (row) seperately:
heatmap(mat, scale="row")
Hope this explains it,
Ales
2011/10/3 john herbert <arraystruggles at="" gmail.com="">
> Dear Bioconductors,
> I am using the the heatmap.2 function to plot out a heatmap of 130
> genes from Q-PCR expression.
>
> If I plot them out scaling by row, I get a heatmap that looks a bit
> blotchy.
>
> However, if I scale by column, the heatmap looks a lot better.
>
> The clustering of samples is the same but I don't really know the
> theory behind why column scaling looks better than row scaling.
>
> Please can someone advise me on any theory behind this?
>
> The data is in a matrix, like microarray, with columns of different
> conditions and rows of genes.
>
> Thank you,
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Ales Maver, MD
Institute of Medical Genetics, Department of Obstetrics and
Gynaecology
UMC Ljubljana
?lajmerjeva 3
SI-1000 Ljubljana
Slovenia
[[alternative HTML version deleted]]
------------------------------
Message: 8
Date: Wed, 5 Oct 2011 23:00:33 -0400
From: Wendy Qiao <wendy2.qiao@gmail.com>
To: bioconductor at r-project.org
Subject: [BioC] RankProd related question
Message-ID:
<cah=jfhygyjwswamvaptdotahwc3pcpbdhxljl+plgvcbb5m39q at="" mail.gmail.com="">
Content-Type: text/plain
Hi all,
I need to analyze a combined dataset of microarrays from two sources.
The
arrays for the first cell type is from source1 and the arrays for the
other
cell types are from source2. I would like to use the advanced rank
product
method to identify the unregulated genes. My "origin" vector and
sample
vector are as following. However, by using the following code, I got
an
error message as highlighted. I think that the arrays for cell type 1
must
from two sources. Does it mean that I cannot use the advanced
RankProd? Does
anyone know the major difference between the normal and advanced
RankProd
methods?
Thank you in advance,
Wendy
mergeD.origin # origin vector
[1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[54] 2 2 2 2 2 2 2 2
mergeD.cl # sample vector
[1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[54] 0 0 0 0 0 0 0 0
RP.adv.out <- RPadvance(mergeD, mergeD.cl, mergeD.origin, num.perm =
100,
logged = FALSE, gene.names = rownames(mergeD), rand = 123)
The data is from 2 different origins
Rank Product analysis for two-class case
Error in OriginxyCall(data, cl, origin) :
Error: data from different origins should contain data from both
classes
[[alternative HTML version deleted]]
------------------------------
Message: 9
Date: Wed, 5 Oct 2011 22:04:45 -0500
From: wang peter <wng.peter@gmail.com>
To: bioconductor at r-project.org
Subject: [BioC] how to deal with a 30G fastq file
Message-ID:
<cakh7rfcat40fe9pxffnhgov0zos+si_2_d_tygvs+tpm1tme8a at="" mail.gmail.com="">
Content-Type: text/plain
IT is too slow to read them in the memory.
who can tell me if i need split them by other program or
call some R function to split them
thx
[[alternative HTML version deleted]]
------------------------------
Message: 10
Date: Wed, 05 Oct 2011 20:39:12 -0700
From: Martin Morgan <mtmorgan@fhcrc.org>
To: wang peter <wng.peter at="" gmail.com="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] how to deal with a 30G fastq file
Message-ID: <4E8D22E0.5010607 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 10/05/2011 08:04 PM, wang peter wrote:
> IT is too slow to read them in the memory.
> who can tell me if i need split them by other program or
> call some R function to split them
ShortRead::FastqSampler streams the entire file but returns a subset
(often faster than reading in the whole data) ShortRead::FastqStreamer
(in development) iterates over the file --
fq = FastqStreamer(<...>)
while (length(res <- yield(fq)))
# work, e.g., filter
A cheap hack is to force R to allocate a large amount of memory and
then
to run the operation
replicate(10, raw(1e9)) ## that's alot
dna = readFastq(...)
The 'withIds=FALSE' argument to readFastq can save a lot of time if
ids
are not necessary.
If the records are all 4 lines long it is very easy to split a file
(untested code; the Linux pros would use awk for efficient processing;
check out StackOverflow / Biostar)
fl = file("foo.fastq", "r")
idx = 0;
while (isIncomplete(fl)) {
recs = readLines(fl, n=1000000)
writeLines(sprintf("fout-%d.fastq", idx), recs)
idx <- idx + 1
}
once split on Linux / Mac use library(multicore) or library(parallel)
(R-2.14 or later) and
mclapply(seq_len(idx), function(i) {
fq = readFastq(sprintf("fout-%d.fastq", idx))
## work, then...
TRUE
})
to process in parallel (it doesn't make sense to try to read them in
parallel and try to return them back to a 'master').
Martin
>
> thx
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
------------------------------
Message: 11
Date: Thu, 06 Oct 2011 08:19:50 +0200
From: mjonczyk <mjonczyk@biol.uw.edu.pl>
To: Debashis Ghosh <ghoshd at="" psu.edu="">
Cc: bioconductor at r-project.org
Subject: Re: [BioC] poe.mcmc - phenotypic label - coding
Message-ID: <1a80ce9d9686c922b1cca5ccaf0dd604 at biol.uw.edu.pl>
Content-Type: text/plain; charset=UTF-8; format=flowed
Dear Dr. Ghosh,
thank you for the clarification.
Best Wishes,
Maciej
---
Maciej Jonczyk,
Department of Plant Molecular Ecophysiology
Institute of Plant Experimental Biology
Faculty of Biology, University of Warsaw
02-096 Warszawa, Miecznikowa 1
Poland
On Wed, 5 Oct 2011 22:22:08 -0400, Debashis Ghosh wrote:
> Dear Dr. Jonczyk,
>
> It's an error in the text. Hyungwon should be revising the vignette
> in future versions of metaArray.
>
> Debashis
>
> On Tue, Oct 4, 2011 at 8:09 AM, mjonczyk <mjonczyk at="" biol.uw.edu.pl="">
> wrote:
>> Dear List,
>>
>> I'd like to make sure I understand coding of phenotypic label in
>> poe.mcmc function of metaArray correctly.
>>
>> Vector of my arrays' treatments could be presented as:
>>
>> (ctrl, ctrl, ctrl, trt1, trt1, trt1) so as NN value I used vector
>> (1,
>> 1, 1, 0, 0, 0).
>>
>> Documentation for either poe.mcmc, poe.em and fit.em say that "1"
>> means
>> that e=0 (baseline expression),
>> and "0" is for other conditions.
>>
>> Although this three documentation files suggests the same coding,
>> fit.em states that:
>> "
>> cl: A vector of 0s and 1s. Use 1 for normal phenotype and 0 for
>> ? ? ? ? ? non-normal phenotype. Note that
>> *this is the opposite of POE MCMC.*
>> If all samples are of unknown phenotype or of the same
>> ? ? ? ? ? one, give vector of zeros. When class information is
>> ? ? ? ? ? provided, conditional estimation of the mixture is
>> applied.
>> "
>> So is it simply an error in text or should I code my variants
>> differently?
>>
>> Best Wishes!
>>
>> --
>> Maciej Jonczyk,
>> Department of Plant Molecular Ecophysiology
>> Institute of Plant Experimental Biology
>> Faculty of Biology, University of Warsaw
>> 02-096 Warszawa, Miecznikowa 1
>> Poland
>>
>>
>>
>> --
>> This email was Anti Virus checked by Astaro Security Gateway.
>>
http://www.astaro.com
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 104, Issue 6