Hi Karl,
The only way I know to rotate the labels is pretty crude. You will
have to reconstitute the labels using the text() function.
The caveat here is you'll have to play around to get this right.
Try something like this:
library(gplots)
x <- matrix(rnorm(25), 5)
heatmap.2(x, labRow="", labCol="") #remove the labels
# plot the text; perhaps someone can think of a smarter way of getting
# the labels in position...
text(seq(par("xaxp")[1]+par("xaxp")[2]/par("xaxp")[3], par("xaxp")[2],
by=0.8*(par("xaxp")[2]/par("xaxp")[3])),par("usr")[3], par("usr")[3] -
0.2, labels = c("first", "second", "third", "fourth", "fifth"), srt =
45, pos = 1, xpd = TRUE)
Unfortunately the heatmap is laid out in a 2x2 matrix with the
dendrograms and key in the first 3 cells and the heatmap in the bottom
right -- I'm not sure if it is possible to access the axes of this
element independently. If one could then it might make positioning the
labels for the heatmap moiety of the plot simple.
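A minimal base-graphics sketch of the text() idea follows. It is not the heatmap.2 layout itself: it assumes a plain image() plot stands in for the heatmap cell, and the margin size and label offset are guesses you would tune. pdf(NULL) is only there so that it runs without opening a window.

```r
# Rotated column labels drawn by hand under an image() plot.
set.seed(1)
x <- matrix(rnorm(25), 5)
labs <- c("first", "second", "third", "fourth", "fifth")

pdf(NULL)                       # null device, so the sketch also runs headless
op <- par(mar = c(6, 2, 2, 2))  # extra bottom margin for the slanted labels
image(seq_len(ncol(x)), seq_len(nrow(x)), t(x),
      axes = FALSE, xlab = "", ylab = "")
usr <- par("usr")               # plot region limits: x1, x2, y1, y2
label_x <- seq_len(ncol(x))     # column centres used by image() above
# srt is in degrees, anti-clockwise; xpd = TRUE allows drawing in the margin.
text(x = label_x, y = usr[3] - 0.05 * (usr[4] - usr[3]),
     labels = labs, srt = 45, adj = 1, xpd = TRUE)
par(op)
dev.off()
```

For labels slanting clockwise instead, srt = -45 with adj = 0 should do it.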
Amos
-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch
[mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of
bioconductor-request@stat.math.ethz.ch
Sent: 23 July 2010 11:00
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 89, Issue 22
Today's Topics:
1. heatmap.2 - change column & row locations; angle / rotate
(Karl Brand)
2. In limma, how to set quility weight for each spot. (Jinyan
Huang)
3. Re: In limma, how to set quility weight for each spot.
(Sean Davis)
4. Re: exonmap/xmapcore error (Crispin Miller)
5. Heatmap.2 scale problems: Sacling inside the function gives
different results than scaling outside!!! (Elmer Fernández)
6. Re: exonmap/xmapcore error (Crispin Miller)
7. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Sean Davis)
8. ShortRead QA (Alex Gutteridge)
9. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Bazeley, Peter)
10. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Benjamin Otto)
11. Biostrings - vcountPattern optimization (Erik Wright)
12. Re: Biostrings - vcountPattern optimization (Steve Lianoglou)
13. problem about hgu133plus2 annotation (Gina Liao)
14. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Elmer Fernández)
15. Re: problem about hgu133plus2 annotation (Marc Carlson)
16. Re: problem about hgu133plus2 annotation (James W. MacDonald)
17. Re: Biostrings - vcountPattern optimization (Patrick Aboyoun)
18. Re: feature request - pairwiseAlignment() in Biostrings
(Patrick Aboyoun)
19. Re: Biostrings - vcountPattern optimization (Erik Wright)
20. Re: feature request - pairwiseAlignment() in Biostrings
(Michael Lawrence)
21. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Steve
Lianoglou)
22. Re: Biostrings - vcountPattern optimization (Hervé Pagès)
23. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Elmer Fernández)
24. Re: Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!! (Sean Davis)
25. the design matrix again (Gordon K Smyth)
26. Open Postdoc Positions (Thomas Girke)
27. Re: htQPCR (Heidi Dvinge)
28. Re: Problem with function limmaCtData in HTqPCR package:
"leading minor of order 2 is not positive definite" (Heidi
Dvinge)
29. building a refseq-based transcriptDb: warnings of interest?
(Vincent Carey)
----------------------------------------------------------------------
Message: 1
Date: Thu, 22 Jul 2010 12:18:16 +0200
From: Karl Brand <k.brand@erasmusmc.nl>
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] heatmap.2 - change column & row locations; angle /
rotate
Message-ID: <4C481AE8.7060701 at erasmusmc.nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
<reposting from "r-help at r-project.org">
Esteemed BioC users,
I'm struggling to achieve some details of a heatmap using heatmap.2():
1. Change label locations, for both rows & columns from the default
right & bottom, to left and top.
Can this be done within heatmap.2()? Or do I need to suppress this
default behavior (how) and call a new function to relabel (what)
specifying locations?
2. Change the angle of the labels.
By default column labels are 90deg anti-clockwise from horizontal. How
to bring them back to horizontal? Or better, rotate 45deg clockwise
from horizontal (i.e., rotate 135deg clockwise from default)?
Any suggestions or pointers to helpful resources greatly appreciated,
Karl
--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 |F +31 (0)10 704 4743 |M +31 (0)642 777 268
------------------------------
Message: 2
Date: Thu, 22 Jul 2010 13:39:46 +0200
From: Jinyan Huang <jhuang.ceph@gmail.com>
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] In limma, how to set quility weight for each spot.
Message-ID:
<aanlktilvavdqrbcp-lbfa8pct7sut2vguxmov5l4dzun at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi all,
My data is from the GoldenGate Methylation Cancer Panel I. For each spot,
there is a p-value for quality. I want to use limma to analyse the
data. How can I set the quality weight for each spot? According to the
limma manual, it can be set by read.maimages, but my data is not imported
by read.maimages.
Thanks.
------------------------------
Message: 3
Date: Thu, 22 Jul 2010 06:02:28 -0600
From: Sean Davis <sdavis2@mail.nih.gov>
To: Jinyan Huang <jhuang.ceph at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] In limma, how to set quility weight for each spot.
Message-ID:
<aanlktin2pna5terltx53tlqiw0za5rzqlnltekidc8hd at mail.gmail.com>
Content-Type: text/plain
On Thu, Jul 22, 2010 at 5:39 AM, Jinyan Huang <jhuang.ceph at gmail.com> wrote:
> Hi all,
> My data is from the GoldenGate Methylation Cancer Panel I. For each spot,
> there is a p-value for quality. I want to use limma to analyse the
> data. How can I set the quality weight for each spot? According to the
> limma manual, it can be set by read.maimages, but my data is not imported
> by read.maimages.
>
>
Hi, Jinyan. You'll want to read the help for lmFit().
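As a rough sketch of what that help page points to: lmFit() takes a weights argument, which can be a matrix with the same dimensions as the data. Everything below is made up for illustration. The random matrices stand in for real intensities and quality p-values, and the rule mapping p-values to weights (1 if p < 0.05, otherwise 0.1) is a hypothetical choice, not something limma prescribes.

```r
set.seed(1)
n_spots <- 6
n_samples <- 4
# Stand-in data: rows are spots, columns are samples.
y <- matrix(rnorm(n_spots * n_samples), n_spots, n_samples)
detection_p <- matrix(runif(n_spots * n_samples), n_spots, n_samples)

# Hypothetical rule: full weight for good spots, down-weight the rest.
w <- ifelse(detection_p < 0.05, 1, 0.1)

# Two groups of two samples each (assumed design).
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Only fit if limma is installed; weights must match dim(y).
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(y, design, weights = w)
}
```

Whether hard thresholds or something smoother suits the GoldenGate p-values is a judgment call for the data at hand.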
Sean
------------------------------
Message: 4
Date: Thu, 22 Jul 2010 13:58:04 +0100
From: "Crispin Miller" <cmiller@picr.man.ac.uk>
To: "Bioconductor" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] exonmap/xmapcore error
Message-ID: <c86dfeec.cc8d%cmiller at picr.man.ac.uk>
Content-Type: text/plain
Dear Anupam,
Since we published exonmap, we've released a newer package, xmapcore.
This focuses on the core database connectivity and has a significant
amount of work done behind the API to make certain bits of it much
quicker. We'll put a note in the exonmap vignette to point people to
the new package, since it's obviously causing a bit of confusion.
One thing that xmapcore does is use a smaller database that's been
optimised for some of the queries that were slower in exonmap than we
would have liked. This also means that you no longer have to install
Ensembl: the xmapcore database, on its own, will do the job.
Have a look at the documentation for the xmapcore package (especially
INSTALL.pdf), which provides step-by-step installation instructions.
As we mention in the exonmap vignette, there were some basic utility
functions to help people load and begin to explore exon array data. As
you'll see from the vignette, we've not duplicated these in xmapcore.
Crispin
On 20/07/2010 17:00, "anupam sinha" <anupam.contact at gmail.com> wrote:
> Dear all,
> I have been learning to use exonmap/xmapcore from the
> tutorial "Comprehensive analysis of Affymetrix Exon arrays Using
> BioConductor".
> But I have run into some problems. I have installed
> "xmapcore_homo_sapiens_58" on my system as per instructions.
> Do I also have to install Ensembl and the old exonmap databases? Can
> someone help me out? Thanks in advance for any suggestions.
>
>
>> > library(xmapcore)
>> > library(exonmap)
> Loading required package: affy
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>
> Attaching package: 'Biobase'
>
> The following object(s) are masked from 'package:IRanges':
>
> updateObject
>
> Loading required package: genefilter
> Loading required package: RColorBrewer
>
> Attaching package: 'exonmap'
>
> The following object(s) are masked from 'package:xmapcore':
>
> exon.details, exon.to.gene, exon.to.probeset,
exon.to.transcript,
> exonic, exons.in.range, gene.details, gene.to.exon,
> gene.to.probeset, gene.to.transcript, genes.in.range,
intergenic,
> intronic, is.exonic, is.intergenic, is.intronic,
probes.in.range,
> probeset.to.exon, probeset.to.gene, probeset.to.probe,
> probeset.to.transcript, probesets.in.range, symbol.to.gene,
> transcript.details, transcript.to.exon, transcript.to.gene,
> transcript.to.probeset, transcripts.in.range
>
>
>> > setwd("/home/aragorn/R_Workspace/ExonarraysMCF7andMCF10Adata_cel/")
>> > raw.data<-read.exon()
>> > raw.data@cdfName<-"exon.pmcdf"
>> > x.rma<-rma(raw.data)
> Background correcting
> Normalizing
> Calculating Expression
>> > pc.rma<-pc(x.rma,"group",c("a","b"))
>> > keep<-(abs(fc(pc.rma))>1)&tt(pc.rma)< 1e-4
>> > sigs<-featureNames(x.rma)[keep]
>> > xmapConnect()
> Select a database to connect to:
>
> 1: Hman ('xmapcore_homo_sapiens_58')
>
> Selection: 1
> password:
> Warning message:
> In .xmap.load.config() :
> Environment 'R_XMAP_CONF_DIR' not set. Please refer to INSTALL.TXT
for
> information on how to set this up.
>
> Trying '.exonmap'.
>
>> > probeset.to.exon(sigs[1:5])
> *Error in mysqlExecStatement(conn, statement, ...) :
> RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)*
>> > xmapConnect()
> Select a database to connect to:
>
> 1: Hman ('xmapcore_homo_sapiens_58')
>
> Selection: 1
>
>> > probeset.to.exon(sigs[1:5])
> Error in mysqlExecStatement(conn, statement, ...) :
> RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)
>
>> > xmap.connect()
> password:
> Disconnecting from xmapcore_homo_sapiens_58 (localhost)
> Connected to xmapcore_homo_sapiens_58 (localhost)
> Selected array 'HuEx-1_0' as a default.
>> > probeset.to.exon(sigs[1:5])
> *Error in mysqlExecStatement(conn, statement, ...) :
> RS-DBI driver: (could not run statement: PROCEDURE
> xmapcore_homo_sapiens_58.xmap_probesetToExon does not exist)*
>> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-redhat-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] exon.pmcdf_1.1 exonmap_2.6.0 RColorBrewer_1.0-2
> genefilter_1.30.0
> [5] affy_1.26.1 Biobase_2.8.0 xmapcore_1.2.5
> digest_0.4.2
> [9] IRanges_1.6.8 RMySQL_0.7-4 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] affyio_1.16.0 annotate_1.26.1
AnnotationDbi_1.10.2
> [4] preprocessCore_1.10.0 RSQLite_0.9-1 splines_2.11.0
> [7] survival_2.35-8 tcltk_2.11.0 tools_2.11.0
> [10] xtable_1.5-6
>
> Regards,
>
> Anupam
> --
> Graduate Student,
> Center For DNA Fingerprinting And Diagnostics,
> 4-1-714 to 725/2, Tuljaguda complex
> Mozamzahi Road, Nampally,
> Hyderabad-500001
>
------------------------------
Message: 5
Date: Thu, 22 Jul 2010 10:05:39 -0300
From: Elmer Fernández <elmerfer@gmail.com>
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: [BioC] Heatmap.2 scale problems: Sacling inside the function
gives different results than scaling outside!!!
Message-ID:
<aanlktilqksufwajtt9skcscav0dutqie7il2mmwxqdyp at mail.gmail.com>
Content-Type: text/plain
Dear Users,
I'm working with the heatmap.2 function and I have noticed that using
its scale input parameter gives different results than applying the
scale() function outside and feeding heatmap.2 the scaled matrix. I
attached the results of the two approaches and the data matrix used
(M.csv). So, what am I doing wrong?
R Code
library(gplots)
M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
heatmap.2(M,scale="column",trace="none",main="scaled inside")
x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled outside")
> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets
methods
base
other attached packages:
[1] gplots_2.7.4 caTools_1.10 bitops_1.0-4.1 gdata_2.7.1
gtools_2.6.1 rkward_0.5.1
loaded via a namespace (and not attached):
[1] tools_2.10.0
--
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @
UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page :
http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
------------------------------
Message: 6
Date: Thu, 22 Jul 2010 14:09:55 +0100
From: "Crispin Miller" <cmiller@picr.man.ac.uk>
To: "Bioconductor" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] exonmap/xmapcore error
Message-ID: <c86e01b3.cc91%cmiller at picr.man.ac.uk>
Content-Type: text/plain
Hi Paul,
Hopefully it's simpler now: with xmapcore, you need to install just
the xmapcore database into a working MySQL instance (and the package
itself, of course).
There's also a pretty detailed walk-through in the INSTALL.pdf
document that forms part of the xmapcore package.
Crispin
>
> Yeah originally, they did a pretty poor job at describing how to do
> that, it was the largest impediment to otherwise using a very nice
> package. They threw you to the wolves by pointing to a section that
> describes how to install the entire Ensembl DB and web interface. I
> notice they have the new xmapcore database; are those the ones you
> are using?:
>
> http://xmap.picr.man.ac.uk/download/index#hsxmapcore
>
> I have NOT used those
>
> but at least at the beginning of the year, you only need SQL to
> install; you do not need to install Ensembl, just the "core" database.
> As I recall you need to go into SQL and create the database,
> then you need to run the script that makes the tables.
> Then these are filled (by a second script, can't recall).
>
> my notes indicate I also install exon.pmcdf: (in above web link)
> R CMD INSTALL --clean exon.pmcdf_1.1.tar.gz
>
>
>
> you may need to run something like this on the command line first to
> start the service:
>
> mysql -h host_computer -u xmap -pPassword
> ## (where host_computer is where the db is and Password is the password)
>
> then in R
>
> xmapConnect("human")
>
>
> ##################
> In my home directory there is a .exonmap folder with:
> a file database.txt attached
>
> and a subfolder db.local that has
> a file starts.core.homo_sapiens_core_56_37a.R, a large 3.7Mb file
>
> and in bashrc:
> export XMAP_BRIDGE_CACHE=/home/pleo/.xmb_cache
> #######
>
> I think now with the new core database you might be better off using
> documentation in the latest exonmap or xmapcore libraries than that
> original manuscript. They have made some changes.
>
> Hope that helps
> Paul
>
>
>
------------------------------
Message: 7
Date: Thu, 22 Jul 2010 08:17:21 -0600
From: Sean Davis <sdavis2@mail.nih.gov>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<aanlktimzp4hrxsuyyokxjgrs7ajwfhgvg1nyrnfazpyd at mail.gmail.com>
Content-Type: text/plain
2010/7/22 Elmer Fernández <elmerfer at gmail.com>
> Dear Users,
> I'm working with the heatmap.2 function and I have noticed that using
> its scale input parameter gives different results than applying the
> scale() function outside and feeding heatmap.2 the scaled matrix. I
> attached the results of the two approaches and the data matrix used
> (M.csv). So, what am I doing wrong?
>
>
Hi, Elmer.
The default distance function used by heatmap.2 is dist(), which is not
going to be invariant under centering and scaling, I don't think. It
looks like you are using that default.
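A quick base-R check of this, as a sketch on a simulated matrix built like the one in the post (the seed is arbitrary). With Euclidean dist() between rows, centering the columns alone should leave the distances unchanged, since the column means cancel in each difference; dividing by the column standard deviations is what changes them.

```r
set.seed(1)
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

d_raw      <- dist(M)                        # row distances on raw values
d_centered <- dist(scale(M, scale = FALSE))  # columns centered only
d_scaled   <- dist(scale(M))                 # columns centered and scaled

# Compare the distance vectors numerically.
same_after_centering <- isTRUE(all.equal(as.vector(d_raw), as.vector(d_centered)))
same_after_scaling   <- isTRUE(all.equal(as.vector(d_raw), as.vector(d_scaled)))
```

So a dendrogram built on scale(M) can differ from one built on M, even though the colours shown end up on the same scale.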
Sean
------------------------------
Message: 8
Date: Thu, 22 Jul 2010 15:26:21 +0100
From: Alex Gutteridge <alexg@ruggedtextile.com>
To: <bioconductor at stat.math.ethz.ch>
Subject: [BioC] ShortRead QA
Message-ID: <da36088b4e3acb477e837c2e970fd5a9 at ruggedtextile.com>
Content-Type: text/plain; charset=UTF-8
I'm dealing with some Solexa/Illumina data with ShortRead for the first
time and had a couple of questions relating to QA:
1. Memory requirements: My data comprises 7 s_N_export.txt files. Each
one comprises 10-20 million aligned reads. If I try to run qa() over
the whole directory my machine rapidly grinds to a halt. Tackling each
file individually keeps my machine running, but takes >1 hour for each
one. The ShortRead vignette says evaluating a single lane can take
'several minutes', so I'm wondering if anyone can offer any clues as to
why I'm struggling so much? The machine in question has 6GB of RAM - do
I just need more?
2. Read distribution: The QA results I'm getting for the 'read
distribution' section don't quite look like those presented in the
example ShortRead Solexa QA report. My interpretation is that this is
because my data is actually rather high quality, but I'd appreciate a
second opinion. To quote from the ShortRead QA report:
'Ideally, the cumulative proportion of reads will transition sharply
from low to high. Portions to the left of the transition might
correspond roughly to sequencing or sample processing errors, and
correspond to reads that are represented relatively infrequently [...].
Portions to the right of the transition represent reads that are
over-represented compared to expectation.'
Typically the read distribution plots I'm seeing look like this:
http://dl.dropbox.com/u/419878/readOccurences.jpg
There is a sharp transition, but no portion to the left. I interpret
this as a good sign: most of the reads are seen a small number of times
(<10), and there are relatively few over-represented reads. Is there
anything there that would worry more experienced heads?
--
Alex Gutteridge
------------------------------
Message: 9
Date: Thu, 22 Jul 2010 14:25:54 +0000
From: "Bazeley, Peter" <peter.bazeley@rockets.utoledo.edu>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>, Bioconductor mailing list
<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<5C621FDF7E426B4AAE3B2364B7EF07371F654407 at BL2PRD0103MB050.prod.exchangelabs.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi Elmer,
The default scale option in heatmap.2 scales by row, whereas the
scale() function scales by column, so this is probably why there is a
difference. I think whichever dimension contains unique samples is how
you want to scale (if this was expression data, for example).
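To make the row/column distinction concrete, here is a small base-R sketch on a simulated matrix like the one in the original post (seed arbitrary). scale() standardises columns; standardising rows takes the transpose trick. Note that the posted call sets scale="column" explicitly, so the heatmap.2 default does not come into play there; it may be worth checking ?heatmap.2 for the default in your gplots version.

```r
set.seed(1)
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

by_col <- scale(M)        # each column: mean 0, sd 1
by_row <- t(scale(t(M)))  # each row: mean 0, sd 1 (transpose, scale, transpose back)

col_means <- colMeans(by_col)  # numerically zero
row_means <- rowMeans(by_row)  # numerically zero
```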
Pete
------------------------------
Message: 10
Date: Thu, 22 Jul 2010 16:38:16 +0200
From: Benjamin Otto <b.otto@uke.uni-hamburg.de>
To: "Bazeley, Peter" <peter.bazeley at rockets.utoledo.edu>
Cc: Sean Davis <sdavis2 at mail.nih.gov>, Bioconductor mailing list
<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID: <61679366-2C04-4959-8D3D-997A45BF45F5 at uke.uni-hamburg.de>
Content-Type: text/plain; charset="utf-8"
Hi Guys,
do note that the scale argument in heatmap (and heatmap.2) doesn't scale
your values until AFTER clustering; the scaling is for visualization
purposes only! So if you provide already-scaled data, you should
naturally expect a different result.
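To make the point above concrete, here is a minimal sketch in base R only (gplots is not needed for it). It assumes the clustering is done with dist() and hclust(), which are the defaults mentioned elsewhere in this thread; the seed and the variable names are arbitrary choices for illustration.

```r
# Compare the distances that get clustered in the two approaches.
set.seed(1)  # arbitrary seed, only for reproducibility
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

d_raw    <- dist(M)          # rows of the raw matrix (scaling done "inside")
d_scaled <- dist(scale(M))   # rows of the pre-scaled matrix (scaling "outside")

# The distance matrices differ, so the dendrograms may differ too
distances_differ <- !isTRUE(all.equal(as.vector(d_raw), as.vector(d_scaled)))
print(distances_differ)

hc_raw    <- hclust(d_raw)
hc_scaled <- hclust(d_scaled)
print(hc_raw$order)     # row order from clustering the raw data
print(hc_scaled$order)  # row order from clustering the scaled data
```

If the two row orders differ, the two heatmaps will show the rows in a different arrangement even though the colours are computed from the same scaled values.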
cheers
Benjamin
On 22.07.2010 at 16:25, Bazeley, Peter wrote:
> Hi Elmer,
>
> The default scale option in heatmap.2 scales by row, whereas the
scale() function scales by column, so this is probably why there is a
difference. I think whichever dimension contains unique samples is how
you want to scale (if this was expression data, for example).
>
>
> Pete
> ________________________________________
> From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-
bounces at stat.math.ethz.ch] on behalf of Sean Davis [sdavis2 at
mail.nih.gov]
> Sent: Thursday, July 22, 2010 9:17 AM
> To: Elmer Fernández
> Cc: Bioconductor mailing list
> Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling outside!!!
>
> 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
>
>> Dear Users
>> I'm working with the heatmap.2 function and I realized that using the
>> scale input parameter gives different results than using the scale
>> function outside and feeding the heatmap.2 function with the scaled
>> matrix. I attached the results of the two approaches and the data
>> matrix used (M.csv).
>> So, what am I doing wrong?
>>
>>
> Hi, Elmer.
>
> The default distance function used by heatmap.2 is dist() which is
not going
> to be invariant under centering and scaling, I don't think. It
looks like
> you are using that default.
>
> Sean
>
>
>> R Code
>>
>> library(gplots)
>> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
>> heatmap.2(M,scale="column",trace="none",main="scaled inside")
>> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled
outside")
>>
>>> sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
>> [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
>> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
>>
>> attached base packages:
>> [1] grid stats graphics grDevices utils datasets
methods
>> base
>>
>> other attached packages:
>> [1] gplots_2.7.4 caTools_1.10 bitops_1.0-4.1 gdata_2.7.1
>> gtools_2.6.1 rkward_0.5.1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.0
>>
>>
>> --
>> Elmer A. Fernández (Bioing. PhD)
>> Investigador Asistente CONICET - Research Assistant CONICET
>> Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence
@ UCC
>> tel: +54-(0)351-4938000 int 145
>> Fax: +54-(0)351-4938081
>> web page :
http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
>>
http://sites.google.com/site/biologicaldatamininggroup/Home/
>> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
___________________________________________
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
Campus Forschung N27
Martinistr. 52,
D-20246 Hamburg
Tel.: +49 40 7410 51908
Fax.: +49 40 7410 54971
___________________________________________
--
Mandatory disclosures pursuant to the German Act on Electronic Commercial
Registers, Cooperative Registers and the Company Register (EHUG):
Universitätsklinikum Hamburg-Eppendorf
Public-law corporation
Place of jurisdiction: Hamburg
Members of the Executive Board:
Prof. Dr. Jörg F. Debatin (Chairman)
Dr. Alexander Kirstein
Joachim Prölß
Prof. Dr. Dr. Uwe Koch-Gromus
------------------------------
Message: 11
Date: Thu, 22 Jul 2010 10:54:28 -0500
From: Erik Wright <eswright@wisc.edu>
To: BioC list <bioconductor at stat.math.ethz.ch>
Subject: [BioC] Biostrings - vcountPattern optimization
Message-ID: <3E19C211-BA75-4C68-88DE-1079FE64CAB0 at wisc.edu>
Content-Type: text/plain; CHARSET=US-ASCII
Hello,
Lately I have been working on counting sequence fragments in larger
sets of sequences. I am searching for thousands of fragments of 30 to
130 bases in hundreds of thousands of sequences between 1200 and 1600
bases. Currently I am using the following method to count the number
of "hits":
#### start ####
library(Biostrings)
fragments <- DNAStringSet(c("ACTG","AAAA"))
sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
for (i in 1:length(fragments)) {
    counts <- vcountPattern(fragments[[i]],
                            sequence_set,
                            max.mismatch=1)
    hits <- length(which(counts > 0))
    print(hits)
}
#### end ####
This method is taking a long time to complete, so I am wondering if I
am doing this in the most efficient manner? Does anyone have a
suggestion for how I can accomplish the same task more efficiently?
Thanks!,
Erik
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.16.0 IRanges_1.6.0
loaded via a namespace (and not attached):
[1] Biobase_2.8.0
------------------------------
Message: 12
Date: Thu, 22 Jul 2010 12:19:21 -0400
From: Steve Lianoglou <mailinglist.honeypot@gmail.com>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID:
<aanlktil5przsipsdxng8fszyvci5rcdqn5zgahy8rswa at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
On Thu, Jul 22, 2010 at 11:54 AM, Erik Wright <eswright at wisc.edu> wrote:
> Hello,
>
> Lately I have been working on counting sequence fragments in larger
> sets of sequences. I am searching for thousands of fragments of 30 to
> 130 bases in hundreds of thousands of sequences between 1200 and 1600
> bases. Currently I am using the following method to count the number
> of "hits":
Would using bowtie as an intermediary be an option?
For instance, you could consider:
(i) making a bowtie-index out of your 1200-1600 bp "references"
(ii) aligning your 30-130bp fragments against it and outputting to SAM
format (give each read a unique id so you can hunt for it later)
(iii) convert SAM -> indexed BAM
(iv) process bam file w/ Rsamtools -- perhaps you could simply do a
`table()` on the sequence IDs of each alignment if all you want is a
count -- of course now that the sequences are aligned, the data is in
"good shape" to do other types of analyses as well (whatever it is
that you're doing).
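As a rough illustration of step (iv), the sketch below uses a small hand-made data frame in place of real BAM content, so it runs in base R without Rsamtools. The column names qname (fragment ID) and rname (reference sequence name) and all the values are hypothetical stand-ins for whatever fields are read from the BAM file.

```r
# Hypothetical stand-in for alignment records read from a BAM file:
# qname = fragment (read) ID, rname = reference sequence it aligned to.
aln <- data.frame(
  qname = c("frag1", "frag1", "frag1", "frag2", "frag3", "frag3"),
  rname = c("seqA",  "seqB",  "seqA",  "seqC",  "seqA",  "seqB"),
  stringsAsFactors = FALSE
)

# Number of alignments per fragment
alignments_per_fragment <- table(aln$qname)
print(alignments_per_fragment)

# Number of distinct reference sequences hit per fragment
# (drop repeated fragment/reference pairs first)
hits_per_fragment <- table(unique(aln)$qname)
print(hits_per_fragment)
```

The second table is the closer match to the "hits" count in the original loop, since it counts each reference sequence at most once per fragment.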
> #### start ####
> library(Biostrings)
> fragments <- DNAStringSet(c("ACTG","AAAA"))
> sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>
> for (i in 1:length(fragments)) {
>     counts <- vcountPattern(fragments[[i]],
>                             sequence_set,
>                             max.mismatch=1)
>     hits <- length(which(counts > 0))
>     print(hits)
> }
> #### end ####
>
> This method is taking a long time to complete, so I am wondering if
> I am doing this in the most efficient manner? Does anyone have a
> suggestion for how I can accomplish the same task more efficiently?
I don't really have any suggestions to make the above R code run
faster ... sorry.
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info:
http://cbio.mskcc.org/~lianos/contact
------------------------------
Message: 13
Date: Thu, 22 Jul 2010 17:11:26 +0800
From: Gina Liao <yi713@hotmail.com>
To: <bioconductor at stat.math.ethz.ch>
Subject: [BioC] problem about hgu133plus2 annotation
Message-ID: <bay146-w70ae25532ad94d7bd9116eaa20 at phx.gbl>
Content-Type: text/plain
Dear All,
I have 20 chips, and I used R to standardize the CEL files. Then I got
the expression values for all chips. I also downloaded the annotation
in CSV format from NetAffx (HG-U133_Plus_2 Annotations, CSV format,
Release 30 (22 MB, 11/15/09)).
Here's my code.
########
test = justRMA()
eset.st = standardise(test)
exprs.st = exprs(eset.st)
e.out = exprs.st
dim(e.out) #* 54675 20
########
However, I found that the order of rownames(e.out) is a little
different from the row names of hgu133plus2.csv. Rows 54630 to
54640 are not in the same order in the two.
They should be the same, right? Is "hgu133plus2cdf" the problem? How
could I solve it?
Thanks!!!!!
Best,
Gina
------------------------------
Message: 14
Date: Thu, 22 Jul 2010 13:34:28 -0300
From: Elmer Fernández <elmerfer@gmail.com>
To: Benjamin Otto <b.otto at uke.uni-hamburg.de>
Cc: Sean Davis <sdavis2 at mail.nih.gov>, Bioconductor mailing list
<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<aanlktindagcqq5capzkk6lteypeu4kr0bymue9sju_jp at mail.gmail.com>
Content-Type: text/plain
Hi Benjamin,
Are you sure about that? If so, I think that it is not correct, right?
best
Elmer
2010/7/22 Benjamin Otto <b.otto at uke.uni-hamburg.de>
> Hi Guys,
>
> do note that the scale() function in heatmap doesn't scale your
values till
> AFTER clustering for visualization purpose! So if you provide
already scaled
> data, you naturally will expect a different result.
>
> cheers
>
> Benjamin
>
> On 22.07.2010 at 16:25, Bazeley, Peter wrote:
>
> > Hi Elmer,
> >
> > The default scale option in heatmap.2 scales by row, whereas the
scale()
> function scales by column, so this is probably why there is a
difference. I
> think whichever dimension contains unique samples is how you want to
scale
> (if this was expression data, for example).
> >
> >
> > Pete
> > ________________________________________
> > From: bioconductor-bounces at stat.math.ethz.ch [
> bioconductor-bounces at stat.math.ethz.ch] on behalf of Sean Davis [
> sdavis2 at mail.nih.gov]
> > Sent: Thursday, July 22, 2010 9:17 AM
> > To: Elmer Fernández
> > Cc: Bioconductor mailing list
> > Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function
> gives different results than scaling outside!!!
> >
> > 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
> >
> >> Dear Users
> >> I'm working with the heatmap.2 function and I realized that using the
> >> scale input parameter gives different results than using the scale
> >> function outside and feeding the heatmap.2 function with the scaled
> >> matrix. I attached the results of the two approaches and the data
> >> matrix used (M.csv).
> >> So, what am I doing wrong?
> >>
> >>
> > Hi, Elmer.
> >
> > The default distance function used by heatmap.2 is dist() which is
not
> going
> > to be invariant under centering and scaling, I don't think. It
looks
> like
> > you are using that default.
> >
> > Sean
> >
> >
> >> R Code
> >>
> >> library(gplots)
> >> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
> >> heatmap.2(M,scale="column",trace="none",main="scaled inside")
> >> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled
> outside")
> >>
> >>> sessionInfo()
> >> R version 2.10.0 (2009-10-26)
> >> x86_64-unknown-linux-gnu
> >>
> >> locale:
> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> >> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> >> LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
> >> [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
> >> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
> >>
> >> attached base packages:
> >> [1] grid stats graphics grDevices utils datasets
methods
> >> base
> >>
> >> other attached packages:
> >> [1] gplots_2.7.4 caTools_1.10 bitops_1.0-4.1 gdata_2.7.1
> >> gtools_2.6.1 rkward_0.5.1
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.10.0
> >>
> >>
> >> --
> >> Elmer A. Fernández (Bioing. PhD)
> >> Investigador Asistente CONICET - Research Assistant CONICET
> >> Prof. Inteligencia Artificial -UCC - Prof. Artificial
Intelligence @ UCC
> >> tel: +54-(0)351-4938000 int 145
> >> Fax: +54-(0)351-4938081
> >> web page :
http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
> >>
http://sites.google.com/site/biologicaldatamininggroup/Home/
> >> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
> >>
> >>
> >>
> >> [[alternative HTML version deleted]]
> >>
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at stat.math.ethz.ch
> >>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> >
https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
> ___________________________________________
> Benjamin Otto, PhD
> University Medical Center Hamburg-Eppendorf
> Institute For Clinical Chemistry / Central Laboratories
> Campus Forschung N27
> Martinistr. 52,
> D-20246 Hamburg
>
> Tel.: +49 40 7410 51908
> Fax.: +49 40 7410 54971
> ___________________________________________
>
>
>
>
>
>
--
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial -UCC - Prof. Artificial Intelligence @
UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page :
http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
------------------------------
Message: 15
Date: Thu, 22 Jul 2010 09:38:19 -0700
From: Marc Carlson <mcarlson@fhcrc.org>
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] problem about hgu133plus2 annotation
Message-ID: <4C4873FB.5030207 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1
Hi Gina,
I am afraid it's a little hard to tell what is going on here. For
example, I don't see sessionInfo() so it is hard to tell what you were
running. And I only have enough code to wildly speculate about what you
were doing. You might want to see our posting guide here:
http://www.bioconductor.org/docs/postingGuide.html
Marc
On 07/22/2010 02:11 AM, Gina Liao wrote:
> Dear All,
> I have 20 chips, and I used R to standardize the CEL files.Then, i
got an expression value data of all chips.And I also downloaded the
annotation csv format from NetAffy.(HG-U133_Plus_2 Annotations, CSV
format, Release 30 (22 MB, 11/15/09))
> Here's my code.
> ########
> test = justRMA()
> eset.st = standardise(test)
> exprs.st = exprs(eset.st)
> e.out = exprs.st
> dim(e.out) #* 54675 20
> ########
> However, i found out that the order of the rownames(e.out) is a
little different to the row name of hgu133plus2.csv. The order from
54630 to 54640 is not the same to these two rows.
> They should be the same,right? Is "hgu133plus2cdf" the problem? How
could I solve it?
> Thanks!!!!!
> Best,Gina
> _________________________________________________________________
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 16
Date: Thu, 22 Jul 2010 12:41:42 -0400
From: "James W. MacDonald" <jmacdon@med.umich.edu>
To: Gina Liao <yi713 at hotmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] problem about hgu133plus2 annotation
Message-ID: <4C4874C6.9090008 at med.umich.edu>
Content-Type: text/plain; charset="iso-8859-1"; format="flowed"
Hi Gina,
On 7/22/2010 5:11 AM, Gina Liao wrote:
>
> Dear All,
> I have 20 chips, and I used R to standardize the CEL files.Then, i
got an expression value data of all chips.And I also downloaded the
annotation csv format from NetAffy.(HG-U133_Plus_2 Annotations, CSV
format, Release 30 (22 MB, 11/15/09))
> Here's my code.
> ########
> test = justRMA()
> eset.st = standardise(test)
> exprs.st = exprs(eset.st)
> e.out = exprs.st
> dim(e.out) #* 54675 20
> ########
> However, i found out that the order of the rownames(e.out) is a
little different to the row name of hgu133plus2.csv. The order from
54630 to 54640 is not the same to these two rows.
> They should be the same,right? Is "hgu133plus2cdf" the problem? How
could I solve it?
I would recommend you use the annotation packages that are available
from Bioconductor rather than downloading the annotation files from
Affymetrix. The BioC annotation packages contain the same information,
but are designed to be easily used from within R, and you will find the
.csv files you can get from Affy are not as user-friendly.
You can get the annotation package using biocLite():
biocLite("hgu133plus2.db")
Note that there is no reason to expect that the order of annotation data
will be the same as the order of expression data. Re-ordering things is
exceedingly simple in R, so this point is irrelevant.
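As a small illustration of the re-ordering, here is a sketch using match() in base R. The matrix e.out and the data frame annot are tiny hypothetical stand-ins for the real expression matrix and the annotation CSV; the column names probe and symbol and the values in them are illustrative only.

```r
# Hypothetical data: a small expression matrix and an annotation table
# whose rows are in a different order.
e.out <- matrix(1:6, nrow = 3,
                dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                                c("chip1", "chip2")))
annot <- data.frame(
  probe  = c("117_at", "1007_s_at", "1053_at"),
  symbol = c("HSPA6", "DDR1", "RFC2"),   # illustrative values
  stringsAsFactors = FALSE
)

# match() gives, for each expression row name, its row in the annotation
idx <- match(rownames(e.out), annot$probe)
annot_ordered <- annot[idx, ]

# Now the two objects line up row by row
print(identical(annot_ordered$probe, rownames(e.out)))
```

Any probe missing from the annotation would show up as NA in idx, which is worth checking with anyNA(idx) before relying on the result.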
Using the annotation packages will take some reading on your part, but
once you get the hang of things, I think you will like how they work.
You might start with
library(hgu133plus2.db)
?hgu133plus2.db
as well as
openVignette() and choose the AnnotationDbi vignette.
If you are interested in annotating the set of interesting genes from
your experiment, you will want to look at the annaffy package, which
will allow you to output both HTML and text files with your results and
annotations for each gene.
In addition, you might want to look at the affycoretools package, which
helps automate some of the steps required to annotate results. This
package is also integrated with limma, so you can go straight from your
linear model fits to output in one function call.
Best,
Jim
> Thanks!!!!!
> Best,Gina
> _________________________________________________________________
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues
------------------------------
Message: 17
Date: Thu, 22 Jul 2010 10:11:28 -0700
From: Patrick Aboyoun <paboyoun@fhcrc.org>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <4C487BC0.6010309 at fhcrc.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Erik,
Have you tried vcountPDict? It will use an Aho-Corasick matching algorithm
(http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm)
that is pretty fast, albeit memory intensive.
library(Biostrings)
fragments <- DNAStringSet(c("ACTG","AAAA"))
sequence_set <- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
pdict <- PDict(fragments)
counts <- vcountPDict(pdict, sequence_set)
> counts
     [,1] [,2]
[1,]    0    0
[2,]    0    0
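To get from a count matrix like the one above to the per-fragment "hits" in the original question, something along these lines might work. This is a base R sketch on a made-up matrix, and it assumes rows correspond to fragments and columns to sequences, which is my understanding of how vcountPDict lays out its result; that is worth confirming in ?vcountPDict.

```r
# Hypothetical count matrix shaped like the output above:
# one row per fragment, one column per sequence in the set.
counts <- matrix(c(0L, 2L, 1L,
                   0L, 0L, 3L),
                 nrow = 2, byrow = TRUE)

# Number of sequences with at least one match, per fragment
hits <- rowSums(counts > 0)
print(hits)
```

This replaces the explicit loop with one vectorised call once the whole matrix is available.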
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.17.26 IRanges_1.7.13
loaded via a namespace (and not attached):
[1] Biobase_2.9.0 tools_2.12.0
Patrick
On 7/22/10 8:54 AM, Erik Wright wrote:
> Hello,
>
> Lately I have been working on counting sequence fragments in larger
sets of sequences. I am searching for thousands of fragments of 30 to
130 bases in hundreds of thousands of sequences between 1200 and 1600
bases. Currently I am using the following method to count the number
of "hits":
>
> #### start ####
> library(Biostrings)
> fragments<- DNAStringSet(c("ACTG","AAAA"))
> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>
> for (i in 1:length(fragments)) {
> counts<- vcountPattern(fragments[[i]],
> sequence_set,
> max.mismatch=1)
> hits<- length(which(counts> 0))
> print(hits)
> }
> #### end ####
>
> This method is taking a long time to complete, so I am wondering if
I am doing this in the most efficient manner? Does anyone have a
suggestion for how I can accomplish the same task more efficiently?
>
> Thanks!,
> Erik
>
>
>
>
>
>> sessionInfo()
>>
> R version 2.11.0 (2010-04-22)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.16.0 IRanges_1.6.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
------------------------------
Message: 18
Date: Thu, 22 Jul 2010 10:26:48 -0700
From: Patrick Aboyoun <paboyoun@fhcrc.org>
To: "Coghlan, Avril" <a.coghlan at ucc.ie>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] feature request - pairwiseAlignment() in
Biostrings
Message-ID: <4C487F58.1060305 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Avril,
I won't have time to extend pairwiseAlignment, but you are more than
welcome to. It is written mainly in C with an R wrapper. You can grab it
via svn at the URL
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings
with username: readonly and password: readonly.
The particular files you'll want to look at are
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/src/align_pairwiseAlignment.c
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/R/pairwiseAlignment.R
I can provide you with a code walkthrough if you like. Since I optimized
the code for speed and memory usage, you may find it is easier to write
your own C-level function to use instead of the code I have, since I
don't keep enough information around to be able to select the
top X alignments.
Cheers,
Patrick
On 7/22/10 1:54 AM, Coghlan, Avril wrote:
> Dear Patrick and Steve,
>
> I am wondering whether it would be possible to add an option to the
> pairwiseAlignment() function in Biostrings, so that it could print
out:
> (i) all the top-scoring alignments for 2 sequences, if there are
more
> than one equally scoring top-scoring alignments ?
> (ii) the top X top-scoring alignments for 2 sequences, where the
user
> specifies the number X, and where the X alignments don't have to
have
> equal scores, but are ordered by decreasing score ?
>
> I'm not sure if these options are easy to add, but would be very
useful
> if you could add them.
>
> If you haven't time to do this, I would be willing to try to help
add
> the features to the pairwiseAlignment() function, if you can point
me
> towards the code.
>
> Kind Regards,
> Avril
>
> Avril Coghlan
> University College Cork
> Ireland
>
>
>
>
>
------------------------------
Message: 19
Date: Thu, 22 Jul 2010 12:32:39 -0500
From: Erik Wright <eswright@wisc.edu>
To: Patrick Aboyoun <paboyoun at fhcrc.org>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <fbde47f7-a49a-4d50-93bb-0ae8d9097da7 at wisc.edu>
Content-Type: text/plain; charset=windows-1252
Hi Patrick,
Thanks, this looks promising. Three possible complications are:
(1) The fragments are not all the same width. Is this possible with
PDict?
(2) I allow a variable number of mismatches based on each individual
fragment's width.
(3) The fragments sometimes include ambiguity letters (IUPAC extended
letters).
A more accurate example would be:
#### start ####
library(Biostrings)
fragments <- DNAStringSet(c("ACS","NCCAGAA")) # no indels
sequence_set <- DNAStringSet(c("ATAGCATACKACCA","GATTACGTACCADADATTACA")) # variable widths
for (i in 1:length(fragments)) {
    counts <- vcountPattern(fragments[[i]],
                            sequence_set,
                            max.mismatch=floor(length(fragments[[i]])/5)) # variable mismatches
    hits <- length(which(counts > 0))
    print(hits)
}
#### end ####
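For reference, the mismatch rule in the example works out as follows. This is a base R sketch that treats the fragments as plain character strings so it needs no packages; the grouping step at the end is only an assumption about how fragments sharing a budget might be batched, not something Biostrings requires.

```r
# Fragment strings from the example above, kept as plain character
# vectors so the sketch needs only base R.
fragments <- c("ACS", "NCCAGAA")

# Mismatch budget per fragment: one mismatch per five bases, rounded down
max_mm <- floor(nchar(fragments) / 5)
print(max_mm)

# Group fragment indices by budget; each group could then be searched
# together with a single max.mismatch value
groups <- split(seq_along(fragments), max_mm)
print(groups)
```

With fragments of 30 to 130 bases this rule gives budgets from 6 to 26, so there would be at most 21 such groups.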
Do you think it is possible to make this work with PDict for a speed
improvement?
Thanks again!,
Erik
On Jul 22, 2010, at 12:11 PM, Patrick Aboyoun wrote:
> Erik,
> Have you tried vcountPDict? It will use an Aho-Corasick matching algorithm
> (http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm)
> that is pretty fast, albeit memory intensive.
>
> library(Biostrings)
> fragments<- DNAStringSet(c("ACTG","AAAA"))
> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
> pdict<- PDict(fragments)
> counts<- vcountPDict(pdict, sequence_set)
>
>> counts
> [,1] [,2]
> [1,] 0 0
> [2,] 0 0
>
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.17.26 IRanges_1.7.13
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.9.0 tools_2.12.0
>
>
>
>
> Patrick
>
>
> On 7/22/10 8:54 AM, Erik Wright wrote:
>> Hello,
>>
>> Lately I have been working on counting sequence fragments in larger
sets of sequences. I am searching for thousands of fragments of 30 to
130 bases in hundreds of thousands of sequences between 1200 and 1600
bases. Currently I am using the following method to count the number
of "hits":
>>
>> #### start ####
>> library(Biostrings)
>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>>
>> for (i in 1:length(fragments)) {
>> counts<- vcountPattern(fragments[[i]],
>> sequence_set,
>> max.mismatch=1)
>> hits<- length(which(counts> 0))
>> print(hits)
>> }
>> #### end ####
>>
>> This method is taking a long time to complete, so I am wondering if
I am doing this in the most efficient manner? Does anyone have a
suggestion for how I can accomplish the same task more efficiently?
>>
>> Thanks!,
>> Erik
>>
>>
>>
>>
>>
>>> sessionInfo()
>>>
>> R version 2.11.0 (2010-04-22)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods
base
>>
>> other attached packages:
>> [1] Biostrings_2.16.0 IRanges_1.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.8.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
------------------------------
Message: 20
Date: Thu, 22 Jul 2010 11:10:03 -0700
From: Michael Lawrence <lawrence.michael@gene.com>
To: Patrick Aboyoun <paboyoun at fhcrc.org>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] feature request - pairwiseAlignment() in
Biostrings
Message-ID:
<aanlktik92of_5a3jhph8p_bwalpmt9yq2brvscd7b2lz at mail.gmail.com>
Content-Type: text/plain
The toughest question is probably not how to modify the C code, but how
the results will be represented and manipulated in R.
Good luck
On Thu, Jul 22, 2010 at 10:26 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Avril,
> I won't have time to extend pairwiseAlignment, but you are more than
> welcome to. It is written mainly in C with an R wrapper. You can grab
> it via svn at the URL
>
>
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings
>
> with username: readonly and password: readonly.
>
> The particular files you'll want to look at are
>
>
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/src/align_pairwiseAlignment.c
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Biostrings/R/pairwiseAlignment.R
>
> I can provide you with a code walkthrough if you like. Since I
optimized
> the code for speed and memory usage, you may find it is easier to
write your
> own C level function that will be used instead of the code I have
since I
> don't keep enough information around to be able to select the top X
> alignments.
>
>
> Cheers,
>
> Patrick
>
>
>
> On 7/22/10 1:54 AM, Coghlan, Avril wrote:
>
>> Dear Patrick and Steve,
>>
>> I am wondering whether it would be possible to add an option to the
>> pairwiseAlignment() function in Biostrings, so that it could print
out:
>> (i) all the top-scoring alignments for 2 sequences, if there are
more
>> than one equally scoring top-scoring alignments ?
>> (ii) the top X top-scoring alignments for 2 sequences, where the
user
>> specifies the number X, and where the X alignments don't have to
have
>> equal scores, but are ordered by decreasing score ?
>>
>> I'm not sure if these options are easy to add, but would be very
useful
>> if you could add them.
>>
>> If you haven't time to do this, I would be willing to try to help
add
>> the features to the pairwiseAlignment() function, if you can point
me
>> towards the code.
>>
>> Kind Regards,
>> Avril
>>
>> Avril Coghlan
>> University College Cork
>> Ireland
>>
>>
>>
>>
>>
>>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
>
https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
------------------------------
Message: 21
Date: Thu, 22 Jul 2010 16:04:06 -0400
From: Steve Lianoglou <mailinglist.honeypot@gmail.com>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>, Bioconductor mailing list
<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<aanlktikape0f5juvye1tioilcfi_inwnha6vmn5drhok at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
2010/7/22 Elmer Fernández <elmerfer at gmail.com>:
> Hy Benjamin
> Are you sure about that?
Looking at the source code for heatmap.2 (and heatmap, for that
matter) it looks as if Benjamin is correct. The scaling is done after
the clustering.
> If so, I think that it is not correct, right?
I guess it depends on what you were expecting it to do :-)
Having just realized this myself (yikes -- see what happens when we
assume(?)), I think I'd more often rather send in a scaled version of
the data and have scale='none' in the heatmap call, to be honest.
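
A minimal sketch of why the two calls can disagree, using only base R rather than heatmap.2 itself. It assumes the default dist() and hclust() settings; the helper name row_order and the seeded example matrix are made up for illustration and are not part of gplots.

```r
# Hypothetical example data, same shape as the matrix in the original post
set.seed(1)
M <- matrix(c(rnorm(10 * 3, 1, 2), rnorm(10 * 2, -0.5, 1)), ncol = 5)

# Row ordering from hierarchical clustering with default dist()/hclust()
row_order <- function(x) hclust(dist(x))$order

# Euclidean distances between rows, before and after column scaling
d_raw    <- dist(M)
d_scaled <- dist(scale(M))

# The two sets of distances are not the same, so the trees built from
# them are not guaranteed to agree
same_distances <- isTRUE(all.equal(as.vector(d_raw), as.vector(d_scaled)))

# Per the discussion above: scaling inside the call happens after
# clustering, so the tree is built on the raw values ...
ord_raw <- row_order(M)
# ... whereas passing scale(M) with scale="none" clusters the scaled values
ord_scaled <- row_order(scale(M))

print(same_distances)
print(rbind(raw = ord_raw, scaled = ord_scaled))
```

Comparing the two printed orderings shows whether the row dendrogram would change for a given data set.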
-steve
> best
> Elmer
>
> 2010/7/22 Benjamin Otto <b.otto at uke.uni-hamburg.de>
>
>> Hi Guys,
>>
>> do note that the scale() function in heatmap doesn't scale your values
>> till AFTER clustering for visualization purpose! So if you provide
>> already scaled data, you naturally will expect a different result.
>>
>> cheers
>>
>> Benjamin
>>
>> Am 22.07.2010 um 16:25 schrieb Bazeley, Peter:
>>
>> > Hi Elmer,
>> >
>> > The default scale option in heatmap.2 scales by row, whereas the
>> > scale() function scales by column, so this is probably why there is a
>> > difference. I think whichever dimension contains unique samples is how
>> > you want to scale (if this was expression data, for example).
>> >
>> >
>> > Pete
>> > ________________________________________
>> > From: bioconductor-bounces at stat.math.ethz.ch
>> > [bioconductor-bounces at stat.math.ethz.ch] on behalf of Sean Davis
>> > [sdavis2 at mail.nih.gov]
>> > Sent: Thursday, July 22, 2010 9:17 AM
>> > To: Elmer Fernández
>> > Cc: Bioconductor mailing list
>> > Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
>> > function gives different results than scaling outside!!!
>> >
>> > 2010/7/22 Elmer Fernández <elmerfer at gmail.com>
>> >
>> >> Dear Users
>> >> I'm working with the heatmap.2 function and I realized that using
>> >> the scale input parameter gives different results than using the
>> >> scale() function outside and feeding heatmap.2 the scaled matrix. I
>> >> attached the results of the two approaches and the data matrix used
>> >> (M.csv).
>> >> So, what am I doing wrong?
>> >>
>> >>
>> > Hi, Elmer.
>> >
>> > The default distance function used by heatmap.2 is dist(), which is
>> > not going to be invariant under centering and scaling, I don't think.
>> > It looks like you are using that default.
>> >
>> > Sean
>> >
>> >
>> >> R Code
>> >>
>> >> library(gplots)
>> >> M=matrix(c(rnorm(10*3,1,2),rnorm(10*2,-0.5,1)),ncol=5)
>> >> heatmap.2(M,scale="column",trace="none",main="scaled inside")
>> >> x11();heatmap.2(scale(M),scale="none",trace="none",main="scaled outside")
>> >>
>> >>> sessionInfo()
>> >> R version 2.10.0 (2009-10-26)
>> >> x86_64-unknown-linux-gnu
>> >>
>> >> locale:
>> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> >> LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> >> LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
>> >> [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
>> >> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
>> >>
>> >> attached base packages:
>> >> [1] grid stats graphics grDevices utils datasets methods base
>> >>
>> >> other attached packages:
>> >> [1] gplots_2.7.4 caTools_1.10 bitops_1.0-4.1 gdata_2.7.1
>> >> gtools_2.6.1 rkward_0.5.1
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] tools_2.10.0
>> >>
>> >>
>> >> --
>> >> Elmer A. Fernández (Bioing. PhD)
>> >> Investigador Asistente CONICET - Research Assistant CONICET
>> >> Prof. Inteligencia Artificial - UCC - Prof. Artificial Intelligence @ UCC
>> >> tel: +54-(0)351-4938000 int 145
>> >> Fax: +54-(0)351-4938081
>> >> web page: http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
>> >> http://sites.google.com/site/biologicaldatamininggroup/Home/
>> >> mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
>> >>
>> >>
>>
>> ___________________________________________
>> Benjamin Otto, PhD
>> University Medical Center Hamburg-Eppendorf
>> Institute For Clinical Chemistry / Central Laboratories
>> Campus Forschung N27
>> Martinistr. 52,
>> D-20246 Hamburg
>>
>> Tel.: +49 40 7410 51908
>> Fax.: +49 40 7410 54971
>> ___________________________________________
>
>
>
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
------------------------------
Message: 22
Date: Thu, 22 Jul 2010 13:14:34 -0700
From: Hervé Pagès <hpages@fhcrc.org>
To: Erik Wright <eswright at wisc.edu>
Cc: BioC list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Biostrings - vcountPattern optimization
Message-ID: <4C48A6AA.2050407 at fhcrc.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Hi Erik,
On 07/22/2010 10:32 AM, Erik Wright wrote:
> Hi Patrick,
>
> Thanks, this looks promising. Three possible complications are:
> (1) The fragments are not all the same width. Is this possible with PDict?
Yes, but given requirement (2), you need another solution.
> (2) I allow a variable number of mismatches based on each individual
> fragment's width.
So given (1) and (2), you could group your fragments by equal length,
make a PDict object for each group, and use a single number of
mismatches for that group (seems like this number only depends on
the length of the fragment).
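
The grouping step could be sketched roughly as follows. This is only an illustration in base R with plain character strings, so that it runs without Biostrings; the fragment sequences and the variable names groups and max_mismatch are made up. The floor(width / 5) rule is taken from the quoted code below.

```r
# Hypothetical fragments, as plain strings to keep the sketch free of
# package dependencies; with Biostrings these would be a DNAStringSet
fragments <- c("ACTGACTGAC", "AAAACCCCGG", "ACTGA", "TTTTT", "GGGGGCCCCCAAAAA")

# Group fragments by length, one list element per distinct width
groups <- split(fragments, nchar(fragments))

# One mismatch allowance per group, using the floor(width / 5) rule
max_mismatch <- floor(as.integer(names(groups)) / 5)
names(max_mismatch) <- names(groups)

for (w in names(groups)) {
  cat("width", w, ":", length(groups[[w]]), "fragment(s), max.mismatch =",
      max_mismatch[[w]], "\n")
  # A PDict object would be built from groups[[w]] here and matched
  # against the subject sequences with this group's mismatch allowance.
}
```

Whether this pays off depends on how many distinct widths there are, since each width needs its own dictionary.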
> (3) The fragments sometimes include ambiguity letters (IUPAC extended
> letters).
Unfortunately ambiguities are supported only in the subject at the
moment. But you could still treat them separately with vcountPattern()
in a loop.
>
> A more accurate example would be:
>
> #### start ####
> library(Biostrings)
> fragments <- DNAStringSet(c("ACS", "NCCAGAA"))  # no indels
> # subject sequences of variable widths
> sequence_set <- DNAStringSet(c("ATAGCATACKACCA",
>                                "GATTACGTACCADADATTACA"))
> for (i in 1:length(fragments)) {
>   # allowed mismatches vary with the fragment width
>   counts <- vcountPattern(fragments[[i]],
>                           sequence_set,
>                           max.mismatch = floor(length(fragments[[i]]) / 5))
>   hits <- length(which(counts > 0))
>   print(hits)
> }
> #### end ####
>
> Do you think it is possible to make this work with PDict for a speed
> improvement?
With max.mismatch being a fifth of the fragment length, that means it
will be between 6 (for 30bp fragments) and 26 (for 130bp fragments).
Unfortunately, that's far more mismatches than PDict()/vcountPDict()
can handle.
Cheers,
H.
>
> Thanks again!,
> Erik
>
>
>
> On Jul 22, 2010, at 12:11 PM, Patrick Aboyoun wrote:
>
>> Erik,
>> Have you tried vcountPDict? It will use an Aho-Corasick matching
>> algorithm (http://en.wikipedia.org/wiki/Aho–Corasick_string_matching_algorithm)
>> that is pretty fast, albeit memory intensive.
>>
>> library(Biostrings)
>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>> pdict<- PDict(fragments)
>> counts<- vcountPDict(pdict, sequence_set)
>>
>>> counts
>> [,1] [,2]
>> [1,] 0 0
>> [2,] 0 0
>>
>>> sessionInfo()
>> R version 2.12.0 Under development (unstable) (2010-07-18 r52554)
>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] Biostrings_2.17.26 IRanges_1.7.13
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.9.0 tools_2.12.0
>>
>>
>>
>>
>> Patrick
>>
>>
>> On 7/22/10 8:54 AM, Erik Wright wrote:
>>> Hello,
>>>
>>> Lately I have been working on counting sequence fragments in larger
>>> sets of sequences. I am searching for thousands of fragments of 30 to
>>> 130 bases in hundreds of thousands of sequences between 1200 and 1600
>>> bases. Currently I am using the following method to count the number
>>> of "hits":
>>>
>>> #### start ####
>>> library(Biostrings)
>>> fragments<- DNAStringSet(c("ACTG","AAAA"))
>>> sequence_set<- DNAStringSet(c("TAGACATGAC","ACCAAATGAC"))
>>>
>>> for (i in 1:length(fragments)) {
>>> counts<- vcountPattern(fragments[[i]],
>>> sequence_set,
>>> max.mismatch=1)
>>> hits<- length(which(counts> 0))
>>> print(hits)
>>> }
>>> #### end ####
>>>
>>> This method is taking a long time to complete, so I am wondering if I
>>> am doing this in the most efficient manner? Does anyone have a
>>> suggestion for how I can accomplish the same task more efficiently?
>>>
>>> Thanks!,
>>> Erik
>>>
>>>
>>>
>>>
>>>
>>>> sessionInfo()
>>>>
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] Biostrings_2.16.0 IRanges_1.6.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0
>>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
------------------------------
Message: 23
Date: Thu, 22 Jul 2010 17:14:42 -0300
From: Elmer Fernández <elmerfer@gmail.com>
To: Steve Lianoglou <mailinglist.honeypot at gmail.com>
Cc: Sean Davis <sdavis2 at mail.nih.gov>, Bioconductor mailing list
<bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<aanlktiklby_bjnudymd7aakezc0yl3wscagrqs7kjnt8 at mail.gmail.com>
Content-Type: text/plain
Dear Steve
You are right when you say that you should scale your data according to
what you want to do, but from the help page it is not clear when the
scaling is done. In most R functions, when a scale parameter is present
in the input, you assume that the scaling is performed BEFORE the main
process. That's why I said that it might not be correct.
Dear guys, THANKS for the discussion!! I really appreciated and enjoyed
it.
Best
Elmer
--
Elmer A. Fernández (Bioing. PhD)
Investigador Asistente CONICET - Research Assistant CONICET
Prof. Inteligencia Artificial - UCC - Prof. Artificial Intelligence @ UCC
tel: +54-(0)351-4938000 int 145
Fax: +54-(0)351-4938081
web page: http://www.uccor.edu.ar/modelo.php?param=3.8.5.15
http://sites.google.com/site/biologicaldatamininggroup/Home/
mail address: Camino Alta Gracia Km 7.1/2- Córdoba-5017-Argentina
------------------------------
Message: 24
Date: Thu, 22 Jul 2010 15:00:56 -0600
From: Sean Davis <sdavis2@mail.nih.gov>
To: Elmer Fernández <elmerfer at gmail.com>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Heatmap.2 scale problems: Sacling inside the
function gives different results than scaling
outside!!!
Message-ID:
<aanlktimx3yubpv2nsyjcrypaxr4zon5wfknzqbs5tr0i at mail.gmail.com>
Content-Type: text/plain
2010/7/22 Elmer Fernández <elmerfer at gmail.com>
> Hy Benjamin
> Are you sure about that? If so, I think that it is not correct, right?
> best
> Elmer
>
Hi, Elmer. My reading of the source code for heatmap.2 suggests that
Benjamin is correct.
Sean
>
------------------------------
Message: 25
Date: Fri, 23 Jul 2010 09:13:56 +1000 (AUS Eastern Standard Time)
From: Gordon K Smyth <smyth@wehi.edu.au>
To: HuW at mskcc.org
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Subject: [BioC] the design matrix again
Message-ID: <pine.wnt.4.64.1007230912030.2728 at pc602.alpha.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Looks correct.
Gordon
> Date: Tue, 20 Jul 2010 17:44:07 -0400
> From: HuW at mskcc.org
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] the design matrix again
>
>
> Hi everyone,
>
> I know my question has been answered to some extent on the mailing
> list, but I still do not feel very confident about my design. I would
> really appreciate it if anyone could help me with this.
>
> The data set is about patients before and after treatment. I want to
> find the genes whose expression changed between before and after
> treatment. With 3 patients, I did this:
>
>
>> design
>   patient1 patient2 patient3 treatment14
> 1        1        0        0           0
> 2        0        1        0           0
> 3        0        0        1           0
> 4        1        0        0           1
> 5        0        1        0           1
> 6        0        0        1           1
> attr(,"assign")
> [1] 1 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$patient
> [1] "contr.treatment"
>
> attr(,"contrasts")$treatment
> [1] "contr.treatment"
>
>> eset.rma.fit = lmFit(eset.rma, design);
>> eset.rma.bayes = eBayes(eset.rma.fit);
>> topTable(eset.rma.bayes, coef = "treatment14", adjust = "BH");
>
> thank you very much.
>
> Wenhuo Hu
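
For reference, a design matrix with the same columns as the one quoted above can be produced with model.matrix() in base R. This is a sketch under assumptions: the factor names patient and treatment match the "contrasts" attributes shown above, but the treatment levels 0 and 14 and the sample order are guesses made for illustration.

```r
# Hypothetical sample annotation: 3 patients, each measured at
# treatment level 0 (before) and 14 (after)
patient   <- factor(rep(1:3, times = 2))
treatment <- factor(rep(c(0, 14), each = 3))

# No intercept, so each patient gets its own baseline column and
# treatment14 is the within-patient before/after effect
design <- model.matrix(~ 0 + patient + treatment)
print(design)
```

With this layout, the treatment14 coefficient is the one to pass to topTable(), as in the quoted call.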
------------------------------
Message: 26
Date: Thu, 22 Jul 2010 16:23:53 -0700
From: Thomas Girke <thomas.girke@ucr.edu>
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>,
bioc-sig-sequencing at stat.math.ethz.ch
Subject: [BioC] Open Postdoc Positions
Message-ID: <20100722232353.GA18501 at biocluster.ucr.edu>
Content-Type: text/plain; charset=us-ascii
Dear List Members,
There are currently two open postdoc positions in my group with secured
funding for 3-4 years. One position is in the area of next generation
sequencing and the other one in the chemical informatics field related
to chemical genomics and drug discovery. Both positions will involve a
combination of software development and data analysis/mining tasks.
Ideal candidates should have a strong background in computer sciences
and scientific data analysis, and should be proficient in at least two
of the following programming languages: C/C++, Python and R. Experience
with web and database programming is also beneficial, especially with
Python/Django and MySQL/PostgreSQL, respectively.
To apply, please email your CV with a detailed description of your
professional skills to thomas.girke at ucr.edu.
Thomas
--
Thomas Girke
Associate Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Institute for Integrative Genome Biology (IIGB)
1207F Genomics Building
University of California
Riverside, CA 92521
E-mail: thomas.girke at ucr.edu
Personal Site:
http://girke.bioinformatics.ucr.edu
Ph: 951-905-5232
Fax: 951-827-5155
------------------------------
Message: 27
Date: Fri, 23 Jul 2010 09:38:50 +0100
From: Heidi Dvinge <heidi@ebi.ac.uk>
To: David martin <vilanew at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] htQPCR
Message-ID: <c49c2983-a056-4db2-b43c-1d35f91a194e at ebi.ac.uk>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Hello David,
Thanks for the feedback on HTqPCR. I've never really thought of
filtering out samples during my own analysis, hence there is no such
option in filterCtData. The default way is by subsetting, such as
qPCRset[,c(1:3,5)], or by using sample names as you do in your example.
However, I guess a specific filtering option might also be useful in
other cases, such as potentially removing samples that have a high
proportion of NA values and can therefore be considered failed
plates/samples.
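The name-based subsetting described above can be sketched in base R. This is only an illustration: the matrix `ct` is a made-up stand-in for exprs(qpcrObj), and all sample and gene names are hypothetical. Applying the same logical index directly to a qPCRset with `[` is an assumption based on the integer subsetting shown above.

```r
# Stand-in for exprs(qpcrObj): a small Ct matrix with sample names as columns.
# All names and values here are made up for illustration.
ct <- matrix(c(25.1, 30.2, 24.8, 29.9, 26.0, 31.1, 25.5, 30.4),
             nrow = 2,
             dimnames = list(c("geneA", "geneB"),
                             c("sample1", "sample2", "sample3", "sample4")))

tofilter <- c("sample2", "sample4")

# Logical index: TRUE for the columns to keep.
keep <- !(colnames(ct) %in% tofilter)
ct_kept <- ct[, keep, drop = FALSE]

# If a name in the filter list is absent, nothing extra is removed.
none <- !(colnames(ct) %in% "not_a_sample")
ct_all <- ct[, none, drop = FALSE]
```

A logical index built with %in% keeps the matrix shape (drop = FALSE) and does not depend on every name in tofilter being present.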
I'll put it on the todo list of HTqPCR improvements.
Cheers
\Heidi
On 22 Jul 2010, at 10:47, David martin wrote:
> Hello,
> I would like to suggest a filtering method based on sample name.
> filterCtData contains a lot of filtering methods, but I didn't see any
> that filter on sample names.
>
> Actually, I use the match function to remove samples from the
> analysis.
>
> e.g
> tofilter=c("sample1","sample2",...)
> exprs(qpcrObj)[,-match(tofilter,colnames(exprs(qpcrObj)))]
>
> thanks,
> david
>
------------------------------
Message: 28
Date: Fri, 23 Jul 2010 10:11:47 +0100
From: Heidi Dvinge <heidi@ebi.ac.uk>
To: "Bass, Kevin" <bassk1 at email.chop.edu>
Cc: BioC List <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Problem with function limmaCtData in HTqPCR
package: "leading minor of order 2 is not positive
definite"
Message-ID: <12D85D00-CC61-4304-9112-7F870CA0A9D9 at ebi.ac.uk>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Hello Kevin,
On 21 Jul 2010, at 19:50, Bass, Kevin wrote:
> Hi,
>
> I am having a problem with using the function limmaCtData on a
> qPCRset object created with the package HTqPCR. When I try to execute
> limmaCtData, I get the following error:
>
> "Error in chol.default(V) :
> the leading minor of order 2 is not positive definite"
>
As your traceback() shows, in the first step the error comes from
lmFit from the limma package. As I recall, it means that one of the
internal design matrices has become singular. I'm afraid I don't
know exactly why this is happening; however, it can be caused by
trying to do too much with too few observations/replicates. Is it
possible to use a smaller design matrix? Looking at your design
matrix, it would appear that you have no replicates of any of the 5
treatments you list there. Based on your description of the
experiment, I'm not really sure whether this is the case or not?
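The replication point can be checked with base R alone. The sketch below does not reproduce the chol error itself; it only shows that a design with one sample per treatment leaves no residual degrees of freedom, whereas a replicated design does. The group labels follow the design matrix quoted further down; the three-samples-per-treatment layout is a hypothetical alternative, not the actual data.

```r
# Five samples, one per treatment, as in the design matrix quoted below.
groups <- c("VehFA", "LowFA", "MidFA", "HighFA", "NoTreat")
g1 <- factor(groups, levels = groups)
design1 <- model.matrix(~ 0 + g1)

# Residual degrees of freedom = observations - rank of the design.
df1 <- nrow(design1) - qr(design1)$rank

# Hypothetical alternative: three samples per treatment.
g3 <- factor(rep(groups, each = 3), levels = groups)
design3 <- model.matrix(~ 0 + g3)
df3 <- nrow(design3) - qr(design3)$rank

c(unreplicated = df1, replicated = df3)
```

With zero residual degrees of freedom there is nothing left to estimate the variance from, which is one way of "trying to do too much with too few observations".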
By the way, it looks like you have quite a complex plate/sample
combination design compared to a standard qPCR analysis - I can see
why you end up with an object called "raw_monster2" after all the
different rbind and cbind ;)
Cheers
\Heidi
> Below, I will describe the experimental design and the steps taken to
> create my qPCRset object. Then I will paste the commands used, and
> their results, in the steps leading up to running the limmaCtData
> function on my qPCRset object.
>
> We have 21 96-well plates. Each plate contains 5 experimental groups
> and 4 genes--2 target genes, and 2 endogenous controls. Each
> experimental group sampled all 4 genes, and there were 3 biological
> replicates per sample, for a total of 12 wells per experimental group.
>
> Every 7 plates among the total 21 plates constitutes a "set" of
> plates: they each contain the same 14 target genes. This means that
> each gene, in each experimental condition, has 3 samples among the 21
> plates--one sample per experimental condition for each 7-plate set.
>
> The goal is to compare the Ct values for each gene in each
> experimental group, to the Ct values for the same gene in every other
> experimental group.
>
> Using rbind (HTqPCR), I collated 7 of the data files into one file,
> so that all 14 genes could be analyzed simultaneously, at least among
> a single set of plates--once I had figured that part out, I had
> planned on combining the 3 sets.
>
> To give a clear idea what my data looks like--and how it was
> implemented in my qPCRset object--this is the Slot "history" and Slot
> "exprs" of my combined qPCRset object (with the data removed):
>
> Slot "exprs":
> 01_veh+FA 02_low+FA 03_mid+FA 04_high+FA 05_no_treatment
> PGES
> PGES
> PGES
> c-Fos
> c-Fos
> c-Fos
> SPP1
> SPP1
> SPP1
> CD200
> CD200
> CD200
> COX-1
> COX-1
> COX-1
> COX-2
> COX-2
> COX-2
> OX-42
> OX-42
> OX-42
> iBA-1
> iBA-1
> iBA-1
> IL-2
> IL-2
> IL-2
> IL-4
> IL-4
> IL-4
> IL-6
> IL-6
> IL-6
> IL-8
> IL-8
> IL-8
> IL-10
> IL-10
> IL-10
> CD4
> CD4
> CD4
>
> Slot "history":
> history
> 1 raw8: readCtData(files = "NS398_08b.txt", path = barrPath,
> n.features = 12,
> 2 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 3 header = TRUE, n.data = 5)
> 4 raw9: readCtData(files = "NS398_09b.txt", path = barrPath,
> n.features = 12,
> 5 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 6 header = TRUE, n.data = 5)
> 7 raw10: readCtData(files = "NS398_10b.txt", path = barrPath,
> n.features = 12,
> 8 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 9 header = TRUE, n.data = 5)
> 10 raw11: readCtData(files = "NS398_11b.txt", path = barrPath,
> n.features = 12,
> 11 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 12 header = TRUE, n.data = 5)
> 13 raw12: readCtData(files = "NS398_12b.txt", path = barrPath,
> n.features = 12,
> 14 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 15 header = TRUE, n.data = 5)
> 16 raw13: readCtData(files = "NS398_13b.txt", path = barrPath,
> n.features = 12,
> 17 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 18 header = TRUE, n.data = 5)
> 19 raw14: readCtData(files = "NS398_14b.txt", path = barrPath,
> n.features = 12,
> 20 flag = NULL, feature = 5, type = 7, position = 2, Ct = 6,
> 21 header = TRUE, n.data = 5)
> 22 rbind(deparse.level, ..1, ..2, ..3, ..4, ..5, ..6, ..7)
> 23 normalizeCtData(q = raw_monster2, norm = "deltaCt",
> deltaCt.genes = "GAPDH")
> 24 filterCtDataNew(q = d.raw2, remove.type = "Endogenous
> Control")
> 25 setCategory(q = fd.raw2, Ct.max = 100, Ct.min = 0,
> quantile = 0.9,
>
> So, then I prepared the matrix for analysis with limma:
>
>> design<-model.matrix(~0+sampleNames(test.d.raw2))
> Warning message:
> In model.matrix.default(~0 + sampleNames(test.d.raw2)) :
> variable 'sampleNames(test.d.raw2)' converted to a factor
>> colnames(design)<-c("VehFA","LowFA","MidFA","HighFA","NoTreat")
>> print(design)
> VehFA LowFA MidFA HighFA NoTreat
> 1 1 0 0 0 0
> 2 0 1 0 0 0
> 3 0 0 1 0 0
> 4 0 0 0 1 0
> 5 0 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$`sampleNames(test.d.raw2)`
> [1] "contr.treatment"
>> contrasts<-makeContrasts(VehFA-LowFA, VehFA-MidFA, VehFA-HighFA,
> + VehFA-NoTreat, LowFA-MidFA, LowFA-HighFA, LowFA-NoTreat,
> + MidFA-HighFA, MidFA-NoTreat,HighFA-NoTreat, levels=design)
>> colnames(contrasts)<-c("V-L", "V-M", "V-H", "V-NT", "L-M", "L-H",
> + "L-NT", "M-H", "M-NT", "H-NT")
>> print(contrasts)
> Contrasts
> Levels V-L V-M V-H V-NT L-M L-H L-NT M-H M-NT H-NT
> VehFA 1 1 1 1 0 0 0 0 0 0
> LowFA -1 0 0 0 1 1 1 0 0 0
> MidFA 0 -1 0 0 -1 0 0 1 1 0
> HighFA 0 0 -1 0 0 -1 0 -1 0 1
> NoTreat 0 0 0 -1 0 0 -1 0 -1 -1
>> test.d.raw2b<-test.d.raw2[order(featureNames(test.d.raw2)), ]
>
> ======================================================================
>> qDE.limma <- limmaCtData(test.d.raw2b,design=design,
> + contrasts=contrasts,ndups=3,spacing=1)
> Error in chol.default(V) :
> the leading minor of order 2 is not positive definite
> In addition: Warning message:
> In sqrt(dfitted.values) : NaNs produced
>> traceback()
> 6: .Call("La_chol", as.matrix(x), PACKAGE = "base")
> 5: chol.default(V)
> 4: chol(V)
> 3: gls.series(y$exprs, design = design, ndups = ndups,
> spacing = spacing, block = block, correlation = correlation,
> weights = weights, ...)
> 2: lmFit(data, design = design, ndups = ndups, spacing = spacing,
> correlation = dup.cor$consensus, ...)
> 1: limmaCtData(test.d.raw2b, design = design, contrasts = contrasts,
> ndups = 3, spacing = 1)
>
> Any ideas on why I am getting this error and what I might do to avoid
> it? If there is any other information needed, please let me know.
>
> Thanks,
> Kevin
> bassk1 at email.chop.edu
>
>
>
> =====
>
> Kevin Bass, Research Technician
> Barr Lab
> Children's Hospital of Philadelphia
> Abramson Research Center
> 3615 Civic Center Blvd, Suite 714
> Philadelphia PA 19104-4399
------------------------------
Message: 29
Date: Fri, 23 Jul 2010 05:50:49 -0400
From: Vincent Carey <stvjc@channing.harvard.edu>
To: bioconductor <bioconductor at stat.math.ethz.ch>
Subject: [BioC] building a refseq-based transcriptDb: warnings of
interest?
Message-ID:
<aanlktikxjh9dbszeynccwst2hj15sbohmbnd8e46m5_- at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> hg18r.txdb = makeTranscriptDbFromUCSC(tablename="refGene")
Download the refGene table ... OK
Download the refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]], ... :
UCSC data anomaly in transcript NM_017940: the cds cumulative length
is not a multiple of 3
2: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]], ... :
UCSC data anomaly in transcript NM_001037675: the cds cumulative
length is not a multiple of 3
3: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]], ... :
UCSC data anomaly in transcript NM_001039703: the cds cumulative
length is not a multiple of 3
4: In .extractUCSCCdsStartEnd(cdsStart[i], cdsEnd[i],
exon_locs$start[[i]], ... :
and so on. Does this need to be reported to UCSC?
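For readers unfamiliar with the warning, the condition it describes can be sketched as follows. This is not the GenomicFeatures code; `cds_length` is a hypothetical helper, the coordinates are invented, and 1-based inclusive positions are assumed (UCSC tables use 0-based starts, so real data would need converting first).

```r
# Hypothetical helper: total coding length of one transcript.
# Assumes 1-based, inclusive coordinates.
cds_length <- function(exon_start, exon_end, cds_start, cds_end) {
  # Clip each exon to the CDS interval; exons outside it contribute 0.
  overlap <- pmin(exon_end, cds_end) - pmax(exon_start, cds_start) + 1
  sum(pmax(overlap, 0))
}

# Two invented exons.
exon_start <- c(1, 21)
exon_end   <- c(10, 30)

len_bad <- cds_length(exon_start, exon_end, cds_start = 5, cds_end = 25)
len_ok  <- cds_length(exon_start, exon_end, cds_start = 5, cds_end = 26)

len_bad %% 3 != 0  # TRUE: not a whole number of codons, would be flagged
len_ok  %% 3 != 0  # FALSE: length is a multiple of 3
```

A transcript is flagged when the summed length of its coding portions across exons does not divide evenly into codons.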
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] GenomicFeatures_1.1.6 GenomicRanges_1.1.15 IRanges_1.7.13
[4] weaver_1.15.0 codetools_0.2-2 digest_0.4.2
loaded via a namespace (and not attached):
[1] BSgenome_1.17.5 Biobase_2.9.0 Biostrings_2.17.26 DBI_0.2-5
[5] RCurl_1.4-2 RSQLite_0.9-1 XML_3.1-0 biomaRt_2.5.1
[9] rtracklayer_1.9.3
------------------------------
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
End of Bioconductor Digest, Vol 89, Issue 22
********************************************