Entering edit mode
pankaj borah
▴
120
@pankaj-borah-3804
Last seen 10.2 years ago
Hi,
Thanks for the previous suggestion. I am wondering how to write only
the $R values to a output matrix? Here we have both $R and $P values.
Regards,
Pankaj Barah Department of Biology,
Norwegian University of Science & Technology (NTNU)
Realfagbygget, N-7491 Trondheim, Norway
--- On Thu, 19/11/09, Robert Castelo <robert.castelo@upf.edu> wrote:
From: Robert Castelo <robert.castelo@upf.edu>
Subject: Re: [BioC] genereating correlation matrix from gene
expression data
To: "pankaj borah" <pankajborah2k3@yahoo.co.in>
Cc: bioconductor@stat.math.ethz.ch
Date: Thursday, 19 November, 2009, 2:33 PM
Pankaj,
in addition to what Steve provided below, if you are dealing
with
expression data, it might be useful for you the function qpPCC from
the
bioconductor package 'qpgraph'. it performs the same calculation as
the
'cor' function illustrated in Steve's message but in addition it
accepts
as input parameter not only a matrix or a data frame but also an
ExpressionSet object (not a big deal) and, more interestingly, it
calculates for you the P-values of a two sided test for the null
hypothesis of zero Pearson correlation. this latter feature might be
useful since as far as i know if one wants to test every pair of genes
and get the P-values one needs to call 'cor.test' for each pair of
genes
within a two nested for loops or some 'apply' call and this function
'qpPCC' does it using matrix calculations and thus is faster for that
purpose. of course, if you are interested in determining significance
through the P-values you will have to deal with the multiple
testing
problem.
library(qpgraph)
set.seed(123)
x <- matrix(rnorm(20), 4, 5)
round(x, digits=3)
[,1] [,2] [,3] [,4] [,5]
[1,] -0.560 0.129 -0.687 0.401 0.498
[2,] -0.230 1.715 -0.446 0.111 -1.967
[3,] 1.559 0.461 1.224 -0.556 0.701
[4,] 0.071 -1.265 0.360 1.787 -0.473
lapply(qpPCC(x), round, digits=3)
$R
1 2 3 4 5
1 1.000 -0.016 0.958 -0.490 0.437
2 -0.016 1.000 -0.271 -0.753 -0.462
3 0.958 -0.271 1.000 -0.218 0.431
4 -0.490 -0.753 -0.218 1.000 -0.198
5 0.437 -0.462 0.431 -0.198 1.000
$P
1
2 3 4 5
1 0.000 0.984 0.042 0.510 0.563
2 0.984 0.000 0.729 0.247 0.538
3 0.042 0.729 0.000 0.782 0.569
4 0.510 0.247 0.782 0.000 0.802
5 0.563 0.538 0.569 0.802 0.000
cor.test(x[,1], x[,2])
Pearson's product-moment correlation
data: x[, 1] and x[, 2]
t = -0.0231, df = 2, p-value = 0.9837
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.962 0.960
sample estimates:
cor
-0.0163
as a final shameless remark :) if you are interested in trying to sort
out genuine from spurious correlations you might want to calculate the
so-called "partial correlations" (see
http://en.wikipedia.org/wiki/Partial_correlation) which in
fact one
cannot calculate with gene expression data because typically you will
many more genes than samples. instead, as a sort of surrogate for
partial correlations, you might find useful to calculate the so-called
non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from
the
'qpgraph' package (look at the corresponding help pages if you're
interested).
best,
robert.
On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote:
> Hi Pankaj,
>
> On Nov 18, 2009, at 1:30 PM, pankaj borah wrote:
>
> > Hi all,
> >
> > i am a new user of R and Bioconductor package. i deal with gene
> > expression data. i was wondering if there is a way to generate a
> > correlation matrix (all to all square symmetric matrix) for a set
> > of genes and their expression values. is there any available
function?
>
>
Try ?cor to get the pairwise correlation between columns of your
> matrix, eg:
>
> R> x <- matrix(rnorm(20), 4, 5)
> R> x
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0.5818808 0.58781709 2.5511157 -1.5332180 0.6905731
> [2,] -0.7640311 0.25960974 0.7246655 -1.5539226 -0.5459625
> [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804 0.4629494
> [4,] -2.2906217 0.04932864 0.5694895 0.7736206 0.4078665
>
> R> cor(x)
> [,1] [,2] [,3] [,4] [,5]
> [1,]
1.00000000 0.851982381 0.8715055 -0.8404848 0.074770380
> [2,] 0.85198238 1.000000000 0.9429553 -0.5338964 0.004627258
> [3,] 0.87150545 0.942955253 1.0000000 -0.4719936 0.325584972
> [4,] -0.84048477 -0.533896414 -0.4719936 1.0000000 0.309414781
> [5,] 0.07477038 0.004627258 0.3255850 0.3094148 1.000000000
>
> If your genes are in rows, you'll just need to pass in the transpose
> of your matrix.
>
> HTH,
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>
_______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[elided Yahoo spam]]
[[elided Yahoo spam]]
[[alternative HTML version deleted]]