Hi I was wondering how I can figure out which genes correspond to each
of the principle components ($cmdscale.out). Gene.selection is set to
pairwise for the top 500 but I'd like to know exactly which genes are
being considered for cmdscale.out[,3], for example.
Thanks for the help!
.kripa
[[alternative HTML version deleted]]
Hi,
On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777 at="" hotmail.com="">
wrote:
>
> Hi I was wondering how I can figure out which genes correspond to
each of the principle components ($cmdscale.out). Gene.selection is
set to pairwise for the top 500 but I'd like to know exactly which
genes are being considered for cmdscale.out[,3], for example.
Perhaps you'd like to tell us what you did, exactly?
without some reproducible code examples, we're left guessing as to
what commands you've run and where `$cmdscale.out` comes from, etc.
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
?| Memorial Sloan-Kettering Cancer Center
?| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
All genes contribute to all principle components; some just more than
others. I've found it useful (or perhaps just entertaining) to see
heatmaps of the first N (say 10) component loadings, constrained to
the 500
genes whose variances contribute most to those components.
-Aaron
On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote:
>
> Hi I was wondering how I can figure out which genes correspond to
each of
> the principle components ($cmdscale.out). Gene.selection is set to
pairwise
> for the top 500 but I'd like to know exactly which genes are being
> considered for cmdscale.out[,3], for example.
>
> Thanks for the help!
>
> .kripa
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
Maybe this is very basic, but how do you identify those 500 genes for
the heatmap?
.kripa
From: amackey@virginia.edu
Date: Mon, 7 May 2012 11:47:16 -0400
Subject: Re: [BioC] LIMMA: plotMDS
To: kripa777@hotmail.com
CC: bioconductor@r-project.org
All genes contribute to all principle components; some just more than
others. I've found it useful (or perhaps just entertaining) to see
heatmaps of the first N (say 10) component loadings, constrained to
the 500 genes whose variances contribute most to those components.
-Aaron
On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com> wrote:
Hi I was wondering how I can figure out which genes correspond to each
of the principle components ($cmdscale.out). Gene.selection is set to
pairwise for the top 500 but I'd like to know exactly which genes are
being considered for cmdscale.out[,3], for example.
Thanks for the help!
.kripa
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]
try using the 'sparcl' package or something like this:
http://www.biomedcentral.com/1471-2105/11/296
the point is to threshold vigorously enough that most of the loadings
are
forced to zero
On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777@hotmail.com> wrote:
>
> Maybe this is very basic, but how do you identify those 500 genes
for the
> heatmap?
>
> .kripa
>
> From: amackey@virginia.edu
> Date: Mon, 7 May 2012 11:47:16 -0400
> Subject: Re: [BioC] LIMMA: plotMDS
> To: kripa777@hotmail.com
> CC: bioconductor@r-project.org
>
> All genes contribute to all principle components; some just more
than
> others. I've found it useful (or perhaps just entertaining) to see
> heatmaps of the first N (say 10) component loadings, constrained to
the 500
> genes whose variances contribute most to those components.
>
>
> -Aaron
>
> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com>
wrote:
>
>
>
>
> Hi I was wondering how I can figure out which genes correspond to
each of
> the principle components ($cmdscale.out). Gene.selection is set to
pairwise
> for the top 500 but I'd like to know exactly which genes are being
> considered for cmdscale.out[,3], for example.
>
>
>
>
>
> Thanks for the help!
>
>
>
> .kripa
>
> [[alternative HTML version deleted]]
>
>
>
> _______________________________________________
>
> Bioconductor mailing list
>
> Bioconductor@r-project.org
>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
*A model is a lie that helps you see the truth.*
*
*
Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
[[alternative HTML version deleted]]
also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMetho
ds/inst/doc/pcaMethods.pdf<http: www.bioconductor.org="" packages="" 2.11="" b="" ioc="" vignettes="" pcamethods="" inst="" doc="" pcamethods.pdf="">
if you are willing to simply truncate loadings at some arbitrary value
(not
always the worst idea of all time mind you)
superPC and the preconditioned lasso turn this strategy into a method
for
obtaining sparse classifiers, FWIW.
On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr.
<tim.triche@gmail.com>wrote:
> try using the 'sparcl' package or something like this:
> http://www.biomedcentral.com/1471-2105/11/296
>
> the point is to threshold vigorously enough that most of the
loadings are
> forced to zero
>
>
>
> On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777@hotmail.com>
wrote:
>
>>
>> Maybe this is very basic, but how do you identify those 500 genes
for the
>> heatmap?
>>
>> .kripa
>>
>> From: amackey@virginia.edu
>> Date: Mon, 7 May 2012 11:47:16 -0400
>> Subject: Re: [BioC] LIMMA: plotMDS
>> To: kripa777@hotmail.com
>> CC: bioconductor@r-project.org
>>
>> All genes contribute to all principle components; some just more
than
>> others. I've found it useful (or perhaps just entertaining) to see
>> heatmaps of the first N (say 10) component loadings, constrained to
the 500
>> genes whose variances contribute most to those components.
>>
>>
>> -Aaron
>>
>> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777@hotmail.com>
wrote:
>>
>>
>>
>>
>> Hi I was wondering how I can figure out which genes correspond to
each of
>> the principle components ($cmdscale.out). Gene.selection is set to
pairwise
>> for the top 500 but I'd like to know exactly which genes are being
>> considered for cmdscale.out[,3], for example.
>>
>>
>>
>>
>>
>> Thanks for the help!
>>
>>
>>
>> .kripa
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>>
>> Bioconductor mailing list
>>
>> Bioconductor@r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
> --
> *A model is a lie that helps you see the truth.*
> *
> *
> Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>
>
--
*A model is a lie that helps you see the truth.*
*
*
Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
[[alternative HTML version deleted]]
Hi Kripa,
It is a good question to ask which the top 500 genes are. We only can
know that if "gene.selection" is "common", in which situation, the
same top 500 genes were used for calculating the distance matrix among
all samples. When "gene.selection" is "pairwise", different sets of
500 genes for different pairs of samples are used for distance matrix
calculation.
If you look at "plotMDS.default" in limma, you will see the above
description is consistent with the code. "plotMDS.default" is used by
plotMDS.
The current code does not output the top-500-gene list. However, the
code can be edited to output the list when "gene.selection" is
"common". Please see the following.
In the followed example, "mds4[[2]]" is the vector of indexes for the
top 500 genes.
Hope this help.
Di
plotMDS.default<-
function (x, top = 500, labels = colnames(x), col = NULL, cex = 1,
dim.plot = c(1, 2), ndim = max(dim.plot), gene.selection =
"pairwise",
xlab = paste("Dimension", dim.plot[1]), ylab = paste("Dimension",
dim.plot[2]), ...)
{
x <- as.matrix(x)
ok <- is.finite(x)
if (!all(ok))
x <- x[apply(ok, 1, all), ]
if (is.null(labels))
labels <- 1:dim(x)[2]
nprobes <- nrow(x)
nsamples <- ncol(x)
if (ndim < 2)
stop("Need at least two dim.plot")
if (nsamples < ndim)
stop("Two few samples")
gene.selection <- match.arg(gene.selection, c("pairwise",
"common"))
cn <- colnames(x)
dd <- matrix(0, nrow = nsamples, ncol = nsamples, dimnames =
list(cn,
cn))
topindex <- nprobes - top + 1
if (gene.selection == "pairwise") {
for (i in 2:(nsamples)) for (j in 1:(i - 1)) dd[i, j] =
sqrt(meansort.int((x[,
i] - x[, j])^2, partial = topindex)[topindex:nprobes]))
}
else {
# Same genes used for all comparisons ,"common"
s <- rowMeans((x - rowMeans(x))^2)
q <- quantile(s, p = (topindex - 1.5)/(nprobes - 1))
x <- x[s >= q, ]
# an extra line
ind.top.genes<-which(s >= q)
for (i in 2:(nsamples)) dd[i, 1:(i - 1)] = sqrt(colMeans((x[,
i] - x[, 1:(i - 1), drop = FALSE])^2))
}
a1 <- cmdscale(as.dist(dd), k = ndim)
mds <- new("MDS", list(dim.plot = dim.plot, distance.matrix = dd,
cmdscale.out = a1, top = top, gene.selection =
gene.selection))
mds$x <- a1[, dim.plot[1]]
mds$y <- a1[, dim.plot[2]]
mdsPlot<-plotMDS(mds, labels = labels, col = col, cex = cex, xlab
= xlab,
ylab = ylab, ...)
list (mds=mds, ind.top.genes=ind.top.genes)
}
# The example from "plotMDS" documentation
sd <- 0.3*sqrt(4/rchisq(1000,df=4))
x <- matrix(rnorm(1000*6,sd=sd),1000,6)
rownames(x) <- paste("Gene",1:1000)
x[1:50,4:6] <- x[1:50,4:6] + 2
# without labels, indexes of samples are plotted.
mds <- plotMDS(x, col=c(rep("black",3), rep("red",3)) )
mds4<-plotMDS.default(x, col=c(rep("black",3), rep("red",3)) ,
gene.selection="common")
> mds4[[1]]
> length(mds4[[2]])
[1] 500
----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA
________________________________________
From: bioconductor-bounces@r-project.org [bioconductor-
bounces@r-project.org] On Behalf Of Tim Triche, Jr.
[tim.triche@gmail.com]
Sent: Monday, May 07, 2012 1:48 PM
To: Kripa R
Cc: amackey at virginia.edu; bioconductor at r-project.org
Subject: Re: [BioC] LIMMA: plotMDS
also http://www.bioconductor.org/packages/2.11/bioc/vignettes/pcaMetho
ds/inst/doc/pcaMethods.pdf<http: www.bioconductor.org="" packages="" 2.11="" b="" ioc="" vignettes="" pcamethods="" inst="" doc="" pcamethods.pdf="">
if you are willing to simply truncate loadings at some arbitrary value
(not
always the worst idea of all time mind you)
superPC and the preconditioned lasso turn this strategy into a method
for
obtaining sparse classifiers, FWIW.
On Mon, May 7, 2012 at 10:46 AM, Tim Triche, Jr. <tim.triche at="" gmail.com="">wrote:
> try using the 'sparcl' package or something like this:
> http://www.biomedcentral.com/1471-2105/11/296
>
> the point is to threshold vigorously enough that most of the
loadings are
> forced to zero
>
>
>
> On Mon, May 7, 2012 at 10:16 AM, Kripa R <kripa777 at="" hotmail.com="">
wrote:
>
>>
>> Maybe this is very basic, but how do you identify those 500 genes
for the
>> heatmap?
>>
>> .kripa
>>
>> From: amackey at virginia.edu
>> Date: Mon, 7 May 2012 11:47:16 -0400
>> Subject: Re: [BioC] LIMMA: plotMDS
>> To: kripa777 at hotmail.com
>> CC: bioconductor at r-project.org
>>
>> All genes contribute to all principle components; some just more
than
>> others. I've found it useful (or perhaps just entertaining) to see
>> heatmaps of the first N (say 10) component loadings, constrained to
the 500
>> genes whose variances contribute most to those components.
>>
>>
>> -Aaron
>>
>> On Sun, May 6, 2012 at 7:21 PM, Kripa R <kripa777 at="" hotmail.com="">
wrote:
>>
>>
>>
>>
>> Hi I was wondering how I can figure out which genes correspond to
each of
>> the principle components ($cmdscale.out). Gene.selection is set to
pairwise
>> for the top 500 but I'd like to know exactly which genes are being
>> considered for cmdscale.out[,3], for example.
>>
>>
>>
>>
>>
>> Thanks for the help!
>>
>>
>>
>> .kripa
>>
>> [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>>
>> Bioconductor mailing list
>>
>> Bioconductor at r-project.org
>>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>
> --
> *A model is a lie that helps you see the truth.*
> *
> *
> Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
>
>
--
*A model is a lie that helps you see the truth.*
*
*
Howard
Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf="">
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
Just to be clear on terminology, MDS computes principal coordinates,
which
are not quite the same as principal components.
Regardless of the name however, there actually is no set of genes that
corresponds to each dimension. The "pairwise" method in plotMDS()
uses a
potentially different set of genes for every pair of samples, and all
these pairwise sets of genes feed to some extent into all the
dimensions.