Dear List,
I guess this is a pretty basic question, but I have not found an
answer in the
archives. I would like to take a matrix representing m hybridizations
and
sort it to extract the 2000 genes with the maximal fold change as a
preliminary step of a further analysis. Is it possible to use the
geneFilter
to extract a fixed number of the "top" genes as above rather than
genes
satisfying some fixed criterion?
Thanks
Peter
The easiet way is as follows for Affymetrix data.
Load and analyse data
data<-ReadAffy()
eset<-rma(data)
exp<-eset@exprs
Calculate fold change (m) values for 2 chips
m<-exp[,1]-exp[,1]
Ask for top up and down regulated genes
X<-names(m[abs(m)>2])
Keep adjusting m cut off (ie 2 in above example) to get desired number
of genes
Filter data
exp.subset<-exp[X,]
Alternatively Sort on fold change
m.sort <- sort(m,dec=T)
top.genes <- m.sort[1:1000]
exp.subset<-exp[top.genes,]
Cheers
Anthony
--
______________________________________________
Anthony Bosco - PhD Student
Institute for Child Health Research
(Company Limited by Guarantee ACN 009 278 755)
Subiaco, Western Australia, 6008
Ph 61 8 9489 , Fax 61 8 9489 7700
email anthonyb@ichr.uwa.edu.au
______________________________________________
[[alternative HTML version deleted]]
On Mon, 2004-06-28 at 07:47, Anthony Bosco wrote:
> The easiet way is as follows for Affymetrix data.
>
> Load and analyse data
>
> data<-ReadAffy()
>
> eset<-rma(data)
>
> exp<-eset@exprs
> Calculate fold change (m) values for 2 chips
>
> m<-exp[,1]-exp[,1]
Err, isn't this equivalent to m <- x - x which always produce 0.
Besides
I think the definition of Fold Change is the difference of the group
_means_.
Using exp is bad idea as it is also an R function, see help("exp").
> Ask for top up and down regulated genes
>
> X<-names(m[abs(m)>2])
Your solution can produce spurious results. You should surround the
abs(m) > 2 with which(). See below :
> x <- 1:5
> names(x) <- letters[1:5]
> x
a b c d e
1 2 3 4 5
>
> names( x > 2.5 )
[1] "a" "b" "c" "d" "e"
>
> names( which( x > 2.5 ) )
[1] "c" "d" "e"
Note : There is the issue of translating from log2 base if you use
RMA.
So a FC of 2 would correspond to log2(FC) of 1.
> Keep adjusting m cut off (ie 2 in above example) to get desired
number of genes
>
> Filter data
>
> exp.subset<-exp[X,]
>
> Alternatively Sort on fold change
>
> m.sort <- sort(m,dec=T)
> top.genes <- m.sort[1:1000]
>
> exp.subset<-exp[top.genes,]
It is much easier and faster to use indices than matching rownames.
data <- matrix( rnorm(100000), nc=10 )
rownames(data) <- paste("g", 1:10000, sep="")
> w <- which(abs(rowMeans(data)) > 0.5)
> names.w <- rownames(data)[w]
length(w); length(names.w)
[1] 1173
[1] 1173
> system.time( sub1 <- data[ w, ] )
[1] 0 0 0 0 0
> system.time( sub1 <- data[ names.w, ] )
[1] 0.47 0.00 0.48 0.00 0.00
On Sun, Jun 27, 2004 at 09:19:33PM +0200, peter robinson wrote:
> Dear List,
>
> I guess this is a pretty basic question, but I have not found an
answer in the
> archives. I would like to take a matrix representing m
hybridizations and
> sort it to extract the 2000 genes with the maximal fold change as a
> preliminary step of a further analysis. Is it possible to use the
geneFilter
> to extract a fixed number of the "top" genes as above rather than
genes
> satisfying some fixed criterion?
No. Genefilter operates on a gene-by-gene basis (at least when I last
looked at the function). There are (of course) other ways of doing it.
--
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen