Hello
I am a beginner at Bioconductor and R. I am confused about how to do a
normalization that consists of obtaining the mean of a column and then
subtracting the mean of the column from each value in the column.
x1(1) - mean(col x1)   x2(1) - mean(col x2)
x1(2) - mean(col x1)   x2(2) - mean(col x2)
x1(3) - mean(col x1)   x2(3) - mean(col x2)
...                    ...
I have the genes in columns and the conditions in rows.
I don't want to stabilize the variance.
As you can see, it is a very simple calculation.
I am wondering if I could use packages like vsn or affy to do that, or if it
is easier to write a script.
Furthermore, I am not sure whether such a simple normalization is
conceptually correct for the objective of eliminating the effect between
arrays.
I would like to know whether I have to repeat the process of calculating the
mean of each column and subtracting it, and if so, how many times.
Thank you
diego lugro
student
universidad de buenos aires
argentina
On 26 May 2005, at 07:12, diego huck wrote:
>
> Hello
>
> I am a beginner at Bioconductor and R. I am confused about how
> to do a normalization that consists of obtaining the mean of a column
> and then subtracting the mean of the column from each value in the
> column.
> x1(1) - mean(col x1)   x2(1) - mean(col x2)
> x1(2) - mean(col x1)   x2(2) - mean(col x2)
> x1(3) - mean(col x1)   x2(3) - mean(col x2)
> ...                    ...
>
>
> I have the genes in columns and the conditions in rows.
That is fine, although unusual. Be aware that many of the BioC (and
similar) microarray packages use a rows=genes, columns=samples
convention. Although this perhaps wouldn't be the way a statistician
would arrange subjects and measurements in a table in R, I think it is
partly a historical carry-over from microarray data analysis in
spreadsheets and the like. Excel has a 256 column x 65000(ish) row
size limit, so you are pretty much stuck with one layout!
If you ever need to rotate your data then this is easy: use the t()
function.
newArray <- t(oldArray)
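As a minimal sketch (the spoof data and object names below are illustrative, not from the original post), transposing a conditions-by-genes matrix gives the rows=genes convention. Note that the margin you centre over switches from 2 to 1 after the transpose, but the numbers come out the same.

```r
# Hypothetical spoof data: 20 conditions (rows) x 100 genes (columns),
# i.e. the layout described in the original question
set.seed(1)
oldArray <- matrix(runif(2000), nrow = 20, ncol = 100)

# Rotate to the rows = genes, columns = samples convention
newArray <- t(oldArray)
dim(oldArray)  # 20 rows, 100 columns
dim(newArray)  # 100 rows, 20 columns

# Per-gene centring: columns (MARGIN = 2) in the old layout,
# rows (MARGIN = 1) in the transposed layout
centredOld <- sweep(oldArray, 2, apply(oldArray, 2, mean), "-")
centredNew <- sweep(newArray, 1, apply(newArray, 1, mean), "-")

# Same values, just transposed
all.equal(t(centredOld), centredNew)
```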
> I don't want to stabilize the variance.
If you did, the vsn package will do this.
> As you can see, it is a very simple calculation.
> I am wondering if I could use packages like vsn or affy to do that, or
> if it is easier to write a script.
You can do this yourself very easily, as this code snippet shows:
# Make a spoof array of 100 genes and 20 samples to demonstrate
# (20 rows x 100 columns, i.e. genes in columns as in your data)
x <- matrix(runif(2000), ncol=100)
# Calculate the mean of each column. Note: you could use median here
# to make it slightly more robust
col.means <- apply(x, 2, mean)
# Subtract the column means from each value in that column
x.centred <- sweep(x, 2, col.means, "-")
# You can do a similar version to subtract the row means; simply
# change the second argument (MARGIN) of both apply() and sweep() to 1.
# Alternatively, if you wanted to do division as opposed to
# subtraction, use
x.scaled <- sweep(x, 2, col.means, "/")
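Base R also has scale(), which with center = TRUE, scale = FALSE subtracts the column means. The sketch below (fresh spoof data, illustrative names) checks that it agrees with the sweep() approach in the snippet above, and shows the median variant mentioned in its comment.

```r
# Spoof data: 20 rows x 100 columns
set.seed(42)
x <- matrix(runif(2000), ncol = 100)

# Column-mean centring with sweep()
byMean <- sweep(x, 2, apply(x, 2, mean), "-")

# The same thing with scale(); it also attaches a "scaled:center"
# attribute holding the column means, dropped here before comparing
viaScale <- scale(x, center = TRUE, scale = FALSE)
attr(viaScale, "scaled:center") <- NULL
all.equal(byMean, viaScale)

# Slightly more robust variant: subtract column medians instead,
# so that each column ends up with median zero
byMedian <- sweep(x, 2, apply(x, 2, median), "-")
```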
> Furthermore, I am not sure whether such a simple normalization is
> conceptually correct for the objective of eliminating the effect
> between arrays.
> I would like to know whether I have to repeat the process of calculating
> the mean of each column and subtracting it, and if so, how many times.
>
Subtracting the mean from each column will make the new mean of each
column zero, so one cycle is enough.
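A quick way to convince yourself of this (a sketch on spoof data; centreColumns is a hypothetical helper, not a library function): centre once, then centre the result again. The second pass changes nothing beyond floating-point rounding.

```r
set.seed(7)
x <- matrix(runif(2000), ncol = 100)

# Hypothetical helper: subtract each column's mean from that column
centreColumns <- function(m) sweep(m, 2, colMeans(m), "-")

once  <- centreColumns(x)
twice <- centreColumns(once)

# Column means after one pass are zero up to rounding error ...
max(abs(colMeans(once)))
# ... so a second pass subtracts (almost) nothing
all.equal(once, twice)
```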
Hope this helps.
David
Prof David Kipling
Department of Pathology
School of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN
Tel: 029 2074 4847
Email: KiplingD@cardiff.ac.uk
Thank you David, these commands were very useful.
Thank you Gordon for your comments; I'll go back and review the
statistics theory.
Best regards
diego
> Date: Thu, 26 May 2005 03:12:41 -0300
> From: diego huck <diegolugro@yahoo.com.ar>
> Subject: [BioC] how to normalize by columns
> To: bioconductor@stat.math.ethz.ch
> Message-ID: <429568D9.1050308@yahoo.com.ar>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
>
> Hello
>
> I am a beginner at Bioconductor and R. I am confused about how
> to do a normalization that consists of obtaining the mean of a column and
> then subtracting the mean of the column from each value in the column.
> x1(1) - mean(col x1)   x2(1) - mean(col x2)
> x1(2) - mean(col x1)   x2(2) - mean(col x2)
> x1(3) - mean(col x1)   x2(3) - mean(col x2)
> ...                    ...
> I have the genes in columns and the conditions in rows.
If you were subtracting condition means, then this would be similar to
method="median" of the normalizeWithinArrays() function in the limma
package. However, subtracting genewise means is not likely to be a useful
normalization method for any sort of expression data.
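To make the distinction concrete for the layout in the question (conditions in rows, genes in columns), here is a sketch on spoof data. Subtracting a per-condition (row) median centres each array, which is similar in spirit to median normalization; this is plain base R, not limma's normalizeWithinArrays() itself. Subtracting per-gene (column) means instead removes each gene's overall level.

```r
set.seed(3)
# Spoof data: 20 conditions (rows) x 100 genes (columns)
x <- matrix(rnorm(2000), nrow = 20, ncol = 100)

# Per-condition (per-array) centring: subtract each row's median
perCondition <- sweep(x, 1, apply(x, 1, median), "-")

# Per-gene centring, as in the original question: subtract column means
perGene <- sweep(x, 2, colMeans(x), "-")

# Every condition now has median zero ...
summary(apply(perCondition, 1, median))
# ... whereas per-gene centring sets every gene's mean to zero,
# discarding differences in overall level between genes
summary(colMeans(perGene))
```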
> I don't want to stabilize the variance.
> As you can see, it is a very simple calculation.
> I am wondering if I could use packages like vsn or affy to do that, or
> if it is easier to write a script.
> Furthermore, I am not sure whether such a simple normalization is
> conceptually correct for the objective of eliminating the effect between
> arrays.
If you don't think it's right, why do it? Why not use one of the methods
provided with a proven track record? If you want suggestions from BioC
people, you could start by explaining exactly what your data is --
microarray, PCR, one channel, two channel, log-expression, log-ratios??
Gordon
> I would like to know whether I have to repeat the process of calculating
> the mean of each column and subtracting it, and if so, how many times.
>
> Thank you
>
> diego lugro
> student
> universidad de buenos aires
> argentina