Estimates of RMA parameters

0

Entering edit mode

@antillenicolaslausannenrc-bas-337

Last seen 10.2 years ago

Hi! During background subtraction (with RMA), it is assumed that the the observed intensities (O) are the sum of a signal(S) and a noise(N) : O = S + N with S follows an exponential ditribution (parameter = alpha) and N follows a normal distribution (parameters = mu and sigma). I've tried to assess the quality of these parameters estimates by simulation. For that, I've chosen the values of the three parameters and I've generated a large data set (N values): dataset <- rexp(N,alpha) + rnorm(N,mu,sigma) After that, i've tried to estimate the values of the parameters like in RMA: bg.parameters(dataset, 2^12) and I've been surprised to discover that the estimates weren't good at all For example, with alpha=0.0005, mu=500, lambda=75 and N=50000, I've obtained the following estimates: alpha = 0.0029 mu = 776.64 sigma = 251.14 This encouraged me to find another way to estimate the parameters. I've found it, but I need your opinion: First, I've computed the theoretical distribution of O (convolution of S and N). I've found the following expression: f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu - alpha*sigma^2)/(sigma)) where phi is the cumulative distribution function of the normal distribution. Then I've computed the corresponding log-likelihood function. The computation of maximum log-likelihood is possible but difficult. That's why I propose to solve it in a different way. We have: E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.) mean(intensities) and Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 = (appr.) var(intensities) where we suppose that S and N are independant and where intensities is the vector of intensities (perfect match). Then I suggest to make the following substitutions in the log- likelihood function: mu ----> mean(intensities) - 1/alpha sigma^2 ----> var(intensities) - 1/alpha^2 At this point, we have to maximize the log-likelihood which contains just one unknown parameter : alpha For the moment, I suggest to find the maximum empirically (compute the function for a large range of alpha) In the example, I've computed the function for 5000 values of alpha randomly selected between 0.000005 and 0.05. I've obtained the following estimates: alpha = 0.000498 mu = 501.09 sigma = 113.84 It seems much better than the previous estimates. I'm impatiently waiting for your comments and criticals. Thank you in advance! Nicolas

• 1.1k views

ADD COMMENT • link updated 21.3 years ago by Rafael A. Irizarry ★ 2.3k • written 21.3 years ago by Antille,Nicolas,LAUSANNE,NRC-BAS ▴ 60

0

Entering edit mode

Antille,Nicolas,LAUSANNE,NRC-BAS ▴ 60

@antillenicolaslausannenrc-bas-337

Last seen 10.2 years ago

Sorry! same message as before but with the correct values!! Sorry for the inconvenience. Regards Nicolas ############################################################### ############################################################### Hi! During background subtraction (with RMA), it is assumed that the the observed intensities (O) are the sum of a signal(S) and a noise(N) : O = S + N with S follows an exponential ditribution (parameter = alpha) and N follows a normal distribution (parameters = mu and sigma). I've tried to assess the quality of these parameters estimates by simulation. For that, I've chosen the values of the three parameters and I've generated a large data set (N values): dataset <- rexp(N,alpha) + rnorm(N,mu,sigma) After that, i've tried to estimate the values of the parameters like in RMA: bg.parameters(dataset, 2^12) and I've been surprised to discover that the estimates weren't good at all For example, with alpha=0.01, mu=50, lambda=75 and N=20000, I've obtained the following estimates: alpha = 0.037 mu = 95.004 sigma = 112.77 This encouraged me to find another way to estimate the parameters. I've found it, but I need your opinion: First, I've computed the theoretical distribution of O (convolution of S and N). I've found the following expression: f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu - alpha*sigma^2)/(sigma)) where phi is the cumulative distribution function of the normal distribution. Then I've computed the corresponding log-likelihood function. The computation of maximum log-likelihood is possible but difficult. That's why I propose to solve it in a different way. We have: E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.) mean(intensities) and Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 = (appr.) var(intensities) where we suppose that S and N are independant and where intensities is the vector of intensities (perfect match). Then I suggest to make the following substitutions in the log- likelihood function: mu ----> mean(intensities) - 1/alpha sigma^2 ----> var(intensities) - 1/alpha^2 At this point, we have to maximize the log-likelihood which contains just one unknown parameter : alpha For the moment, I suggest to find the maximum empirically (compute the function for a large range of alpha) In the example, I've computed the function for 1000 values of alpha randomly selected between 0.001 and 0.1. I've obtained the following estimates: alpha = 0.01006 mu = 49.783 sigma = 75.411 It seems much better than the previous estimates. I'm impatiently waiting for your comments and criticals. Thank you in advance! Nicolas _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

ADD COMMENT • link 21.3 years ago Antille,Nicolas,LAUSANNE,NRC-BAS ▴ 60

0

Entering edit mode

Rafael A. Irizarry ★ 2.3k

@rafael-a-irizarry-205

Last seen 10.2 years ago

it shoudlnt surprise you. the estimation method is very very ad-hoc. the goal is to obtain a useful transformation for background adjustment not estimate parameters efficiently. you could get better estiamtes usng, for exaple, an EM algorithm, but then RMA would be much much slower and bottom line results wouldnt change much. furthermore, the normal + exponential model can be improved. see http://biosun01.biostat.jhsph.edu/~ririzarr/papers/gcpaper.pdf for a model that fits beter (not perfect) -r On Thu, 7 Aug 2003, Antille,Nicolas,LAUSANNE,NRC-BAS wrote: > Hi! > > During background subtraction (with RMA), it is assumed that > the the observed intensities (O) are the sum of a signal(S) > and a noise(N) : > > O = S + N > > with S follows an exponential ditribution (parameter = alpha) and > N follows a normal distribution (parameters = mu and sigma). > I've tried to assess the quality of these parameters estimates by > simulation. For that, I've chosen the values of the three parameters > and I've generated a large data set (N values): > > dataset <- rexp(N,alpha) + rnorm(N,mu,sigma) > > After that, i've tried to estimate the values of the parameters like > in RMA: > > bg.parameters(dataset, 2^12) > > and I've been surprised to discover that the estimates weren't good at all > For example, with alpha=0.0005, mu=500, lambda=75 and N=50000, I've obtained > the following estimates: > > alpha = 0.0029 > mu = 776.64 > sigma = 251.14 > > This encouraged me to find another way to estimate the parameters. I've > found > it, but I need your opinion: > > First, I've computed the theoretical distribution of O (convolution of S and > N). > I've found the following expression: > > f(o) = alpha * exp(alpha*(mu + 0.5*alpha*sigma^2 - o)) * phi((o - mu > - alpha*sigma^2)/(sigma)) > > where phi is the cumulative distribution function of the normal > distribution. > > Then I've computed the corresponding log-likelihood function. The > computation of maximum > log-likelihood is possible but difficult. That's why I propose to solve it > in a different way. > > We have: E[O] = E[S+N] = E[S]+E[N] = 1/alpha + mu = (appr.) > mean(intensities) > and Var[O] = Var[S+N] = Var[S] + Var[N] = 1/alpha^2 + sigma^2 = > (appr.) var(intensities) > > where we suppose that S and N are independant and where intensities is the > vector of intensities (perfect match). > > Then I suggest to make the following substitutions in the log- likelihood > function: > > mu ----> mean(intensities) - 1/alpha > sigma^2 ----> var(intensities) - 1/alpha^2 > > At this point, we have to maximize the log-likelihood which contains just > one unknown parameter : alpha > For the moment, I suggest to find the maximum empirically (compute the > function for a large range of alpha) > In the example, I've computed the function for 5000 values of alpha randomly > selected between 0.000005 and 0.05. > I've obtained the following estimates: > > alpha = 0.000498 > mu = 501.09 > sigma = 113.84 > > It seems much better than the previous estimates. > > I'm impatiently waiting for your comments and criticals. > > Thank you in advance! > > Nicolas > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor >

ADD COMMENT • link 21.3 years ago Rafael A. Irizarry ★ 2.3k

Login before adding your answer.