Problems normalizing cDNA vs. genomic DNA arrays

yanju@liacs.nl

Dear all,

I have a microarray dataset from a common reference design, in which the common reference is genomic DNA. In standard normalization we assume that a large fraction of genes is not differentially expressed, and the adjustment strategies then force the log-ratios to have a median (or mean) of 0. In my case, however, every spot would show the same signal in the genomic DNA channel, while the signals in the cDNA channel vary greatly, so the strategies I just mentioned are not suitable. How should I normalize this kind of data? Are there any existing packages or functions for this?

I look forward to your reply.

Regards,
Yanju

Microarray Normalization

Written 18.5 years ago by yanju@liacs.nl; updated 18.4 years ago by Jenny Drnevich

Jenny Drnevich

Hi Yanju,

I have just been working with a couple of data sets similar to yours, where a) one channel always contains the same reference, and b) the assumption of few differences between sample and reference does not necessarily hold. In these cases I have been using the "Rquantile" or "Gquantile" methods of normalizeBetweenArrays() in limma. These methods quantile-normalize the indicated channel (R or G) so that it has the "same empirical distribution across arrays, leaving the M-values (log-ratios) unchanged." Say your reference is in the green channel: a Gquantile normalization forces the reference values on all arrays to have the same distribution, and then adjusts the R-channel values accordingly. For the statistical analysis, you then use the R values directly, because if you use the M values it is as if you had never done the normalization. If the reference is not always in the same channel, I manipulate the RGList so that it is, but then I also include 'dye' as a batch effect in the model.

HTH,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

Posted 18.5 years ago by Jenny Drnevich
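The "Gquantile" idea described in the answer above can be illustrated without limma. The following is a minimal base-R sketch, not limma's implementation: the helper names `quantile_normalize()` and `g_quantile()` are hypothetical, the intensities are simulated, and the reference (genomic DNA) is assumed to be in the G channel. It quantile-normalizes log2(G) across arrays and then shifts each spot's log2(R) by the same amount, so the log-ratios M are unchanged while log2(R) becomes the normalized value to analyze.

```r
# Hypothetical helper: quantile normalization of the columns (arrays) of x.
# Assumes no missing values; ties are broken by order of appearance.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))   # mean of the k-th smallest values
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical helper: R and G are spots x arrays matrices of positive,
# background-corrected intensities, with the reference in G.
g_quantile <- function(R, G) {
  M    <- log2(R) - log2(G)               # log-ratios, to be preserved
  logG <- quantile_normalize(log2(G))     # same reference distribution on every array
  logR <- logG + M                        # sample channel shifted spot by spot
  list(logR = logR, logG = logG, M = M)
}

set.seed(1)
G <- matrix(2^rnorm(20, mean = 10), nrow = 5)  # 5 spots x 4 arrays, simulated
R <- matrix(2^rnorm(20, mean = 9),  nrow = 5)
res <- g_quantile(R, G)
head(res$logR)  # the values to analyze, in place of M
```

Because `res$logR - res$logG` still equals `res$M`, fitting the model to M after this step would discard the normalization, which is why the answer recommends working with the R values.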

Thanks Jenny,

After reading your explanation, I still have two questions.

1. Before this, I had also applied normalizeWithinArrays() to this dataset. Do you think that is correct, or necessary, in my case?

2. You said "For the statistical analysis, you use the R values directly." But normalizeBetweenArrays() produced an MAList, which contains M and A values etc. but no R values (red-channel intensities). I then fitted my MAList to the linear model using:

    design <- modelMatrix(targets, ref="gDNA")
    fit <- lmFit(ma.paq, design)

so I think all of my subsequent analysis is based on the M values. Finally, I used eBayes() to compute summary statistics in order to detect the most differentially expressed genes:

    cont.matrix <- makeContrasts(WTvsMU=wt-mu, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2)

So I have no idea how to use the R values directly. Is my code wrong? I was not sure about my code or method, because in the end I got some uninterpretable results that did not meet the biologists' expectations. That is why I am now rechecking my code and methods. Thank you again, and also Wolfgang, for your kind help.

Kind regards,
Yanju

Reply posted 18.5 years ago by yanju@liacs.nl

Hi Yanju,

> After reading your explanation, I still have two questions.
> 1. Before this, I had also applied normalizeWithinArrays() to this dataset. Do you think that is correct, or necessary, in my case?

No, you should not use normalizeWithinArrays! It assumes that most genes do not change in expression between the two samples on an array, and in your case you have every reason to expect that the 'expression' levels of genomic DNA will be nothing like those of the cDNA from your experimental groups, as you mentioned in your first post.

> 2. You said "For the statistical analysis, you use the R values directly." But normalizeBetweenArrays() produced an MAList, which contains M and A values etc. but no R values (red-channel intensities).

It's easy to convert between RGLists, which contain R and G values, and MALists, which contain M and A values. See RG.MA() and MA.RG(); they are explained at the end of the Details section of the help page for normalizeWithinArrays.

Another thing: are you doing a background correction first? If you don't, and you run normalizeWithinArrays() or normalizeBetweenArrays() on an RGList that still contains the Rb and Gb components, a simple background subtraction is done automatically. In my opinion this is not necessarily a good thing, because a negative R or G value in either channel causes that spot's M and A values to be lost (they become missing), so you cannot recreate the R and G values afterwards.

Let's say, for simplicity's sake, that RG is your original RGList before any preprocessing, and that the genomic DNA is in the green channel on every slide. I would do something like this:

    # no background correction (or maybe method="half" to avoid negative values)
    RG.nobg <- backgroundCorrect(RG, method="none")
    # quantile-normalize the G (genomic DNA) values across arrays
    MA.nobg.Gquant <- normalizeBetweenArrays(RG.nobg, method="Gquantile")
    # convert the MAList back to an RGList
    RG.nobg.Gquant <- RG.MA(MA.nobg.Gquant)
    # copy the MAList so it can be manipulated
    MA.fake <- MA.nobg.Gquant
    # replace the M values with the log2(R) values, so the analysis is done on them
    MA.fake$M <- log2(RG.nobg.Gquant$R)

You can now proceed with the analysis as if you had Affymetrix-type (single-channel) data. You will have to change your design matrix accordingly (no -1s!), but the rest of your analysis should be the same as in the code you posted. It gets a bit more complicated if the genomic DNA is not always in the G channel: after the background correction you have to swap the R and G values for the arrays that have genomic DNA in the R channel, and then account for the dye effect by fitting a block effect with duplicateCorrelation(). This is very similar to the Technical Replication/Randomized Block section of the limma vignette.

Good luck,
Jenny

Reply posted 18.5 years ago by Jenny Drnevich
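The conversions mentioned in the reply above rest on two identities: M = log2(R) - log2(G) and A = (log2(R) + log2(G)) / 2, so that R = 2^(A + M/2) and G = 2^(A - M/2). The base-R sketch below uses hypothetical helpers `to_MA()` and `to_RG()` (stand-ins, not limma's `MA.RG()`/`RG.MA()`) to show the round trip, and why a spot whose intensity went negative after background subtraction cannot be recovered. It also shows that log2(R), the value placed in `MA.fake$M` in the reply, equals A + M/2.

```r
# Hypothetical stand-ins for the M/A <-> R/G conversions (base R, not limma).
to_MA <- function(R, G) {
  # Non-positive intensities have no logarithm, so treat them as missing
  R[R <= 0] <- NA
  G[G <= 0] <- NA
  list(M = log2(R) - log2(G),
       A = (log2(R) + log2(G)) / 2)
}

to_RG <- function(M, A) {
  list(R = 2^(A + M / 2),
       G = 2^(A - M / 2))
}

R <- c(1200, 300, 50)
G <- c(400, 600, -20)    # third spot became negative after background subtraction
MA <- to_MA(R, G)
RG <- to_RG(MA$M, MA$A)
print(RG$R)              # 1200 and 300 are recovered; the third spot is NA
log2R <- MA$A + MA$M / 2 # equals log2(R) wherever both channels were positive
```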

Dear Jenny,

Generally, I got your point, but one thing is still not clear. You mentioned "You'll have to change your design matrix accordingly (no -1s!)". May I know why? I generated the design matrix like this:

    design <- modelMatrix(targets, ref="gDNA")
    > design
            wt16 wt20 wt24
    sample1   -1    0    0
    sample2   -1    0    0
    sample3   -1    0    0

My dataset was generated on two-channel arrays without dye swaps. How should I change my design matrix?

Regards,
Yanju

Reply posted 18.4 years ago by yanju@liacs.nl
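Why "no -1s"? Once log2(R), the sample channel, is the response, each array measures its own group's level directly, so the design becomes a plain 0/1 group-indicator matrix, as for single-channel data. The base-R sketch below uses `lm()` on one simulated gene with hypothetical group labels to show what the sign of the design does: the indicator design estimates the group levels, while the same design multiplied by -1 estimates the same numbers with the opposite sign, with the t-statistics flipped and the p-values unchanged. A sign flip therefore reverses the direction of fold changes without changing how genes rank by p-value, so it can easily go unnoticed.

```r
# Hypothetical layout: three groups, three arrays each, one gene.
group  <- factor(rep(c("wt16", "wt20", "wt24"), each = 3))
design <- model.matrix(~ 0 + group)   # 0/1 indicators, no -1s
colnames(design) <- levels(group)

# Simulated log2 sample-channel values (the role of log2(R) in the thread)
y <- c(8.0, 8.2, 7.9, 9.0, 9.1, 8.8, 10.0, 10.2, 9.9)

fit_pos <- lm(y ~ 0 + design)         # coefficients estimate the group means
fit_neg <- lm(y ~ 0 + I(-design))     # same model with every sign flipped

cp <- summary(fit_pos)$coefficients
cn <- summary(fit_neg)$coefficients
print(cbind(estimate_pos = cp[, "Estimate"], estimate_neg = cn[, "Estimate"]))
```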

Wolfgang Huber

EMBL European Molecular Biology Laborat…

Dear Yanju,

You can calculate the log-ratios for each array and then normalize the log-ratios between arrays, assuming that most of them show negligible changes (e.g. by quantile normalization or just scaling). That should still be a viable assumption in your case. As far as I can see, there is no simple answer to how this compares with Jenny's proposal (i.e. how to judge what is "best"), but you can evaluate the bottom-line results of alternative preprocessing strategies and see which gives the best results. You might also want to get your biologists to talk to you about the experimental design before they do the experiments.

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber
EBI/EMBL Cambridge UK
http://www.ebi.ac.uk/huber

Posted 18.5 years ago by Wolfgang Huber
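Wolfgang's alternative works on the log-ratios themselves. Below is a minimal base-R sketch of the two options he mentions; the helper names are hypothetical and the data are simulated. `scale_normalize()` rescales each array so that all arrays share the same median absolute log-ratio (the geometric mean of the original values), similar in spirit to limma's "scale" between-array method, and `quantile_normalize()` gives every array the same distribution of log-ratios. Both assume that most log-ratios behave alike across arrays; neither assumes they are centred at zero.

```r
# Hypothetical helper: scale each array (column) so that all arrays share the
# same median absolute log-ratio, set to the geometric mean of the originals.
scale_normalize <- function(M) {
  mav    <- apply(abs(M), 2, median)
  target <- exp(mean(log(mav)))
  sweep(M, 2, mav / target, "/")
}

# Hypothetical helper: quantile normalization of the columns of x
# (assumes no missing values; ties broken by order of appearance).
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(3)
# Simulated log2(cDNA / genomic DNA) ratios: 10 spots x 4 arrays with different spreads
M <- matrix(rnorm(40, mean = 1, sd = rep(c(0.5, 1, 1.5, 2), each = 10)), nrow = 10)
M_scaled   <- scale_normalize(M)
M_quantile <- quantile_normalize(M)
apply(abs(M_scaled), 2, median)   # the same value for all four arrays
```

Which of the two (or Jenny's channel-wise approach) works better is, as suggested above, best judged by comparing the bottom-line results.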

Jenny Drnevich

Hi Yanju,

Two suggestions:

1) The code I gave you before was written as if your reference were in the green (Cy3) channel. However, based on the results of your modelMatrix(targets, ref="gDNA") command, your reference is in the red (Cy5) channel. Therefore you have to reverse some of the commands where appropriate (e.g., use method="Rquantile" and replace the M values with log2(G) values).

2) Find a local statistician to consult about the analysis, because it appears you have a 2 x 7 factorial design (2 strains and 7 time points = 14 treatment groups in total). There are a variety of ways to analyze this experimental design, depending on what you want to learn from the data. If you really only want to know which genes differ between mu and wt at each time point independently, you could analyze the arrays from each time point separately. However, there is much more information to be gained from this data set, which is why I suggest you consult a local statistician.

Best of luck,
Jenny

At 10:54 AM 11/21/2006, yanju wrote:

> Hello Jenny,
>
> I adapted my code according to your suggestion. Now, at some time points, the results show that the most differentially expressed genes are markers. This is very weird.
>
> Also, it doesn't matter whether or not I change the -1s in the design matrix to 1s (my method: new design matrix = old design matrix * -1, where the old design matrix was derived from the modelMatrix function); this didn't affect my results.
>
> Since I could not figure it out, I am pasting my code here. I hope you can tell me what's wrong with my program. Basic information about the data: two-channel arrays, 7 time points from 16 to 72 h, with some replicates at each time point. The aim is to detect the differentially expressed genes at each time point.
>
> From the very beginning of the code:
>
>     # read targets and GenePix files
>     targets <- readTargets("target_new_16_72.txt")
>     rg <- read.maimages(targets, source="genepix", wt.fun=wtflags(0.1))
>
>     # background correction
>     rgc <- backgroundCorrect(rg, method="half")
>
>     # normalization
>     MA.Gquant <- normalizeBetweenArrays(rgc, method="Gquantile")
>     RG.Gquant <- RG.MA(MA.Gquant)
>     MA.fake <- MA.Gquant
>     MA.fake$M <- log2(RG.Gquant$R)
>
>     design <- modelMatrix(targets, ref="gDNA")
>     design_revise <- design * -1
>     # design looked something like this:
>     #      wt16 wt20 wt24
>     # [1,]   -1    0    0
>     # [2,]    0   -1    0
>     # [3,]    0    0   -1
>     # It was then multiplied by -1 to make the values positive.
>
>     # fit the linear model and compute empirical Bayes statistics
>     fit <- lmFit(MA.fake, design_revise)
>     cont.matrix <- makeContrasts(MUvsWT16=mu16-wt16, MUvsWT20=mu20-wt20,
>                                  MUvsWT24=mu24-wt24, MUvsWT36=mu36-wt36,
>                                  MUvsWT48=mu48-wt48, MUvsWT60=mu60-wt60,
>                                  MUvsWT72=mu72-wt72, levels=design_revise)
>     fit2 <- contrasts.fit(fit, cont.matrix)
>     fit2 <- eBayes(fit2)
>
>     # top 20 differentially expressed genes at time point 20
>     # (but, as I said, most of the top 20 were markers or "no spot")
>     result20 <- topTable(fit2, coef=2, number=20, adjust="BH")
>
> I hope you can help me figure out the problem. I really appreciate your help. Thanks.
>
> Regards,
> Yanju

Posted 18.4 years ago by Jenny Drnevich
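For the 2 x 7 (strain x time) layout described above, one common starting point is a cell-means design with one column per strain-time group, plus one mu - wt contrast per time point; this is the kind of matrix limma's makeContrasts() builds from expressions such as mu16-wt16. The base-R sketch below assumes a hypothetical layout of two arrays per group and uses a single simulated gene. It shows the structure only and is no substitute for the statistical consultation suggested above.

```r
times  <- c(16, 20, 24, 36, 48, 60, 72)
# Hypothetical layout: 2 strains x 7 time points x 2 replicate arrays = 28 arrays
strain <- rep(c("wt", "mu"), each = 14)
time   <- rep(rep(times, each = 2), times = 2)
group  <- factor(paste0(strain, time),
                 levels = c(paste0("wt", times), paste0("mu", times)))

# Cell-means design: one 0/1 indicator column per strain-time group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast per time point: mu minus wt at that time
cont_matrix <- sapply(times, function(t) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[paste0("mu", t)] <-  1
  v[paste0("wt", t)] <- -1
  v
})
colnames(cont_matrix) <- paste0("MUvsWT", times)

# One simulated gene: each contrast estimate is a difference of group means
set.seed(4)
y   <- rnorm(nrow(design), mean = 8)
fit <- lm(y ~ 0 + design)
est <- drop(t(cont_matrix) %*% coef(fit))
print(round(est, 3))
```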

Hello Jenny,

I also noticed the first problem you mentioned after I sent you the email. After changing it, the results look better: not many markers appear in the list any more. Thank you very much for your suggestions over these two days. I think my questions are settled for the time being.

Have a nice day,
Yanju

Reply posted 18.4 years ago by yanju@liacs.nl
