Fwd: statistical test for time course data
chris Jhon ▴ 260
@chris-jhon-5047
Last seen 10.2 years ago
---------- Forwarded message ----------
From: chris Jhon <cjhon217@gmail.com>
Date: Wed, Feb 13, 2013 at 9:23 PM
Subject: Re: [BioC] statistical test for time course data
To: Richard Friedman <friedman@cancercenter.columbia.edu>

Hi,

Thank you, Richard, for your help.
I have data like this table:

Time    number
0hr     #
6hr     #
24hr    #

I tried to follow the example in the user guide, as Richard suggested, but I have the following questions.
In the user guide:
***************
> lev <- c("wt.0hr","wt.6hr","wt.24hr","mu.0hr","mu.6hr","mu.24hr")
> f <- factor(targets$Target, levels=lev)
> design <- model.matrix(~0+f)
> colnames(design) <- lev
> fit <- lmFit(eset, design)
***************

Q1) What about eset? At this stage I would like to test the statistical significance of the differences between the numbers shown in the second column, which represent the number of expressed genes. Shall I replace eset with MYDATA$number?

When I tried that I got the following error:
Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric

Q2) Can anyone explain to me the meaning of (~0+f) in
design <- model.matrix(~0+f)

Q3) How do I design different matrices for different conditions? Can anyone send me a tutorial for this?

Thank you very much in advance.

On Wed, Feb 6, 2013 at 10:30 PM, Richard Friedman <friedman@cancercenter.columbia.edu> wrote:

> Dear Chris,
>
> For the questions you are asking I recommend not using splines.
> For the comparison of t1 vs t2, use a design matrix which makes every point
> a different time point and then do t2 vs t1. For t1 compared to all other
> points, I would label t1 A, and all other points B.
>
> If anyone on the list has a different opinion in the matter I would
> appreciate hearing from them.
>
> With hopes that this helps,
> Rich
>
> On Feb 5, 2013, at 10:09 AM, chris Jhon wrote:
>
> Hi All,
>
> Thank you Gordon and Richard very much.
>
> In my data, for each time point I have the number of expressed genes, and I
> would like to use a statistical test to find out whether the number of
> expressed genes at t1 is different from the number of expressed genes at t2,
> or is different from all other time points.
>
> The data look like this:
>
> time              t1   t2   ....   tn
> expressed genes   #    #    ....   #
>
> I have only one group. Shall I use the same design matrix? Shall I use df=5
> as in the example?
>
> Best Regards,
> Chris
>
> On Tue, Feb 5, 2013 at 12:02 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote:
>
>> Dear Rich,
>>
>> I have added a time course example using splines to the limma User's
>> Guide, see page 48:
>>
>> http://bioconductor.org/packages/2.12/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>
>> Best wishes
>> Gordon
>>
>> ------------------ original message ------------------
>> [BioC] statistical test for time course data
>> Richard Friedman friedman at cancercenter.columbia.edu
>> Sun Feb 3 20:18:03 CET 2013
>>
>> Dear Gordon,
>>
>> Thank you very much for the clarification. Now that I think of
>> it, the one-against-all is straightforward. However, if there are any
>> worked examples you could point me towards for polynomial and spline
>> modeling of the time series I would greatly appreciate it. I am especially
>> interested in testing the hypothesis that the temporal behavior of 2
>> treatments is different.
>>
>> Best wishes,
>> Rich
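The error quoted in Q1 can be reproduced in base R without limma. The sketch below is only an illustration: `exprs_mat` and `counts_as_text` are hypothetical objects with made-up values, and reading the counts in as text is an assumed cause, not something the post confirms. It shows the kind of numeric genes-by-samples matrix that `rowMeans()` accepts, and the same call failing on non-numeric input.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
set.seed(1)
samples <- c("wt.0hr", "wt.6hr", "wt.24hr", "mu.0hr", "mu.6hr", "mu.24hr")
exprs_mat <- matrix(rnorm(4 * length(samples)), nrow = 4,
                    dimnames = list(paste0("gene", 1:4), samples))

# A numeric matrix is something rowMeans() can work with.
is.numeric(exprs_mat)
gene_means <- rowMeans(exprs_mat, na.rm = TRUE)
gene_means

# Values stored as text are not numeric, so rowMeans() raises an error;
# tryCatch() captures the error message instead of stopping the script.
counts_as_text <- matrix(c("3100", "3000", "4000"), nrow = 1)
res <- tryCatch(rowMeans(counts_as_text, na.rm = TRUE),
                error = function(e) conditionMessage(e))
res
```

Checking `str()` or `is.numeric()` on whatever is passed in place of eset is a quick way to see whether this is what happened.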
@richard-friedman-513
Last seen 10.2 years ago
On Feb 13, 2013, at 7:24 AM, chris Jhon wrote:

Chris,

> Q1) What about eset? At this stage I would like to test the statistical
> significance of the differences between the numbers shown in the second
> column, which represent the number of expressed genes. Shall I replace
> eset with MYDATA$number?
>
> When I tried that I got the following error:
> Error in rowMeans(y$exprs, na.rm = TRUE) : 'x' must be numeric

eset does not hold the number of genes. It contains the expression data.

> Q2) Can anyone explain to me the meaning of (~0+f) in
> design <- model.matrix(~0+f)

I myself am vague on this point, but typing

design

will give you your design matrix.

> Q3) How do I design different matrices for different conditions? Can anyone
> send me a tutorial for this?

In the targets file, label all of the time points except the one to be left out as a.
Label the one left out as b.

You can send me your targets file when you do this (send to the list as well).

With hopes that this helps,
Rich
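The a/b labelling suggested for Q3 can be sketched in base R as follows. This is a minimal illustration under assumptions: the `targets` table, its column names, and the two samples per time point are all hypothetical, and 0hr is arbitrarily chosen as the time point to single out.

```r
# Hypothetical targets table: one row per sample, with its time point.
targets <- data.frame(
  Sample = paste0("s", 1:6),
  Time   = c("0hr", "0hr", "6hr", "6hr", "24hr", "24hr")
)

# Label every time point except the one left out as "a", and the one left out as "b".
targets$Group <- factor(ifelse(targets$Time == "0hr", "b", "a"),
                        levels = c("a", "b"))

# One indicator column per label; each sample belongs to exactly one column.
design <- model.matrix(~0 + Group, data = targets)
colnames(design) <- levels(targets$Group)
design
```

Printing `design` shows which samples fall into each group, which is a useful check before passing the matrix to a model fit.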
> Q2) Can anyone explain to me the meaning of (~0+f) in
> design <- model.matrix(~0+f)

"~0+f" means that the model is fitted without an intercept term, i.e. the fit is forced to go through the origin (0). Normally when fitting a linear model, the algorithm (be it the normal equations, gradient descent or whatever) will calculate the line of best fit by adjusting both the slope and the intercept of the model. When you force the model through the origin, you estimate the model fit by adjusting the slope alone. When f is a factor, as it is here, dropping the intercept gives one coefficient per level of f, so each coefficient is simply the mean of that group.

This is a little difficult to get your head around if you don't have a very solid understanding of statistical modelling.

What would be really cool is if the limma documentation listed some resources that would allow somebody who is coming at limma from a biology or computer science background to wrap their head around the fundamentals of statistical modelling and really understand what they are doing with the package. Just a suggestion; you guys have of course already done an amazing job with limma and the documentation!

P.S. One very useful resource is Andrew Ng's "Machine Learning" course on Coursera. It explains linear models very well indeed.

Paul.

On Mon, Feb 18, 2013 at 11:53 AM, Richard Friedman <friedman at cancercenter.columbia.edu> wrote:

>> Q2) Can anyone explain to me the meaning of (~0+f) in
>> design <- model.matrix(~0+f)
>
> I myself am vague on this point, but typing
>
> design
>
> will give you your design matrix.

--
Dr. Paul Geeleher, PhD (Bioinformatics)
Section of Hematology-Oncology
Department of Medicine
The University of Chicago
900 E. 57th St., KCBD, Room 7144
Chicago, IL 60637
--
www.bioinformaticstutorials.com
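To see what the 0 in the formula changes, it may help to build both design matrices for a small factor and fit them to made-up numbers. The sketch below uses only base R (`model.matrix()` and `lm.fit()`); the factor `f` and the response `y` are hypothetical values chosen for illustration, not data from this thread.

```r
# Hypothetical sample labels: three time points, two samples each.
f <- factor(c("0hr", "0hr", "6hr", "6hr", "24hr", "24hr"),
            levels = c("0hr", "6hr", "24hr"))

# With an intercept: the first level is the baseline and the remaining
# columns measure differences from that baseline.
design_int <- model.matrix(~f)
colnames(design_int)

# Without an intercept: one indicator column per level of f.
design_noint <- model.matrix(~0 + f)
colnames(design_noint)

# Fit both parameterisations to made-up values.
y <- c(1, 3, 4, 6, 9, 11)
coef_noint <- lm.fit(design_noint, y)$coefficients  # one mean per time point
coef_int   <- lm.fit(design_int, y)$coefficients    # baseline mean, then differences
coef_noint
coef_int
```

Both fits describe the same model. The no-intercept form is often more convenient when the next step is to write contrasts between named groups, because each coefficient corresponds directly to one group.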
chris Jhon ▴ 260
@chris-jhon-5047
Last seen 10.2 years ago
Hi All,

I appreciate any help.

Thank you.
Chris

---------- Forwarded message ----------
From: chris Jhon <cjhon217@gmail.com>
Date: Wed, Feb 13, 2013 at 9:24 PM
Subject: Fwd: [BioC] statistical test for time course data
To: Bioconductor mailing list <bioconductor@r-project.org>
On 18.02.2013 07:17, chris Jhon wrote:
> Hi All,
>
> I appreciate any help.

For Q2 and Q3 I don't have any better suggestion than re-reading the Limma users guide or some general introductory texts on statistical modelling with R.

For Q1, if you really want to test whether the *number of expressed genes* is different between samples (time points), i.e. not differential expression, and you have no replicates (?), then I don't see what you can do apart from a binomial proportions test. For example, if the total number of genes in your studied system is 30,000 and the number of genes 'expressed' at each of your three time points was 3100, 3000 and 4000, then you could try:

prop.test(c(3000,4000),c(30000,30000))

That would show you that, yes indeed, 4000/30000 is a significantly higher proportion than 3000/30000, but I'm really not sure if that is what you actually want to do!

It's not a common use case, and from the rest of your question I suspect there is some confusion with terminology going on (no offense!). Personally, I would say this is one of those times where you would be best served by sitting down with a friendly local expert.

--
Alex Gutteridge
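The proportions test described above can be run as written, and extended to all three time points with `pairwise.prop.test()` from base R's stats package. This is a sketch under the same illustrative assumptions as the reply: the counts 3100, 3000 and 4000 and the total of 30,000 genes are example figures, the names t1 to t3 are hypothetical labels, and the test treats each gene as an independent binomial trial, which may not hold for real expression data.

```r
# Illustrative counts of 'expressed' genes at three time points, out of 30,000 genes.
expressed <- c(t1 = 3100, t2 = 3000, t3 = 4000)
total <- rep(30000, length(expressed))

# Two-sample test of equal proportions: 3000/30000 versus 4000/30000.
two_point <- prop.test(c(3000, 4000), c(30000, 30000))
two_point$p.value

# All pairwise comparisons between time points, with p-values adjusted
# for multiple testing (Holm's method is the default).
pairwise <- pairwise.prop.test(expressed, total)
pairwise$p.value
```

The pairwise version returns a lower-triangular table of adjusted p-values, one entry per pair of time points.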
ADD REPLY
