Statistics question for multi-factor interaction test in edgeR

Entering edit mode

Hilary Smith ▴ 70

@hilary-smith-5927

Last seen 10.2 years ago

Hi. I need to generate two GLM tests of a factorial design with RNA- Seq count data. I have 3 factors with 2 levels apiece (2 species X 2 treatments X 2 times), and 4 separate replicates each (i.e., we made a total of 2*2*2*4 = 32 separate libraries). Our main interest is in the interaction of species*treatment, as we think species A will alter gene expression in the treatment stress vs. treatment benign, whereas species B is expected to show little change. However, wed like to also do another test of species*treatment*time, because it is possible that the ability of species A to alter gene expression in response to the stress treatment may differ at the 1st versus 2nd time point. I think the way to set this up, is to create a design matrix as follows, with the lrt test with coef 5 giving the differentially expressed genes for the species*treatment test, and coef 8 giving the the differentially expressed gene for the species*treatment*time test (after calling topTags that is). Yet to ensure I have the statistics correct, my questions are: (1) is this thinking correct, as I dont see many 3x2 factorial models to follow, and (2) do I need to set up a reference somehow (which I assume would be the set of four samples with TreatmentBenign*SpeciesB*Time2, but Im not fully sure if that is correct or needed). Many thanks in advance for your insight! ~Hilary > designFF <- model.matrix(~Treatment*Species*Age) > colnames(designFF) [1] "(Intercept)" [2] " TreatmentStress" [3] "SpeciesA " [4] "Time1" [5] "TreatmentStress:SpeciesA" [6] "TreatmentStress:Time1" [7] "SpeciesA:Time1" [8] "TreatmentStress:SpeciesA:Time1" And then to run tests with: > fit <- glmFit(y, designFF) > lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) > lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) [[alternative HTML version deleted]]

• 2.2k views

ADD COMMENT • link updated 11.5 years ago by Gordon Smyth 51k • written 11.5 years ago by Hilary Smith ▴ 70

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Dear Hilary, Your test for the 3-way interaction is correct, although 3-way interactions are pretty hard to interpret. However testing for the 2-way interaction in the presence of a 3-way interaction does not make statistical sense. This is because the parametrization of the 2-way interaction as a subset of the 3-way is somewhat arbitrary. Before you can test the 2-way interaction species*treatment in a meaningful way you would need to accept that the 3-way interaction is not necessary and remove it from the model. In general, I am of the opinion that classical statistical factorial interation models do not usually provide the most meaningful parametrizations for genomic experiments. In most cases, I prefer to fit the saturated model (a different level for each treatment combination) and make specific contrasts. There is some discussion of this in the limma User's Guide. In your case, I guess that you might want to test for species*treatment interaction separately at each time point. It is almost impossible to do this within the classical 3-way factorial setup. However it is easy with the one-way approach I just mentioned, or else you could use: ~Age + Age:Species + Age:Treatment + Age:Species:Treatment Best wishes Gordon > Date: Thu, 9 May 2013 14:55:46 -0400 > From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] Statistics question for multi-factor interaction test > in edgeR > > Hi. I need to generate two GLM tests of a factorial design with RNA- Seq > count data. I have 3 factors with 2 levels apiece (2 species X 2 > treatments X 2 times), and 4 separate replicates each (i.e., we made a > total of 2*2*2*4 = 32 separate libraries). Our main interest is in the > interaction of species*treatment, as we think species A will alter gene > expression in the treatment stress vs. treatment benign, whereas species > B is expected to show little change. However, we?d like to also do > another test of species*treatment*time, because it is possible that the > ability of species A to alter gene expression in response to the stress > treatment may differ at the 1st versus 2nd time point. > > I think the way to set this up, is to create a design matrix as follows, > with the lrt test with coef 5 giving the differentially expressed genes > for the species*treatment test, and coef 8 giving the the differentially > expressed gene for the species*treatment*time test (after calling > topTags that is). Yet to ensure I have the statistics correct, my > questions are: (1) is this thinking correct, as I don?t see many 3x2 > factorial models to follow, and (2) do I need to set up a reference > somehow (which I assume would be the set of four samples with > TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is > correct or needed). > > Many thanks in advance for your insight! > ~Hilary > >> designFF <- model.matrix(~Treatment*Species*Age) >> colnames(designFF) > [1] "(Intercept)" > [2] " TreatmentStress" > [3] "SpeciesA " > [4] "Time1" > [5] "TreatmentStress:SpeciesA" > [6] "TreatmentStress:Time1" > [7] "SpeciesA:Time1" > [8] "TreatmentStress:SpeciesA:Time1" > > And then to run tests with: >> fit <- glmFit(y, designFF) > >> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}

ADD COMMENT • link 11.5 years ago Gordon Smyth 51k

Entering edit mode

Dear Gordon, Thank you. We cannot really remove the 3-way interaction, because there are some genes that respond to the 3-way interaction (even though a classic parameterization with the intercept, leads to about an order of magnitude more genes responding to a 2-way vs. 3-way interaction). Testing for species*treatment at each time, definitely seems the closest way to address the questions we have. Roughly following section 3.3.1 of the edgeR user guide ("Defining each treatment combination as a group") on the creation of specific contrasts (and many thanks for your updated User Guides to show this formulation in detail), I set up tests as below. If this is valid, that's great, as it does make much more intuitive sense to me (and biological in terms of addressing our questions of interest). Thank you for your help; I REALLY appreciate it! Best, Hilary > design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) > design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) > myC_TimeI = makeContrasts( + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1, + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1, + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), + levels=design1) > > myC_TimeII = makeContrasts( + Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1, + Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1, + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), + levels=design2) Then, after using glmFit on each of the two makeContrasts above, I ran glmLRT 6 times, once on each of the 6 contrasts (i.e., on Treatment1_vs_Treatment2_SpeciesA_TimeI, Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. Ultimately I may then try to use the VennCounts/Diagram from limma to show the overlap between the 4 sets visually: Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; and Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit to set up, because my two time points have slightly different gene lists (i.e., to run the analyses on the Times differently, after filtering out reads to only retain those with cpm>1 in ?4 libraries, TimesI and II did not retain exactly the same genes, though it?s rather close). -------------------------------------------------- On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >Dear Hilary, > >Your test for the 3-way interaction is correct, although 3-way >interactions are pretty hard to interpret. > >However testing for the 2-way interaction in the presence of a 3-way >interaction does not make statistical sense. This is because the >parametrization of the 2-way interaction as a subset of the 3-way is >somewhat arbitrary. Before you can test the 2-way interaction >species*treatment in a meaningful way you would need to accept that the >3-way interaction is not necessary and remove it from the model. > >In general, I am of the opinion that classical statistical factorial >interation models do not usually provide the most meaningful >parametrizations for genomic experiments. In most cases, I prefer to fit >the saturated model (a different level for each treatment combination) >and >make specific contrasts. There is some discussion of this in the limma >User's Guide. > >In your case, I guess that you might want to test for species*treatment >interaction separately at each time point. It is almost impossible to do >this within the classical 3-way factorial setup. However it is easy with >the one-way approach I just mentioned, or else you could use: > > ~Age + Age:Species + Age:Treatment + Age:Species:Treatment > >Best wishes >Gordon > > >> Date: Thu, 9 May 2013 14:55:46 -0400 >> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >> Subject: [BioC] Statistics question for multi-factor interaction test >> in edgeR >> >> Hi. I need to generate two GLM tests of a factorial design with RNA-Seq >> count data. I have 3 factors with 2 levels apiece (2 species X 2 >> treatments X 2 times), and 4 separate replicates each (i.e., we made a >> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >> interaction of species*treatment, as we think species A will alter gene >> expression in the treatment stress vs. treatment benign, whereas >>species >> B is expected to show little change. However, we?d like to also do >> another test of species*treatment*time, because it is possible that the >> ability of species A to alter gene expression in response to the stress >> treatment may differ at the 1st versus 2nd time point. >> >> I think the way to set this up, is to create a design matrix as >>follows, >> with the lrt test with coef 5 giving the differentially expressed genes >> for the species*treatment test, and coef 8 giving the the >>differentially >> expressed gene for the species*treatment*time test (after calling >> topTags that is). Yet to ensure I have the statistics correct, my >> questions are: (1) is this thinking correct, as I don?t see many 3x2 >> factorial models to follow, and (2) do I need to set up a reference >> somehow (which I assume would be the set of four samples with >> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >> correct or needed). >> >> Many thanks in advance for your insight! >> ~Hilary >> >>> designFF <- model.matrix(~Treatment*Species*Age) >>> colnames(designFF) >> [1] "(Intercept)" >> [2] " TreatmentStress" >> [3] "SpeciesA " >> [4] "Time1" >> [5] "TreatmentStress:SpeciesA" >> [6] "TreatmentStress:Time1" >> [7] "SpeciesA:Time1" >> [8] "TreatmentStress:SpeciesA:Time1" >> >> And then to run tests with: >>> fit <- glmFit(y, designFF) >> >>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) > >_____________________________________________________________________ _ >The information in this email is confidential and intended solely for the >addressee. >You must not disclose, forward, print or use it without the permission of >the sender. >_____________________________________________________________________ _

ADD REPLY • link 11.5 years ago Hilary Smith ▴ 70

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Dear Hilary, Makes sense to me. Personally, I would analyse all the libraries together (with 8 groups) instead of using separate glmFits for the two ages. The same contrasts will still work. Then there is no issue with different numbers of rows etc. Best wishes Gordon On Sat, 11 May 2013, Hilary Smith wrote: > Dear Gordon, > Thank you. We cannot really remove the 3-way interaction, because there > are some genes that respond to the 3-way interaction (even though a > classic parameterization with the intercept, leads to about an order of > magnitude more genes responding to a 2-way vs. 3-way interaction). > > Testing for species*treatment at each time, definitely seems the closest > way to address the questions we have. Roughly following section 3.3.1 of > the edgeR user guide ("Defining each treatment combination as a group") on > the creation of specific contrasts (and many thanks for your updated User > Guides to show this formulation in detail), I set up tests as below. If > this is valid, that's great, as it does make much more intuitive sense to > me (and biological in terms of addressing our questions of interest). > > Thank you for your help; I REALLY appreciate it! > Best, > Hilary > >> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) > >> myC_TimeI = makeContrasts( > + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 - > SpeciesA_TimeI_Treatment1, > + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 - > SpeciesB_TimeI_Treatment1, > + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = > (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( > SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), > + levels=design1) >> >> myC_TimeII = makeContrasts( > + Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 - > SpeciesA_TimeII_Treatment1, > + Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 - > SpeciesB_TimeII_Treatment1, > + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = > (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( > SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), > + levels=design2) > > Then, after using glmFit on each of the two makeContrasts above, I ran > glmLRT 6 times, once on each of the 6 contrasts (i.e., on > Treatment1_vs_Treatment2_SpeciesA_TimeI, > Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through > Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. > > Ultimately I may then try to use the VennCounts/Diagram from limma to show > the overlap between the 4 sets visually: > Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 > vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; and > Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit to > set up, because my two time points have slightly different gene lists > (i.e., to run the analyses on the Times differently, after filtering out > reads to only retain those with cpm>1 in ?4 libraries, TimesI and II did > not retain exactly the same genes, though it?s rather close). > > > > > > -------------------------------------------------- > > > > > > > On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Hilary, >> >> Your test for the 3-way interaction is correct, although 3-way >> interactions are pretty hard to interpret. >> >> However testing for the 2-way interaction in the presence of a 3-way >> interaction does not make statistical sense. This is because the >> parametrization of the 2-way interaction as a subset of the 3-way is >> somewhat arbitrary. Before you can test the 2-way interaction >> species*treatment in a meaningful way you would need to accept that the >> 3-way interaction is not necessary and remove it from the model. >> >> In general, I am of the opinion that classical statistical factorial >> interation models do not usually provide the most meaningful >> parametrizations for genomic experiments. In most cases, I prefer to fit >> the saturated model (a different level for each treatment combination) >> and >> make specific contrasts. There is some discussion of this in the limma >> User's Guide. >> >> In your case, I guess that you might want to test for species*treatment >> interaction separately at each time point. It is almost impossible to do >> this within the classical 3-way factorial setup. However it is easy with >> the one-way approach I just mentioned, or else you could use: >> >> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >> >> Best wishes >> Gordon >> >> >>> Date: Thu, 9 May 2013 14:55:46 -0400 >>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>> Subject: [BioC] Statistics question for multi-factor interaction test >>> in edgeR >>> >>> Hi. I need to generate two GLM tests of a factorial design with RNA-Seq >>> count data. I have 3 factors with 2 levels apiece (2 species X 2 >>> treatments X 2 times), and 4 separate replicates each (i.e., we made a >>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >>> interaction of species*treatment, as we think species A will alter gene >>> expression in the treatment stress vs. treatment benign, whereas >>> species >>> B is expected to show little change. However, we?d like to also do >>> another test of species*treatment*time, because it is possible that the >>> ability of species A to alter gene expression in response to the stress >>> treatment may differ at the 1st versus 2nd time point. >>> >>> I think the way to set this up, is to create a design matrix as >>> follows, >>> with the lrt test with coef 5 giving the differentially expressed genes >>> for the species*treatment test, and coef 8 giving the the >>> differentially >>> expressed gene for the species*treatment*time test (after calling >>> topTags that is). Yet to ensure I have the statistics correct, my >>> questions are: (1) is this thinking correct, as I don?t see many 3x2 >>> factorial models to follow, and (2) do I need to set up a reference >>> somehow (which I assume would be the set of four samples with >>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>> correct or needed). >>> >>> Many thanks in advance for your insight! >>> ~Hilary >>> >>>> designFF <- model.matrix(~Treatment*Species*Age) >>>> colnames(designFF) >>> [1] "(Intercept)" >>> [2] " TreatmentStress" >>> [3] "SpeciesA " >>> [4] "Time1" >>> [5] "TreatmentStress:SpeciesA" >>> [6] "TreatmentStress:Time1" >>> [7] "SpeciesA:Time1" >>> [8] "TreatmentStress:SpeciesA:Time1" >>> >>> And then to run tests with: >>>> fit <- glmFit(y, designFF) >>> >>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}

ADD COMMENT • link 11.5 years ago Gordon Smyth 51k

Entering edit mode

One further question. I'm glad to hear it's acceptable to run the cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 groups (32 libraries), and then just use the makeContrasts function. Yet if I use makeContrasts to specify the 6 tests as noted in the prior email/post, should I lower my "significant" FDR cutoff from 0.05 to (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account for the multiple testing of all the different genes or tags. However, I am unsure whether it's necessary to also have a correction for the 6 different contrasts I am testing, or if simply making the model >design = model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, group=Group), and Group contains all 8 groups (2 species * 2 treatments * 2 times), intrinsically accounts for this in how it sets up the model. I assume makeContrasts or the design function accounts for this, but I just want to be sure. Thank you again, Hilary -------------------------------------------------- Dr. Hilary April Smith Postdoctoral Research Associate University of Notre Dame Department of Biological Sciences 321 Galvin Life Sciences On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >Dear Hilary, > >Makes sense to me. > >Personally, I would analyse all the libraries together (with 8 groups) >instead of using separate glmFits for the two ages. The same contrasts >will still work. Then there is no issue with different numbers of rows >etc. > >Best wishes >Gordon > >On Sat, 11 May 2013, Hilary Smith wrote: > >> Dear Gordon, >> Thank you. We cannot really remove the 3-way interaction, because there >> are some genes that respond to the 3-way interaction (even though a >> classic parameterization with the intercept, leads to about an order of >> magnitude more genes responding to a 2-way vs. 3-way interaction). >> >> Testing for species*treatment at each time, definitely seems the closest >> way to address the questions we have. Roughly following section 3.3.1 of >> the edgeR user guide ("Defining each treatment combination as a group") >>on >> the creation of specific contrasts (and many thanks for your updated >>User >> Guides to show this formulation in detail), I set up tests as below. If >> this is valid, that's great, as it does make much more intuitive sense >>to >> me (and biological in terms of addressing our questions of interest). >> >> Thank you for your help; I REALLY appreciate it! >> Best, >> Hilary >> >>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >> >>> myC_TimeI = makeContrasts( >> + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 - >> SpeciesA_TimeI_Treatment1, >> + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 - >> SpeciesB_TimeI_Treatment1, >> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >> + levels=design1) >>> >>> myC_TimeII = makeContrasts( >> + Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 >>- >> SpeciesA_TimeII_Treatment1, >> + Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 >>- >> SpeciesB_TimeII_Treatment1, >> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >> + levels=design2) >> >> Then, after using glmFit on each of the two makeContrasts above, I ran >> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >> Treatment1_vs_Treatment2_SpeciesA_TimeI, >> Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through >> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >> >> Ultimately I may then try to use the VennCounts/Diagram from limma to >>show >> the overlap between the 4 sets visually: >> Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 >> vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; >>and >> Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit to >> set up, because my two time points have slightly different gene lists >> (i.e., to run the analyses on the Times differently, after filtering out >> reads to only retain those with cpm>1 in ?4 libraries, TimesI and II did >> not retain exactly the same genes, though it?s rather close). >> >> >> >> >> >> -------------------------------------------------- >> >> >> >> >> >> >> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Hilary, >>> >>> Your test for the 3-way interaction is correct, although 3-way >>> interactions are pretty hard to interpret. >>> >>> However testing for the 2-way interaction in the presence of a 3-way >>> interaction does not make statistical sense. This is because the >>> parametrization of the 2-way interaction as a subset of the 3-way is >>> somewhat arbitrary. Before you can test the 2-way interaction >>> species*treatment in a meaningful way you would need to accept that the >>> 3-way interaction is not necessary and remove it from the model. >>> >>> In general, I am of the opinion that classical statistical factorial >>> interation models do not usually provide the most meaningful >>> parametrizations for genomic experiments. In most cases, I prefer to >>>fit >>> the saturated model (a different level for each treatment combination) >>> and >>> make specific contrasts. There is some discussion of this in the limma >>> User's Guide. >>> >>> In your case, I guess that you might want to test for species*treatment >>> interaction separately at each time point. It is almost impossible to >>>do >>> this within the classical 3-way factorial setup. However it is easy >>>with >>> the one-way approach I just mentioned, or else you could use: >>> >>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>> >>> Best wishes >>> Gordon >>> >>> >>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>> Subject: [BioC] Statistics question for multi-factor interaction test >>>> in edgeR >>>> >>>> Hi. I need to generate two GLM tests of a factorial design with >>>>RNA-Seq >>>> count data. I have 3 factors with 2 levels apiece (2 species X 2 >>>> treatments X 2 times), and 4 separate replicates each (i.e., we made a >>>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >>>> interaction of species*treatment, as we think species A will alter >>>>gene >>>> expression in the treatment stress vs. treatment benign, whereas >>>> species >>>> B is expected to show little change. However, we?d like to also do >>>> another test of species*treatment*time, because it is possible that >>>>the >>>> ability of species A to alter gene expression in response to the >>>>stress >>>> treatment may differ at the 1st versus 2nd time point. >>>> >>>> I think the way to set this up, is to create a design matrix as >>>> follows, >>>> with the lrt test with coef 5 giving the differentially expressed >>>>genes >>>> for the species*treatment test, and coef 8 giving the the >>>> differentially >>>> expressed gene for the species*treatment*time test (after calling >>>> topTags that is). Yet to ensure I have the statistics correct, my >>>> questions are: (1) is this thinking correct, as I don?t see many 3x2 >>>> factorial models to follow, and (2) do I need to set up a reference >>>> somehow (which I assume would be the set of four samples with >>>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>>> correct or needed). >>>> >>>> Many thanks in advance for your insight! >>>> ~Hilary >>>> >>>>> designFF <- model.matrix(~Treatment*Species*Age) >>>>> colnames(designFF) >>>> [1] "(Intercept)" >>>> [2] " TreatmentStress" >>>> [3] "SpeciesA " >>>> [4] "Time1" >>>> [5] "TreatmentStress:SpeciesA" >>>> [6] "TreatmentStress:Time1" >>>> [7] "SpeciesA:Time1" >>>> [8] "TreatmentStress:SpeciesA:Time1" >>>> >>>> And then to run tests with: >>>>> fit <- glmFit(y, designFF) >>>> >>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) > >_____________________________________________________________________ _ >The information in this email is confidential and intended solely for the >addressee. >You must not disclose, forward, print or use it without the permission of >the sender. >_____________________________________________________________________ _

ADD REPLY • link 11.5 years ago Hilary Smith ▴ 70

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 3 hours ago

WEHI, Melbourne, Australia

Dear Hilary, Unlike p-values (Type I error control), FDR is a scalable loss function, so it is not generally necessary to decrease the cutoff when adding more tests. It is not correct or necessary to divide FDR by the number of tests done (a la Bonferroni p-values). The limma package (decideTests function) has a number of strategies for combining FDR across multiple contrasts as well as multiple genes. However the most commmon practice is the default, which is to simply do the FDR adjustment separately for each contrast. For the limited number of contrasts you are considering, I would be happy enough with this strategy. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Mon, 13 May 2013, Hilary Smith wrote: > One further question. I'm glad to hear it's acceptable to run the > cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 > groups (32 libraries), and then just use the makeContrasts function. Yet > if I use makeContrasts to specify the 6 tests as noted in the prior > email/post, should I lower my "significant" FDR cutoff from 0.05 to > (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account for > the multiple testing of all the different genes or tags. However, I am > unsure whether it's necessary to also have a correction for the 6 > different contrasts I am testing, or if simply making the model >design = > model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, > group=Group), and Group contains all 8 groups (2 species * 2 treatments * > 2 times), intrinsically accounts for this in how it sets up the model. I > assume makeContrasts or the design function accounts for this, but I just > want to be sure. > > Thank you again, > Hilary > > > > -------------------------------------------------- > Dr. Hilary April Smith > Postdoctoral Research Associate > University of Notre Dame > Department of Biological Sciences > 321 Galvin Life Sciences > > > > > > On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: > >> Dear Hilary, >> >> Makes sense to me. >> >> Personally, I would analyse all the libraries together (with 8 groups) >> instead of using separate glmFits for the two ages. The same contrasts >> will still work. Then there is no issue with different numbers of rows >> etc. >> >> Best wishes >> Gordon >> >> On Sat, 11 May 2013, Hilary Smith wrote: >> >>> Dear Gordon, >>> Thank you. We cannot really remove the 3-way interaction, because there >>> are some genes that respond to the 3-way interaction (even though a >>> classic parameterization with the intercept, leads to about an order of >>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>> >>> Testing for species*treatment at each time, definitely seems the closest >>> way to address the questions we have. Roughly following section 3.3.1 of >>> the edgeR user guide ("Defining each treatment combination as a group") >>> on >>> the creation of specific contrasts (and many thanks for your updated >>> User >>> Guides to show this formulation in detail), I set up tests as below. If >>> this is valid, that's great, as it does make much more intuitive sense >>> to >>> me (and biological in terms of addressing our questions of interest). >>> >>> Thank you for your help; I REALLY appreciate it! >>> Best, >>> Hilary >>> >>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>> >>>> myC_TimeI = makeContrasts( >>> + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 - >>> SpeciesA_TimeI_Treatment1, >>> + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 - >>> SpeciesB_TimeI_Treatment1, >>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>> + levels=design1) >>>> >>>> myC_TimeII = makeContrasts( >>> + Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 >>> - >>> SpeciesA_TimeII_Treatment1, >>> + Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 >>> - >>> SpeciesB_TimeII_Treatment1, >>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>> + levels=design2) >>> >>> Then, after using glmFit on each of the two makeContrasts above, I ran >>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>> Treatment1_vs_Treatment2_SpeciesA_TimeI, >>> Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through >>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>> >>> Ultimately I may then try to use the VennCounts/Diagram from limma to >>> show >>> the overlap between the 4 sets visually: >>> Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 >>> vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; >>> and >>> Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit to >>> set up, because my two time points have slightly different gene lists >>> (i.e., to run the analyses on the Times differently, after filtering out >>> reads to only retain those with cpm>1 in ?4 libraries, TimesI and II did >>> not retain exactly the same genes, though it?s rather close). >>> >>> >>> >>> >>> >>> -------------------------------------------------- >>> >>> >>> >>> >>> >>> >>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Hilary, >>>> >>>> Your test for the 3-way interaction is correct, although 3-way >>>> interactions are pretty hard to interpret. >>>> >>>> However testing for the 2-way interaction in the presence of a 3-way >>>> interaction does not make statistical sense. This is because the >>>> parametrization of the 2-way interaction as a subset of the 3-way is >>>> somewhat arbitrary. Before you can test the 2-way interaction >>>> species*treatment in a meaningful way you would need to accept that the >>>> 3-way interaction is not necessary and remove it from the model. >>>> >>>> In general, I am of the opinion that classical statistical factorial >>>> interation models do not usually provide the most meaningful >>>> parametrizations for genomic experiments. In most cases, I prefer to >>>> fit >>>> the saturated model (a different level for each treatment combination) >>>> and >>>> make specific contrasts. There is some discussion of this in the limma >>>> User's Guide. >>>> >>>> In your case, I guess that you might want to test for species*treatment >>>> interaction separately at each time point. It is almost impossible to >>>> do >>>> this within the classical 3-way factorial setup. However it is easy >>>> with >>>> the one-way approach I just mentioned, or else you could use: >>>> >>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>> >>>> Best wishes >>>> Gordon >>>> >>>> >>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>> Subject: [BioC] Statistics question for multi-factor interaction test >>>>> in edgeR >>>>> >>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>> RNA-Seq >>>>> count data. I have 3 factors with 2 levels apiece (2 species X 2 >>>>> treatments X 2 times), and 4 separate replicates each (i.e., we made a >>>>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >>>>> interaction of species*treatment, as we think species A will alter >>>>> gene >>>>> expression in the treatment stress vs. treatment benign, whereas >>>>> species >>>>> B is expected to show little change. However, we?d like to also do >>>>> another test of species*treatment*time, because it is possible that >>>>> the >>>>> ability of species A to alter gene expression in response to the >>>>> stress >>>>> treatment may differ at the 1st versus 2nd time point. >>>>> >>>>> I think the way to set this up, is to create a design matrix as >>>>> follows, >>>>> with the lrt test with coef 5 giving the differentially expressed >>>>> genes >>>>> for the species*treatment test, and coef 8 giving the the >>>>> differentially >>>>> expressed gene for the species*treatment*time test (after calling >>>>> topTags that is). Yet to ensure I have the statistics correct, my >>>>> questions are: (1) is this thinking correct, as I don?t see many 3x2 >>>>> factorial models to follow, and (2) do I need to set up a reference >>>>> somehow (which I assume would be the set of four samples with >>>>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>>>> correct or needed). >>>>> >>>>> Many thanks in advance for your insight! >>>>> ~Hilary >>>>> >>>>>> designFF <- model.matrix(~Treatment*Species*Age) >>>>>> colnames(designFF) >>>>> [1] "(Intercept)" >>>>> [2] " TreatmentStress" >>>>> [3] "SpeciesA " >>>>> [4] "Time1" >>>>> [5] "TreatmentStress:SpeciesA" >>>>> [6] "TreatmentStress:Time1" >>>>> [7] "SpeciesA:Time1" >>>>> [8] "TreatmentStress:SpeciesA:Time1" >>>>> >>>>> And then to run tests with: >>>>>> fit <- glmFit(y, designFF) >>>>> >>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}

ADD COMMENT • link 11.5 years ago Gordon Smyth 51k

Entering edit mode

Excellent; that is very good to hear. Thank you, and I'll take a closer look at the limma user guide. I haven't done microarrays, so I had not gone through the limma guide closely enough. -------------------------------------------------- Dr. Hilary April Smith Postdoctoral Research Associate University of Notre Dame Department of Biological Sciences 321 Galvin Life Sciences On 5/13/13 7:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >Dear Hilary, > >Unlike p-values (Type I error control), FDR is a scalable loss function, >so it is not generally necessary to decrease the cutoff when adding more >tests. It is not correct or necessary to divide FDR by the number of >tests done (a la Bonferroni p-values). > >The limma package (decideTests function) has a number of strategies for >combining FDR across multiple contrasts as well as multiple genes. >However the most commmon practice is the default, which is to simply do >the FDR adjustment separately for each contrast. For the limited number >of contrasts you are considering, I would be happy enough with this >strategy. > >Best wishes >Gordon > >--------------------------------------------- >Professor Gordon K Smyth, >Bioinformatics Division, >Walter and Eliza Hall Institute of Medical Research, >1G Royal Parade, Parkville, Vic 3052, Australia. >http://www.statsci.org/smyth > >On Mon, 13 May 2013, Hilary Smith wrote: > >> One further question. I'm glad to hear it's acceptable to run the >> cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 >> groups (32 libraries), and then just use the makeContrasts function. Yet >> if I use makeContrasts to specify the 6 tests as noted in the prior >> email/post, should I lower my "significant" FDR cutoff from 0.05 to >> (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account >>for >> the multiple testing of all the different genes or tags. However, I am >> unsure whether it's necessary to also have a correction for the 6 >> different contrasts I am testing, or if simply making the model >design >>= >> model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, >> group=Group), and Group contains all 8 groups (2 species * 2 treatments >>* >> 2 times), intrinsically accounts for this in how it sets up the model. I >> assume makeContrasts or the design function accounts for this, but I >>just >> want to be sure. >> >> Thank you again, >> Hilary >> >> >> >> -------------------------------------------------- >> Dr. Hilary April Smith >> Postdoctoral Research Associate >> University of Notre Dame >> Department of Biological Sciences >> 321 Galvin Life Sciences >> >> >> >> >> >> On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Hilary, >>> >>> Makes sense to me. >>> >>> Personally, I would analyse all the libraries together (with 8 groups) >>> instead of using separate glmFits for the two ages. The same contrasts >>> will still work. Then there is no issue with different numbers of rows >>> etc. >>> >>> Best wishes >>> Gordon >>> >>> On Sat, 11 May 2013, Hilary Smith wrote: >>> >>>> Dear Gordon, >>>> Thank you. We cannot really remove the 3-way interaction, because >>>>there >>>> are some genes that respond to the 3-way interaction (even though a >>>> classic parameterization with the intercept, leads to about an order >>>>of >>>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>>> >>>> Testing for species*treatment at each time, definitely seems the >>>>closest >>>> way to address the questions we have. Roughly following section 3.3.1 >>>>of >>>> the edgeR user guide ("Defining each treatment combination as a >>>>group") >>>> on >>>> the creation of specific contrasts (and many thanks for your updated >>>> User >>>> Guides to show this formulation in detail), I set up tests as below. >>>>If >>>> this is valid, that's great, as it does make much more intuitive sense >>>> to >>>> me (and biological in terms of addressing our questions of interest). >>>> >>>> Thank you for your help; I REALLY appreciate it! >>>> Best, >>>> Hilary >>>> >>>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>>> >>>>> myC_TimeI = makeContrasts( >>>> + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 >>>>- >>>> SpeciesA_TimeI_Treatment1, >>>> + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 >>>>- >>>> SpeciesB_TimeI_Treatment1, >>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>>> + levels=design1) >>>>> >>>>> myC_TimeII = makeContrasts( >>>> + Treatment1_vs_Treatment2_SpeciesA_TimeII = >>>>SpeciesA_TimeII_Treatment2 >>>> - >>>> SpeciesA_TimeII_Treatment1, >>>> + Treatment1_vs_Treatment2_SpeciesB_TimeII = >>>>SpeciesB_TimeII_Treatment2 >>>> - >>>> SpeciesB_TimeII_Treatment1, >>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>>> + levels=design2) >>>> >>>> Then, after using glmFit on each of the two makeContrasts above, I ran >>>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>>> Treatment1_vs_Treatment2_SpeciesA_TimeI, >>>> Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through >>>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>>> >>>> Ultimately I may then try to use the VennCounts/Diagram from limma to >>>> show >>>> the overlap between the 4 sets visually: >>>> Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 >>>> vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; >>>> and >>>> Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit >>>>to >>>> set up, because my two time points have slightly different gene lists >>>> (i.e., to run the analyses on the Times differently, after filtering >>>>out >>>> reads to only retain those with cpm>1 in ?4 libraries, TimesI and II >>>>did >>>> not retain exactly the same genes, though it?s rather close). >>>> >>>> >>>> >>>> >>>> >>>> -------------------------------------------------- >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>>> >>>>> Dear Hilary, >>>>> >>>>> Your test for the 3-way interaction is correct, although 3-way >>>>> interactions are pretty hard to interpret. >>>>> >>>>> However testing for the 2-way interaction in the presence of a 3-way >>>>> interaction does not make statistical sense. This is because the >>>>> parametrization of the 2-way interaction as a subset of the 3-way is >>>>> somewhat arbitrary. Before you can test the 2-way interaction >>>>> species*treatment in a meaningful way you would need to accept that >>>>>the >>>>> 3-way interaction is not necessary and remove it from the model. >>>>> >>>>> In general, I am of the opinion that classical statistical factorial >>>>> interation models do not usually provide the most meaningful >>>>> parametrizations for genomic experiments. In most cases, I prefer to >>>>> fit >>>>> the saturated model (a different level for each treatment >>>>>combination) >>>>> and >>>>> make specific contrasts. There is some discussion of this in the >>>>>limma >>>>> User's Guide. >>>>> >>>>> In your case, I guess that you might want to test for >>>>>species*treatment >>>>> interaction separately at each time point. It is almost impossible >>>>>to >>>>> do >>>>> this within the classical 3-way factorial setup. However it is easy >>>>> with >>>>> the one-way approach I just mentioned, or else you could use: >>>>> >>>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>> >>>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] Statistics question for multi-factor interaction >>>>>>test >>>>>> in edgeR >>>>>> >>>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>>> RNA-Seq >>>>>> count data. I have 3 factors with 2 levels apiece (2 species X 2 >>>>>> treatments X 2 times), and 4 separate replicates each (i.e., we >>>>>>made a >>>>>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in >>>>>>the >>>>>> interaction of species*treatment, as we think species A will alter >>>>>> gene >>>>>> expression in the treatment stress vs. treatment benign, whereas >>>>>> species >>>>>> B is expected to show little change. However, we?d like to also do >>>>>> another test of species*treatment*time, because it is possible that >>>>>> the >>>>>> ability of species A to alter gene expression in response to the >>>>>> stress >>>>>> treatment may differ at the 1st versus 2nd time point. >>>>>> >>>>>> I think the way to set this up, is to create a design matrix as >>>>>> follows, >>>>>> with the lrt test with coef 5 giving the differentially expressed >>>>>> genes >>>>>> for the species*treatment test, and coef 8 giving the the >>>>>> differentially >>>>>> expressed gene for the species*treatment*time test (after calling >>>>>> topTags that is). Yet to ensure I have the statistics correct, my >>>>>> questions are: (1) is this thinking correct, as I don?t see many 3x2 >>>>>> factorial models to follow, and (2) do I need to set up a reference >>>>>> somehow (which I assume would be the set of four samples with >>>>>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>>>>> correct or needed). >>>>>> >>>>>> Many thanks in advance for your insight! >>>>>> ~Hilary >>>>>> >>>>>>> designFF <- model.matrix(~Treatment*Species*Age) >>>>>>> colnames(designFF) >>>>>> [1] "(Intercept)" >>>>>> [2] " TreatmentStress" >>>>>> [3] "SpeciesA " >>>>>> [4] "Time1" >>>>>> [5] "TreatmentStress:SpeciesA" >>>>>> [6] "TreatmentStress:Time1" >>>>>> [7] "SpeciesA:Time1" >>>>>> [8] "TreatmentStress:SpeciesA:Time1" >>>>>> >>>>>> And then to run tests with: >>>>>>> fit <- glmFit(y, designFF) >>>>>> >>>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) > >_____________________________________________________________________ _ >The information in this email is confidential and intended solely for the >addressee. >You must not disclose, forward, print or use it without the permission of >the sender. >_____________________________________________________________________ _

ADD REPLY • link 11.5 years ago Hilary Smith ▴ 70

Entering edit mode

Hi Hilary, Just to make it clear what I mean by scalability: Suppose that you successfully control the FDR to be less than 0.05 for each contrast separately. Then it is an automatic consequence that the FDR for all the tests pooled together is also less than 0.05. Best wishes Gordon On Tue, 14 May 2013, Gordon K Smyth wrote: > Dear Hilary, > > Unlike p-values (Type I error control), FDR is a scalable loss function, so > it is not generally necessary to decrease the cutoff when adding more tests. > It is not correct or necessary to divide FDR by the number of tests done (a > la Bonferroni p-values). > > The limma package (decideTests function) has a number of strategies for > combining FDR across multiple contrasts as well as multiple genes. However > the most commmon practice is the default, which is to simply do the FDR > adjustment separately for each contrast. For the limited number of contrasts > you are considering, I would be happy enough with this strategy. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > http://www.statsci.org/smyth > > On Mon, 13 May 2013, Hilary Smith wrote: > >> One further question. I'm glad to hear it's acceptable to run the >> cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 >> groups (32 libraries), and then just use the makeContrasts function. Yet >> if I use makeContrasts to specify the 6 tests as noted in the prior >> email/post, should I lower my "significant" FDR cutoff from 0.05 to >> (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account for >> the multiple testing of all the different genes or tags. However, I am >> unsure whether it's necessary to also have a correction for the 6 >> different contrasts I am testing, or if simply making the model >design = >> model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, >> group=Group), and Group contains all 8 groups (2 species * 2 treatments * >> 2 times), intrinsically accounts for this in how it sets up the model. I >> assume makeContrasts or the design function accounts for this, but I just >> want to be sure. >> >> Thank you again, >> Hilary >> >> >> >> -------------------------------------------------- >> Dr. Hilary April Smith >> Postdoctoral Research Associate >> University of Notre Dame >> Department of Biological Sciences >> 321 Galvin Life Sciences >> >> >> >> >> >> On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >> >>> Dear Hilary, >>> >>> Makes sense to me. >>> >>> Personally, I would analyse all the libraries together (with 8 groups) >>> instead of using separate glmFits for the two ages. The same contrasts >>> will still work. Then there is no issue with different numbers of rows >>> etc. >>> >>> Best wishes >>> Gordon >>> >>> On Sat, 11 May 2013, Hilary Smith wrote: >>> >>>> Dear Gordon, >>>> Thank you. We cannot really remove the 3-way interaction, because there >>>> are some genes that respond to the 3-way interaction (even though a >>>> classic parameterization with the intercept, leads to about an order of >>>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>>> >>>> Testing for species*treatment at each time, definitely seems the >>>> closest way to address the questions we have. Roughly following >>>> section 3.3.1 of the edgeR user guide ("Defining each treatment >>>> combination as a group") on the creation of specific contrasts (and >>>> many thanks for your updated User Guides to show this formulation in >>>> detail), I set up tests as below. If this is valid, that's great, as >>>> it does make much more intuitive sense to me (and biological in terms >>>> of addressing our questions of interest). >>>> >>>> Thank you for your help; I REALLY appreciate it! >>>> Best, >>>> Hilary >>>> >>>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>>> >>>>> myC_TimeI = makeContrasts( >>>> + Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 - >>>> SpeciesA_TimeI_Treatment1, >>>> + Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 - >>>> SpeciesB_TimeI_Treatment1, >>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>>> + levels=design1) >>>>> >>>>> myC_TimeII = makeContrasts( >>>> + Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 >>>> - >>>> SpeciesA_TimeII_Treatment1, >>>> + Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 >>>> - >>>> SpeciesB_TimeII_Treatment1, >>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>>> + levels=design2) >>>> >>>> Then, after using glmFit on each of the two makeContrasts above, I ran >>>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>>> Treatment1_vs_Treatment2_SpeciesA_TimeI, >>>> Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through >>>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>>> >>>> Ultimately I may then try to use the VennCounts/Diagram from limma to >>>> show the overlap between the 4 sets visually: >>>> Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 >>>> vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; >>>> and Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a >>>> bit to set up, because my two time points have slightly different >>>> gene lists (i.e., to run the analyses on the Times differently, after >>>> filtering out reads to only retain those with cpm>1 in ?4 >>>> libraries, TimesI and II did not retain exactly the same genes, >>>> though it?s rather close). >>>> >>>> >>>> >>>> >>>> >>>> -------------------------------------------------- >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>>> >>>>> Dear Hilary, >>>>> >>>>> Your test for the 3-way interaction is correct, although 3-way >>>>> interactions are pretty hard to interpret. >>>>> >>>>> However testing for the 2-way interaction in the presence of a 3-way >>>>> interaction does not make statistical sense. This is because the >>>>> parametrization of the 2-way interaction as a subset of the 3-way is >>>>> somewhat arbitrary. Before you can test the 2-way interaction >>>>> species*treatment in a meaningful way you would need to accept that >>>>> the 3-way interaction is not necessary and remove it from the model. >>>>> >>>>> In general, I am of the opinion that classical statistical factorial >>>>> interation models do not usually provide the most meaningful >>>>> parametrizations for genomic experiments. In most cases, I prefer >>>>> to fit the saturated model (a different level for each treatment >>>>> combination) and make specific contrasts. There is some discussion >>>>> of this in the limma User's Guide. >>>>> >>>>> In your case, I guess that you might want to test for >>>>> species*treatment interaction separately at each time point. It is >>>>> almost impossible to do this within the classical 3-way factorial >>>>> setup. However it is easy with the one-way approach I just >>>>> mentioned, or else you could use: >>>>> >>>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>> >>>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>>> Subject: [BioC] Statistics question for multi-factor interaction test >>>>>> in edgeR >>>>>> >>>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>>> RNA-Seq count data. I have 3 factors with 2 levels apiece (2 >>>>>> species X 2 treatments X 2 times), and 4 separate replicates each >>>>>> (i.e., we made a total of 2*2*2*4 = 32 separate libraries). Our >>>>>> main interest is in the interaction of species*treatment, as we >>>>>> think species A will alter gene expression in the treatment stress >>>>>> vs. treatment benign, whereas species B is expected to show little >>>>>> change. However, we?d like to also do another test of >>>>>> species*treatment*time, because it is possible that the ability of >>>>>> species A to alter gene expression in response to the stress >>>>>> treatment may differ at the 1st versus 2nd time point. >>>>>> >>>>>> I think the way to set this up, is to create a design matrix as >>>>>> follows, with the lrt test with coef 5 giving the differentially >>>>>> expressed genes for the species*treatment test, and coef 8 giving >>>>>> the the differentially expressed gene for the >>>>>> species*treatment*time test (after calling topTags that is). Yet to >>>>>> ensure I have the statistics correct, my questions are: (1) is this >>>>>> thinking correct, as I don?t see many 3x2 factorial models to >>>>>> follow, and (2) do I need to set up a reference somehow (which I >>>>>> assume would be the set of four samples with >>>>>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>>>>> correct or needed). >>>>>> >>>>>> Many thanks in advance for your insight! >>>>>> ~Hilary >>>>>> >>>>>>> designFF <- model.matrix(~Treatment*Species*Age) >>>>>>> colnames(designFF) >>>>>> [1] "(Intercept)" >>>>>> [2] " TreatmentStress" >>>>>> [3] "SpeciesA " >>>>>> [4] "Time1" >>>>>> [5] "TreatmentStress:SpeciesA" >>>>>> [6] "TreatmentStress:Time1" >>>>>> [7] "SpeciesA:Time1" >>>>>> [8] "TreatmentStress:SpeciesA:Time1" >>>>>> >>>>>> And then to run tests with: >>>>>>> fit <- glmFit(y, designFF) >>>>>> >>>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:5}}

ADD REPLY • link 11.5 years ago Gordon Smyth 51k

Entering edit mode

This is a logical consequence of the sequential step-down (step-up?) approach, right? thanks, --t On Mon, May 13, 2013 at 5:02 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > Hi Hilary, > > Just to make it clear what I mean by scalability: Suppose that you > successfully control the FDR to be less than 0.05 for each contrast > separately. Then it is an automatic consequence that the FDR for all the > tests pooled together is also less than 0.05. > > Best wishes > Gordon > > On Tue, 14 May 2013, Gordon K Smyth wrote: > > Dear Hilary, >> >> Unlike p-values (Type I error control), FDR is a scalable loss function, >> so it is not generally necessary to decrease the cutoff when adding more >> tests. It is not correct or necessary to divide FDR by the number of tests >> done (a la Bonferroni p-values). >> >> The limma package (decideTests function) has a number of strategies for >> combining FDR across multiple contrasts as well as multiple genes. However >> the most commmon practice is the default, which is to simply do the FDR >> adjustment separately for each contrast. For the limited number of >> contrasts you are considering, I would be happy enough with this strategy. >> >> Best wishes >> Gordon >> >> ------------------------------**--------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Mon, 13 May 2013, Hilary Smith wrote: >> >> One further question. I'm glad to hear it's acceptable to run the >>> cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 >>> groups (32 libraries), and then just use the makeContrasts function. Yet >>> if I use makeContrasts to specify the 6 tests as noted in the prior >>> email/post, should I lower my "significant" FDR cutoff from 0.05 to >>> (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account for >>> the multiple testing of all the different genes or tags. However, I am >>> unsure whether it's necessary to also have a correction for the 6 >>> different contrasts I am testing, or if simply making the model >design = >>> model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, >>> group=Group), and Group contains all 8 groups (2 species * 2 treatments * >>> 2 times), intrinsically accounts for this in how it sets up the model. I >>> assume makeContrasts or the design function accounts for this, but I just >>> want to be sure. >>> >>> Thank you again, >>> Hilary >>> >>> >>> >>> ------------------------------**-------------------- >>> Dr. Hilary April Smith >>> Postdoctoral Research Associate >>> University of Notre Dame >>> Department of Biological Sciences >>> 321 Galvin Life Sciences >>> >>> >>> >>> >>> >>> On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: >>> >>> Dear Hilary, >>>> >>>> Makes sense to me. >>>> >>>> Personally, I would analyse all the libraries together (with 8 groups) >>>> instead of using separate glmFits for the two ages. The same contrasts >>>> will still work. Then there is no issue with different numbers of rows >>>> etc. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> On Sat, 11 May 2013, Hilary Smith wrote: >>>> >>>> Dear Gordon, >>>>> Thank you. We cannot really remove the 3-way interaction, because there >>>>> are some genes that respond to the 3-way interaction (even though a >>>>> classic parameterization with the intercept, leads to about an order of >>>>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>>>> >>>>> Testing for species*treatment at each time, definitely seems the >>>>> closest way to address the questions we have. Roughly following section >>>>> 3.3.1 of the edgeR user guide ("Defining each treatment combination as a >>>>> group") on the creation of specific contrasts (and many thanks for your >>>>> updated User Guides to show this formulation in detail), I set up tests as >>>>> below. If this is valid, that's great, as it does make much more intuitive >>>>> sense to me (and biological in terms of addressing our questions of >>>>> interest). >>>>> >>>>> Thank you for your help; I REALLY appreciate it! >>>>> Best, >>>>> Hilary >>>>> >>>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>>>>> >>>>> >>>>> myC_TimeI = makeContrasts( >>>>>> >>>>> + Treatment1_vs_Treatment2_**SpeciesA_TimeI = >>>>> SpeciesA_TimeI_Treatment2 - >>>>> SpeciesA_TimeI_Treatment1, >>>>> + Treatment1_vs_Treatment2_**SpeciesB_TimeI = >>>>> SpeciesB_TimeI_Treatment2 - >>>>> SpeciesB_TimeI_Treatment1, >>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>>>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>>>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>>>> + levels=design1) >>>>> >>>>>> >>>>>> myC_TimeII = makeContrasts( >>>>>> >>>>> + Treatment1_vs_Treatment2_**SpeciesA_TimeII = >>>>> SpeciesA_TimeII_Treatment2 >>>>> - >>>>> SpeciesA_TimeII_Treatment1, >>>>> + Treatment1_vs_Treatment2_**SpeciesB_TimeII = >>>>> SpeciesB_TimeII_Treatment2 >>>>> - >>>>> SpeciesB_TimeII_Treatment1, >>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>>>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>>>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>>>> + levels=design2) >>>>> >>>>> Then, after using glmFit on each of the two makeContrasts above, I ran >>>>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>>>> Treatment1_vs_Treatment2_**SpeciesA_TimeI, >>>>> Treatment1_vs_Treatment2_**SpeciesB_TimeI, etc. through >>>>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>>>> >>>>> Ultimately I may then try to use the VennCounts/Diagram from limma to >>>>> show the overlap between the 4 sets visually: Treatment1vsTreatment2_* >>>>> *SpeciesA_TimeI; Treatment1 vsTreatment2_SpeciesA_TimeII; Treatment1 >>>>> vsTreatment2_SpeciesB_TimeI; and Treatment1 vsTreatment2_SpeciesB_TimeII. >>>>> However, this may take a bit to set up, because my two time points have >>>>> slightly different gene lists (i.e., to run the analyses on the Times >>>>> differently, after filtering out reads to only retain those with cpm>1 in >>>>> 4 libraries, TimesI and II did not retain exactly the same genes, though >>>>> it¹s rather close). >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------**-------------------- >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: >>>>> >>>>> Dear Hilary, >>>>>> >>>>>> Your test for the 3-way interaction is correct, although 3-way >>>>>> interactions are pretty hard to interpret. >>>>>> >>>>>> However testing for the 2-way interaction in the presence of a 3-way >>>>>> interaction does not make statistical sense. This is because the >>>>>> parametrization of the 2-way interaction as a subset of the 3-way is >>>>>> somewhat arbitrary. Before you can test the 2-way interaction >>>>>> species*treatment in a meaningful way you would need to accept that the >>>>>> 3-way interaction is not necessary and remove it from the model. >>>>>> >>>>>> In general, I am of the opinion that classical statistical factorial >>>>>> interation models do not usually provide the most meaningful >>>>>> parametrizations for genomic experiments. In most cases, I prefer to fit >>>>>> the saturated model (a different level for each treatment combination) and >>>>>> make specific contrasts. There is some discussion of this in the limma >>>>>> User's Guide. >>>>>> >>>>>> In your case, I guess that you might want to test for >>>>>> species*treatment interaction separately at each time point. It is almost >>>>>> impossible to do this within the classical 3-way factorial setup. However >>>>>> it is easy with the one-way approach I just mentioned, or else you could >>>>>> use: >>>>>> >>>>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>>>> >>>>>> Best wishes >>>>>> Gordon >>>>>> >>>>>> >>>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>>>> From: Hilary Smith <hilary.a.smith.964@nd.edu> >>>>>>> To: "bioconductor@r-project.org" <bioconductor@r-project.org> >>>>>>> Subject: [BioC] Statistics question for multi-factor interaction test >>>>>>> in edgeR >>>>>>> >>>>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>>>> RNA-Seq count data. I have 3 factors with 2 levels apiece (2 species X 2 >>>>>>> treatments X 2 times), and 4 separate replicates each (i.e., we made a >>>>>>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >>>>>>> interaction of species*treatment, as we think species A will alter gene >>>>>>> expression in the treatment stress vs. treatment benign, whereas species B >>>>>>> is expected to show little change. However, we¹d like to also do another >>>>>>> test of species*treatment*time, because it is possible that the ability of >>>>>>> species A to alter gene expression in response to the stress treatment may >>>>>>> differ at the 1st versus 2nd time point. >>>>>>> >>>>>>> I think the way to set this up, is to create a design matrix as >>>>>>> follows, with the lrt test with coef 5 giving the differentially expressed >>>>>>> genes for the species*treatment test, and coef 8 giving the the >>>>>>> differentially expressed gene for the species*treatment*time test (after >>>>>>> calling topTags that is). Yet to ensure I have the statistics correct, my >>>>>>> questions are: (1) is this thinking correct, as I don¹t see many 3x2 >>>>>>> factorial models to follow, and (2) do I need to set up a reference somehow >>>>>>> (which I assume would be the set of four samples with >>>>>>> TreatmentBenign*SpeciesB***Time2, but I¹m not fully sure if that is >>>>>>> correct or needed). >>>>>>> >>>>>>> Many thanks in advance for your insight! >>>>>>> ~Hilary >>>>>>> >>>>>>> designFF <- model.matrix(~Treatment***Species*Age) >>>>>>>> colnames(designFF) >>>>>>>> >>>>>>> [1] "(Intercept)" >>>>>>> [2] " TreatmentStress" >>>>>>> [3] "SpeciesA " >>>>>>> [4] "Time1" >>>>>>> [5] "TreatmentStress:SpeciesA" >>>>>>> [6] "TreatmentStress:Time1" >>>>>>> [7] "SpeciesA:Time1" >>>>>>> [8] "TreatmentStress:SpeciesA:**Time1" >>>>>>> >>>>>>> And then to run tests with: >>>>>>> >>>>>>>> fit <- glmFit(y, designFF) >>>>>>>> >>>>>>> >>>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>>>> lrtInteractionStressSpeciesTim**e <- glmLRT(fitFF, coef=8) >>>>>>>> >>>>>>> > ______________________________**______________________________**____ ______ > The information in this email is confidential and intend...{{dropped:5}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Tim Triche ★ 4.2k

Entering edit mode

Also, given the relationship between a beta-uniform mixture (for sufficiently large numbers of tests where the uniform is the null and the beta "spike" is the true rejections near p~0), does the sequential adjustment ever become superfluous? I seem to recall that there is a drawback to simply using the posterior probability of a non-null result. On Tue, May 14, 2013 at 8:05 AM, Tim Triche, Jr. <tim.triche@gmail.com>wrote: > This is a logical consequence of the sequential step-down (step-up?) > approach, right? > > thanks, > > --t > > > On Mon, May 13, 2013 at 5:02 PM, Gordon K Smyth <smyth@wehi.edu.au> wrote: > >> Hi Hilary, >> >> Just to make it clear what I mean by scalability: Suppose that you >> successfully control the FDR to be less than 0.05 for each contrast >> separately. Then it is an automatic consequence that the FDR for all the >> tests pooled together is also less than 0.05. >> >> Best wishes >> Gordon >> >> On Tue, 14 May 2013, Gordon K Smyth wrote: >> >> Dear Hilary, >>> >>> Unlike p-values (Type I error control), FDR is a scalable loss function, >>> so it is not generally necessary to decrease the cutoff when adding more >>> tests. It is not correct or necessary to divide FDR by the number of tests >>> done (a la Bonferroni p-values). >>> >>> The limma package (decideTests function) has a number of strategies for >>> combining FDR across multiple contrasts as well as multiple genes. However >>> the most commmon practice is the default, which is to simply do the FDR >>> adjustment separately for each contrast. For the limited number of >>> contrasts you are considering, I would be happy enough with this strategy. >>> >>> Best wishes >>> Gordon >>> >>> ------------------------------**--------------- >>> Professor Gordon K Smyth, >>> Bioinformatics Division, >>> Walter and Eliza Hall Institute of Medical Research, >>> 1G Royal Parade, Parkville, Vic 3052, Australia. >>> http://www.statsci.org/smyth >>> >>> On Mon, 13 May 2013, Hilary Smith wrote: >>> >>> One further question. I'm glad to hear it's acceptable to run the >>>> cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 >>>> groups (32 libraries), and then just use the makeContrasts function. Yet >>>> if I use makeContrasts to specify the 6 tests as noted in the prior >>>> email/post, should I lower my "significant" FDR cutoff from 0.05 to >>>> (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account >>>> for >>>> the multiple testing of all the different genes or tags. However, I am >>>> unsure whether it's necessary to also have a correction for the 6 >>>> different contrasts I am testing, or if simply making the model >design >>>> = >>>> model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, >>>> group=Group), and Group contains all 8 groups (2 species * 2 treatments >>>> * >>>> 2 times), intrinsically accounts for this in how it sets up the model. I >>>> assume makeContrasts or the design function accounts for this, but I >>>> just >>>> want to be sure. >>>> >>>> Thank you again, >>>> Hilary >>>> >>>> >>>> >>>> ------------------------------**-------------------- >>>> Dr. Hilary April Smith >>>> Postdoctoral Research Associate >>>> University of Notre Dame >>>> Department of Biological Sciences >>>> 321 Galvin Life Sciences >>>> >>>> >>>> >>>> >>>> >>>> On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: >>>> >>>> Dear Hilary, >>>>> >>>>> Makes sense to me. >>>>> >>>>> Personally, I would analyse all the libraries together (with 8 groups) >>>>> instead of using separate glmFits for the two ages. The same contrasts >>>>> will still work. Then there is no issue with different numbers of rows >>>>> etc. >>>>> >>>>> Best wishes >>>>> Gordon >>>>> >>>>> On Sat, 11 May 2013, Hilary Smith wrote: >>>>> >>>>> Dear Gordon, >>>>>> Thank you. We cannot really remove the 3-way interaction, because >>>>>> there >>>>>> are some genes that respond to the 3-way interaction (even though a >>>>>> classic parameterization with the intercept, leads to about an order >>>>>> of >>>>>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>>>>> >>>>>> Testing for species*treatment at each time, definitely seems the >>>>>> closest way to address the questions we have. Roughly following section >>>>>> 3.3.1 of the edgeR user guide ("Defining each treatment combination as a >>>>>> group") on the creation of specific contrasts (and many thanks for your >>>>>> updated User Guides to show this formulation in detail), I set up tests as >>>>>> below. If this is valid, that's great, as it does make much more intuitive >>>>>> sense to me (and biological in terms of addressing our questions of >>>>>> interest). >>>>>> >>>>>> Thank you for your help; I REALLY appreciate it! >>>>>> Best, >>>>>> Hilary >>>>>> >>>>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>>>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>>>>>> >>>>>> >>>>>> myC_TimeI = makeContrasts( >>>>>>> >>>>>> + Treatment1_vs_Treatment2_**SpeciesA_TimeI = >>>>>> SpeciesA_TimeI_Treatment2 - >>>>>> SpeciesA_TimeI_Treatment1, >>>>>> + Treatment1_vs_Treatment2_**SpeciesB_TimeI = >>>>>> SpeciesB_TimeI_Treatment2 - >>>>>> SpeciesB_TimeI_Treatment1, >>>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>>>>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>>>>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>>>>> + levels=design1) >>>>>> >>>>>>> >>>>>>> myC_TimeII = makeContrasts( >>>>>>> >>>>>> + Treatment1_vs_Treatment2_**SpeciesA_TimeII = >>>>>> SpeciesA_TimeII_Treatment2 >>>>>> - >>>>>> SpeciesA_TimeII_Treatment1, >>>>>> + Treatment1_vs_Treatment2_**SpeciesB_TimeII = >>>>>> SpeciesB_TimeII_Treatment2 >>>>>> - >>>>>> SpeciesB_TimeII_Treatment1, >>>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>>>>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>>>>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>>>>> + levels=design2) >>>>>> >>>>>> Then, after using glmFit on each of the two makeContrasts above, I ran >>>>>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>>>>> Treatment1_vs_Treatment2_**SpeciesA_TimeI, >>>>>> Treatment1_vs_Treatment2_**SpeciesB_TimeI, etc. through >>>>>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>>>>> >>>>>> Ultimately I may then try to use the VennCounts/Diagram from limma to >>>>>> show the overlap between the 4 sets visually: Treatment1vsTreatment2_ >>>>>> **SpeciesA_TimeI; Treatment1 vsTreatment2_SpeciesA_TimeII; >>>>>> Treatment1 vsTreatment2_SpeciesB_TimeI; and Treatment1 >>>>>> vsTreatment2_SpeciesB_TimeII. However, this may take a bit to set up, >>>>>> because my two time points have slightly different gene lists (i.e., to run >>>>>> the analyses on the Times differently, after filtering out reads to only >>>>>> retain those with cpm>1 in 4 libraries, TimesI and II did not retain >>>>>> exactly the same genes, though it¹s rather close). >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------**-------------------- >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth@wehi.edu.au> wrote: >>>>>> >>>>>> Dear Hilary, >>>>>>> >>>>>>> Your test for the 3-way interaction is correct, although 3-way >>>>>>> interactions are pretty hard to interpret. >>>>>>> >>>>>>> However testing for the 2-way interaction in the presence of a 3-way >>>>>>> interaction does not make statistical sense. This is because the >>>>>>> parametrization of the 2-way interaction as a subset of the 3-way is >>>>>>> somewhat arbitrary. Before you can test the 2-way interaction >>>>>>> species*treatment in a meaningful way you would need to accept that the >>>>>>> 3-way interaction is not necessary and remove it from the model. >>>>>>> >>>>>>> In general, I am of the opinion that classical statistical factorial >>>>>>> interation models do not usually provide the most meaningful >>>>>>> parametrizations for genomic experiments. In most cases, I prefer to fit >>>>>>> the saturated model (a different level for each treatment combination) and >>>>>>> make specific contrasts. There is some discussion of this in the limma >>>>>>> User's Guide. >>>>>>> >>>>>>> In your case, I guess that you might want to test for >>>>>>> species*treatment interaction separately at each time point. It is almost >>>>>>> impossible to do this within the classical 3-way factorial setup. However >>>>>>> it is easy with the one-way approach I just mentioned, or else you could >>>>>>> use: >>>>>>> >>>>>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>>>>> >>>>>>> Best wishes >>>>>>> Gordon >>>>>>> >>>>>>> >>>>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>>>>> From: Hilary Smith <hilary.a.smith.964@nd.edu> >>>>>>>> To: "bioconductor@r-project.org" <bioconductor@r-project.org> >>>>>>>> Subject: [BioC] Statistics question for multi-factor interaction >>>>>>>> test >>>>>>>> in edgeR >>>>>>>> >>>>>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>>>>> RNA-Seq count data. I have 3 factors with 2 levels apiece (2 species X 2 >>>>>>>> treatments X 2 times), and 4 separate replicates each (i.e., we made a >>>>>>>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the >>>>>>>> interaction of species*treatment, as we think species A will alter gene >>>>>>>> expression in the treatment stress vs. treatment benign, whereas species B >>>>>>>> is expected to show little change. However, we¹d like to also do another >>>>>>>> test of species*treatment*time, because it is possible that the ability of >>>>>>>> species A to alter gene expression in response to the stress treatment may >>>>>>>> differ at the 1st versus 2nd time point. >>>>>>>> >>>>>>>> I think the way to set this up, is to create a design matrix as >>>>>>>> follows, with the lrt test with coef 5 giving the differentially expressed >>>>>>>> genes for the species*treatment test, and coef 8 giving the the >>>>>>>> differentially expressed gene for the species*treatment*time test (after >>>>>>>> calling topTags that is). Yet to ensure I have the statistics correct, my >>>>>>>> questions are: (1) is this thinking correct, as I don¹t see many 3x2 >>>>>>>> factorial models to follow, and (2) do I need to set up a reference somehow >>>>>>>> (which I assume would be the set of four samples with >>>>>>>> TreatmentBenign*SpeciesB***Time2, but I¹m not fully sure if that >>>>>>>> is correct or needed). >>>>>>>> >>>>>>>> Many thanks in advance for your insight! >>>>>>>> ~Hilary >>>>>>>> >>>>>>>> designFF <- model.matrix(~Treatment***Species*Age) >>>>>>>>> colnames(designFF) >>>>>>>>> >>>>>>>> [1] "(Intercept)" >>>>>>>> [2] " TreatmentStress" >>>>>>>> [3] "SpeciesA " >>>>>>>> [4] "Time1" >>>>>>>> [5] "TreatmentStress:SpeciesA" >>>>>>>> [6] "TreatmentStress:Time1" >>>>>>>> [7] "SpeciesA:Time1" >>>>>>>> [8] "TreatmentStress:SpeciesA:**Time1" >>>>>>>> >>>>>>>> And then to run tests with: >>>>>>>> >>>>>>>>> fit <- glmFit(y, designFF) >>>>>>>>> >>>>>>>> >>>>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>>>>> lrtInteractionStressSpeciesTim**e <- glmLRT(fitFF, coef=8) >>>>>>>>> >>>>>>>> >> ______________________________**______________________________** >> __________ >> The information in this email is confidential and intend...{{dropped:5}} >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Tim Triche ★ 4.2k

Entering edit mode

Thank you very much. I looked online, and between that and what you note below this makes more sense to me now: the total number of false discoveries scales with more tests, such that for an FDR at 0.05, there could be one contrast with 100 tests (genes), where there are 5 false discoveries out of 100 tests (genes), and a second contrast with say 50 out of 1000 tests (genes). Then if one combines/analyzes both contrasts, there are 55 false discoveries out of 1100 tests, leading to an FDR = 55/1100, which still is 0.05. This is excellent (esp. as after talking with my supervisor, we may want to add yet another contrast or two by testing for the time*treatment interaction within each species (in addition to the prior tests of treatment*species and main effects within each time point, albeit the total will probably only be ~8 contrasts). I definitely will look through the limma user guide as well, as you suggested, for more information. Thank you again; I really appreciate your insight as I definitely (for the study, and simply to understand/curiosity) want to know and apply the correct analysis. Best, Hilary -------------------------------------------------- Dr. Hilary April Smith Postdoctoral Research Associate University of Notre Dame Department of Biological Sciences 321 Galvin Life Sciences On 5/13/13 8:02 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >Hi Hilary, > >Just to make it clear what I mean by scalability: Suppose that you >successfully control the FDR to be less than 0.05 for each contrast >separately. Then it is an automatic consequence that the FDR for all the >tests pooled together is also less than 0.05. > >Best wishes >Gordon > >On Tue, 14 May 2013, Gordon K Smyth wrote: > >> Dear Hilary, >> >> Unlike p-values (Type I error control), FDR is a scalable loss >>function, so >> it is not generally necessary to decrease the cutoff when adding more >>tests. >> It is not correct or necessary to divide FDR by the number of tests >>done (a >> la Bonferroni p-values). >> >> The limma package (decideTests function) has a number of strategies for >> combining FDR across multiple contrasts as well as multiple genes. >>However >> the most commmon practice is the default, which is to simply do the FDR >> adjustment separately for each contrast. For the limited number of >>contrasts >> you are considering, I would be happy enough with this strategy. >> >> Best wishes >> Gordon >> >> --------------------------------------------- >> Professor Gordon K Smyth, >> Bioinformatics Division, >> Walter and Eliza Hall Institute of Medical Research, >> 1G Royal Parade, Parkville, Vic 3052, Australia. >> http://www.statsci.org/smyth >> >> On Mon, 13 May 2013, Hilary Smith wrote: >> >>> One further question. I'm glad to hear it's acceptable to run the >>> cpm>1)>=4 filtering, as well as the glmFit(Y, design), once on all 8 >>> groups (32 libraries), and then just use the makeContrasts function. >>>Yet >>> if I use makeContrasts to specify the 6 tests as noted in the prior >>> email/post, should I lower my "significant" FDR cutoff from 0.05 to >>> (0.05/6) = 0.0083? I know edgeR does a BH correction / FDR to account >>>for >>> the multiple testing of all the different genes or tags. However, I am >>> unsure whether it's necessary to also have a correction for the 6 >>> different contrasts I am testing, or if simply making the model >>>>design = >>> model.matrix(~0+Group, data=Y$samples) where Y = DGEList(counts=D, >>> group=Group), and Group contains all 8 groups (2 species * 2 >>>treatments * >>> 2 times), intrinsically accounts for this in how it sets up the model. >>>I >>> assume makeContrasts or the design function accounts for this, but I >>>just >>> want to be sure. >>> >>> Thank you again, >>> Hilary >>> >>> >>> >>> -------------------------------------------------- >>> Dr. Hilary April Smith >>> Postdoctoral Research Associate >>> University of Notre Dame >>> Department of Biological Sciences >>> 321 Galvin Life Sciences >>> >>> >>> >>> >>> >>> On 5/11/13 11:53 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>> >>>> Dear Hilary, >>>> >>>> Makes sense to me. >>>> >>>> Personally, I would analyse all the libraries together (with 8 groups) >>>> instead of using separate glmFits for the two ages. The same >>>>contrasts >>>> will still work. Then there is no issue with different numbers of >>>>rows >>>> etc. >>>> >>>> Best wishes >>>> Gordon >>>> >>>> On Sat, 11 May 2013, Hilary Smith wrote: >>>> >>>>> Dear Gordon, >>>>> Thank you. We cannot really remove the 3-way interaction, because >>>>>there >>>>> are some genes that respond to the 3-way interaction (even though a >>>>> classic parameterization with the intercept, leads to about an order >>>>>of >>>>> magnitude more genes responding to a 2-way vs. 3-way interaction). >>>>> >>>>> Testing for species*treatment at each time, definitely seems the >>>>> closest way to address the questions we have. Roughly following >>>>> section 3.3.1 of the edgeR user guide ("Defining each treatment >>>>> combination as a group") on the creation of specific contrasts (and >>>>> many thanks for your updated User Guides to show this formulation in >>>>> detail), I set up tests as below. If this is valid, that's great, as >>>>> it does make much more intuitive sense to me (and biological in >>>>>terms >>>>> of addressing our questions of interest). >>>>> >>>>> Thank you for your help; I REALLY appreciate it! >>>>> Best, >>>>> Hilary >>>>> >>>>>> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples) >>>>>> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples) >>>>> >>>>>> myC_TimeI = makeContrasts( >>>>> + Treatment1_vs_Treatment2_SpeciesA_TimeI = >>>>>SpeciesA_TimeI_Treatment2 - >>>>> SpeciesA_TimeI_Treatment1, >>>>> + Treatment1_vs_Treatment2_SpeciesB_TimeI = >>>>>SpeciesB_TimeI_Treatment2 - >>>>> SpeciesB_TimeI_Treatment1, >>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI = >>>>> (SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-( >>>>> SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1), >>>>> + levels=design1) >>>>>> >>>>>> myC_TimeII = makeContrasts( >>>>> + Treatment1_vs_Treatment2_SpeciesA_TimeII = >>>>>SpeciesA_TimeII_Treatment2 >>>>> - >>>>> SpeciesA_TimeII_Treatment1, >>>>> + Treatment1_vs_Treatment2_SpeciesB_TimeII = >>>>>SpeciesB_TimeII_Treatment2 >>>>> - >>>>> SpeciesB_TimeII_Treatment1, >>>>> + Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII = >>>>> (SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-( >>>>> SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1), >>>>> + levels=design2) >>>>> >>>>> Then, after using glmFit on each of the two makeContrasts above, I >>>>>ran >>>>> glmLRT 6 times, once on each of the 6 contrasts (i.e., on >>>>> Treatment1_vs_Treatment2_SpeciesA_TimeI, >>>>> Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through >>>>> Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI. >>>>> >>>>> Ultimately I may then try to use the VennCounts/Diagram from limma >>>>>to >>>>> show the overlap between the 4 sets visually: >>>>> Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1 >>>>> vsTreatment2_SpeciesA_TimeII; Treatment1 >>>>>vsTreatment2_SpeciesB_TimeI; >>>>> and Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take >>>>>a >>>>> bit to set up, because my two time points have slightly different >>>>> gene lists (i.e., to run the analyses on the Times differently, >>>>>after >>>>> filtering out reads to only retain those with cpm>1 in ?4 >>>>> libraries, TimesI and II did not retain exactly the same genes, >>>>> though it?s rather close). >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -------------------------------------------------- >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at="" wehi.edu.au=""> wrote: >>>>> >>>>>> Dear Hilary, >>>>>> >>>>>> Your test for the 3-way interaction is correct, although 3-way >>>>>> interactions are pretty hard to interpret. >>>>>> >>>>>> However testing for the 2-way interaction in the presence of a >>>>>>3-way >>>>>> interaction does not make statistical sense. This is because the >>>>>> parametrization of the 2-way interaction as a subset of the 3-way >>>>>>is >>>>>> somewhat arbitrary. Before you can test the 2-way interaction >>>>>> species*treatment in a meaningful way you would need to accept that >>>>>> the 3-way interaction is not necessary and remove it from the model. >>>>>> >>>>>> In general, I am of the opinion that classical statistical >>>>>>factorial >>>>>> interation models do not usually provide the most meaningful >>>>>> parametrizations for genomic experiments. In most cases, I prefer >>>>>> to fit the saturated model (a different level for each treatment >>>>>> combination) and make specific contrasts. There is some discussion >>>>>> of this in the limma User's Guide. >>>>>> >>>>>> In your case, I guess that you might want to test for >>>>>> species*treatment interaction separately at each time point. It is >>>>>> almost impossible to do this within the classical 3-way factorial >>>>>> setup. However it is easy with the one-way approach I just >>>>>> mentioned, or else you could use: >>>>>> >>>>>> ~Age + Age:Species + Age:Treatment + Age:Species:Treatment >>>>>> >>>>>> Best wishes >>>>>> Gordon >>>>>> >>>>>> >>>>>>> Date: Thu, 9 May 2013 14:55:46 -0400 >>>>>>> From: Hilary Smith <hilary.a.smith.964 at="" nd.edu=""> >>>>>>> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >>>>>>> Subject: [BioC] Statistics question for multi-factor interaction >>>>>>>test >>>>>>> in edgeR >>>>>>> >>>>>>> Hi. I need to generate two GLM tests of a factorial design with >>>>>>> RNA-Seq count data. I have 3 factors with 2 levels apiece (2 >>>>>>> species X 2 treatments X 2 times), and 4 separate replicates each >>>>>>> (i.e., we made a total of 2*2*2*4 = 32 separate libraries). Our >>>>>>> main interest is in the interaction of species*treatment, as we >>>>>>> think species A will alter gene expression in the treatment stress >>>>>>> vs. treatment benign, whereas species B is expected to show little >>>>>>> change. However, we?d like to also do another test of >>>>>>> species*treatment*time, because it is possible that the ability of >>>>>>> species A to alter gene expression in response to the stress >>>>>>> treatment may differ at the 1st versus 2nd time point. >>>>>>> >>>>>>> I think the way to set this up, is to create a design matrix as >>>>>>> follows, with the lrt test with coef 5 giving the differentially >>>>>>> expressed genes for the species*treatment test, and coef 8 giving >>>>>>> the the differentially expressed gene for the >>>>>>> species*treatment*time test (after calling topTags that is). Yet >>>>>>>to >>>>>>> ensure I have the statistics correct, my questions are: (1) is >>>>>>>this >>>>>>> thinking correct, as I don?t see many 3x2 factorial models to >>>>>>> follow, and (2) do I need to set up a reference somehow (which I >>>>>>> assume would be the set of four samples with >>>>>>> TreatmentBenign*SpeciesB*Time2, but I?m not fully sure if that is >>>>>>> correct or needed). >>>>>>> >>>>>>> Many thanks in advance for your insight! >>>>>>> ~Hilary >>>>>>> >>>>>>>> designFF <- model.matrix(~Treatment*Species*Age) >>>>>>>> colnames(designFF) >>>>>>> [1] "(Intercept)" >>>>>>> [2] " TreatmentStress" >>>>>>> [3] "SpeciesA " >>>>>>> [4] "Time1" >>>>>>> [5] "TreatmentStress:SpeciesA" >>>>>>> [6] "TreatmentStress:Time1" >>>>>>> [7] "SpeciesA:Time1" >>>>>>> [8] "TreatmentStress:SpeciesA:Time1" >>>>>>> >>>>>>> And then to run tests with: >>>>>>>> fit <- glmFit(y, designFF) >>>>>>> >>>>>>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5) >>>>>>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8) > >_____________________________________________________________________ _ >The information in this email is confidential and intended solely for the >addressee. >You must not disclose, forward, print or use it without the permission of >the sender. >_____________________________________________________________________ _

ADD REPLY • link 11.5 years ago Hilary Smith ▴ 70