Microarray analyses. How to know the direction of the expression?
1
0
Entering edit mode
Mary Kindall ▴ 70
@mary-kindall-5600
Last seen 10.4 years ago
Given a design matrix, how can I know that the logFC in the output of ebayesFit is WT/KO or KO/WT > Group [1] KO WT WT WT KO KO Levels: KO WT > design<- model.matrix(~Group) > design (Intercept) GroupWT 1 1 0 2 1 1 3 1 1 4 1 1 5 1 0 6 1 0 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" > fit <-lmFit(myEset,design) > fit<-eBayes(fit) > fit An object of class "MArrayLM" $coefficients (Intercept) GroupWT 1415670_at 10.0 -0.11544 1415671_at 10.9 0.41568 1415672_at 11.6 0.00933 1415673_at 10.4 -0.16966 1415674_a_at 10.5 -0.01095 18937 more rows ... $rank [1] 2 $assign [1] 0 1 $qr $qr (Intercept) GroupWT 1 -2.449 -1.225 2 0.408 -1.225 3 0.408 0.527 4 0.408 0.527 5 0.408 -0.290 6 0.408 -0.290 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" $qraux [1] 1.41 1.53 $pivot [1] 1 2 $tol [1] 1e-07 $rank [1] 2 $df.residual [1] 4 4 4 4 4 18937 more elements ... $sigma 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at 0.0736 0.1407 0.0627 0.3145 0.1100 18937 more elements ... $cov.coefficients (Intercept) GroupWT (Intercept) 0.333 -0.333 GroupWT -0.333 0.667 $stdev.unscaled (Intercept) GroupWT 1415670_at 0.577 0.816 1415671_at 0.577 0.816 1415672_at 0.577 0.816 1415673_at 0.577 0.816 1415674_a_at 0.577 0.816 18937 more rows ... $pivot [1] 1 2 $Amean 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at 9.95 11.12 11.57 10.34 10.53 18937 more elements ... $method [1] "ls" $design (Intercept) GroupWT 1 1 0 2 1 1 3 1 1 4 1 1 5 1 0 6 1 0 attr(,"assign") [1] 0 1 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment" $df.prior [1] 2.41 $s2.prior [1] 0.0378 $var.prior [1] 422.8 49.9 $proportion [1] 0.01 $s2.post 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at 0.0176 0.0266 0.0167 0.0759 0.0218 18937 more elements ... $t (Intercept) GroupWT 1415670_at 130.6 -1.0653 1415671_at 115.9 3.1225 1415672_at 155.0 0.0885 1415673_at 65.5 -0.7540 1415674_a_at 123.6 -0.0909 18937 more rows ... $df.total [1] 6.41 6.41 6.41 6.41 6.41 18937 more elements ... $p.value (Intercept) GroupWT 1415670_at 3.16e-12 0.3252 1415671_at 6.78e-12 0.0188 1415672_at 1.05e-12 0.9322 1415673_at 2.62e-10 0.4776 1415674_a_at 4.49e-12 0.9304 18937 more rows ... $lods (Intercept) GroupWT 1415670_at 16.9 -6.16 1415671_at 16.6 -3.41 1415672_at 17.2 -6.75 1415673_at 14.4 -6.45 1415674_a_at 16.7 -6.75 18937 more rows ... $F [1] 16860 13961 24050 4225 15270 18937 more elements ... $F.p.value [1] 1.17e-12 2.15e-12 3.75e-13 9.89e-11 1.61e-12 18937 more elements ... > x <- topTable(fit,n=nrow(exprs(myEset)),adjust="BH",coef=2) > head(x) logFC AveExpr t P.Value adj.P.Val B 1452666_a_at 5.36 4.93 45.0 2.92e-09 5.54e-05 8.50 1441054_at 4.82 4.67 33.0 2.12e-08 2.01e-04 7.94 1424722_at 4.53 4.75 28.0 5.95e-08 3.62e-04 7.53 1435378_at 3.91 4.59 26.5 8.52e-08 3.62e-04 7.37 1428391_at 4.78 4.97 26.0 9.54e-08 3.62e-04 7.31 1419311_at 4.79 5.94 21.8 2.98e-07 9.42e-04 6.71 Is all these genes here upregulated as compared to WT? -- ------------- Mary Kindall Yorktown Heights, NY USA [[alternative HTML version deleted]]
• 1.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States
Hi Mary, This is another reason to use a 'cell means' parameterization rather than a 'factor effects' parameterization. This is just statistician blahblah for computing a mean for each group rather than a baseline level and a difference between the other sample and the baseline. In your situation, since the second coefficient has WT in the name (and you have an intercept term), the intercept is computing the mean of a baseline sample (in your case the KO sample), and the GroupWT coefficient is WT - KO. You could instead do design <- model.matrix(~0+Group) contrast <- matrix(c(1,-1), ncol = 1) colnames(contrast) <- "KO vs WT" fit <- lmFit(myEset, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) topTable(fit2, 1) Or you could use relevel() Group <- relevel(Group, "WT") and create your design matrix. But IMO it is always easier to just compute group means and make comparisons directly, because then you don't have to figure out how to interpret the coefficients. Best, Jim On Thursday, September 12, 2013 1:13:09 PM, Mary Kindall wrote: > Given a design matrix, how can I know that the logFC in the output of > ebayesFit is WT/KO or KO/WT > > > Group > [1] KO WT WT WT KO KO > Levels: KO WT > > design<- model.matrix(~Group) > > design > (Intercept) GroupWT > 1 1 0 > 2 1 1 > 3 1 1 > 4 1 1 > 5 1 0 > 6 1 0 > attr(,"assign") > [1] 0 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > > > fit <-lmFit(myEset,design) > > fit<-eBayes(fit) > > fit > An object of class "MArrayLM" > $coefficients > (Intercept) GroupWT > 1415670_at 10.0 -0.11544 > 1415671_at 10.9 0.41568 > 1415672_at 11.6 0.00933 > 1415673_at 10.4 -0.16966 > 1415674_a_at 10.5 -0.01095 > 18937 more rows ... > > $rank > [1] 2 > > $assign > [1] 0 1 > > $qr > $qr > (Intercept) GroupWT > 1 -2.449 -1.225 > 2 0.408 -1.225 > 3 0.408 0.527 > 4 0.408 0.527 > 5 0.408 -0.290 > 6 0.408 -0.290 > attr(,"assign") > [1] 0 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > > > $qraux > [1] 1.41 1.53 > > $pivot > [1] 1 2 > > $tol > [1] 1e-07 > > $rank > [1] 2 > > > $df.residual > [1] 4 4 4 4 4 > 18937 more elements ... > > $sigma > 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at > 0.0736 0.1407 0.0627 0.3145 0.1100 > 18937 more elements ... > > $cov.coefficients > (Intercept) GroupWT > (Intercept) 0.333 -0.333 > GroupWT -0.333 0.667 > > $stdev.unscaled > (Intercept) GroupWT > 1415670_at 0.577 0.816 > 1415671_at 0.577 0.816 > 1415672_at 0.577 0.816 > 1415673_at 0.577 0.816 > 1415674_a_at 0.577 0.816 > 18937 more rows ... > > $pivot > [1] 1 2 > > $Amean > 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at > 9.95 11.12 11.57 10.34 10.53 > 18937 more elements ... > > $method > [1] "ls" > > $design > (Intercept) GroupWT > 1 1 0 > 2 1 1 > 3 1 1 > 4 1 1 > 5 1 0 > 6 1 0 > attr(,"assign") > [1] 0 1 > attr(,"contrasts") > attr(,"contrasts")$Group > [1] "contr.treatment" > > > $df.prior > [1] 2.41 > > $s2.prior > [1] 0.0378 > > $var.prior > [1] 422.8 49.9 > > $proportion > [1] 0.01 > > $s2.post > 1415670_at 1415671_at 1415672_at 1415673_at 1415674_a_at > 0.0176 0.0266 0.0167 0.0759 0.0218 > 18937 more elements ... > > $t > (Intercept) GroupWT > 1415670_at 130.6 -1.0653 > 1415671_at 115.9 3.1225 > 1415672_at 155.0 0.0885 > 1415673_at 65.5 -0.7540 > 1415674_a_at 123.6 -0.0909 > 18937 more rows ... > > $df.total > [1] 6.41 6.41 6.41 6.41 6.41 > 18937 more elements ... > > $p.value > (Intercept) GroupWT > 1415670_at 3.16e-12 0.3252 > 1415671_at 6.78e-12 0.0188 > 1415672_at 1.05e-12 0.9322 > 1415673_at 2.62e-10 0.4776 > 1415674_a_at 4.49e-12 0.9304 > 18937 more rows ... > > $lods > (Intercept) GroupWT > 1415670_at 16.9 -6.16 > 1415671_at 16.6 -3.41 > 1415672_at 17.2 -6.75 > 1415673_at 14.4 -6.45 > 1415674_a_at 16.7 -6.75 > 18937 more rows ... > > $F > [1] 16860 13961 24050 4225 15270 > 18937 more elements ... > > $F.p.value > [1] 1.17e-12 2.15e-12 3.75e-13 9.89e-11 1.61e-12 > 18937 more elements ... > > > x <- topTable(fit,n=nrow(exprs(myEset)),adjust="BH",coef=2) > > head(x) > logFC AveExpr t P.Value adj.P.Val B > 1452666_a_at 5.36 4.93 45.0 2.92e-09 5.54e-05 8.50 > 1441054_at 4.82 4.67 33.0 2.12e-08 2.01e-04 7.94 > 1424722_at 4.53 4.75 28.0 5.95e-08 3.62e-04 7.53 > 1435378_at 3.91 4.59 26.5 8.52e-08 3.62e-04 7.37 > 1428391_at 4.78 4.97 26.0 9.54e-08 3.62e-04 7.31 > 1419311_at 4.79 5.94 21.8 2.98e-07 9.42e-04 6.71 > > > > Is all these genes here upregulated as compared to WT? > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
ADD COMMENT
0
Entering edit mode
Hi, On Thu, Sep 12, 2013 at 11:57 AM, James W. MacDonald <jmacdon at="" uw.edu=""> wrote: > Hi Mary, > > This is another reason to use a 'cell means' parameterization rather than a > 'factor effects' parameterization. This is just statistician blahblah for > computing a mean for each group rather than a baseline level and a > difference between the other sample and the baseline. > > In your situation, since the second coefficient has WT in the name (and you > have an intercept term), the intercept is computing the mean of a baseline > sample (in your case the KO sample), and the GroupWT coefficient is WT - KO. > > You could instead do > > design <- model.matrix(~0+Group) > contrast <- matrix(c(1,-1), ncol = 1) > colnames(contrast) <- "KO vs WT" Perhaps using a even a little less "statistician blahblah", the OP might find using the limma::makeContrasts function more intuitive, eg: R> contrast <- makeContrasts(KO-WT, levels=levels(Group)) And continue as Jim prescribed below: > fit <- lmFit(myEset, design) > fit2 <- contrasts.fit(fit, contrast) > fit2 <- eBayes(fit2) > topTable(fit2, 1) Sincerely, A non-statistician who deals with his fair share of statistical blah blah -- Steve Lianoglou Computational Biologist Bioinformatics and Computational Biology Genentech
ADD REPLY

Login before adding your answer.

Traffic: 502 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6