Dear all,
I have some gene expression data sets. In some of the data sets, I am comparing two age categories, and in some I am comparing multiple age categories. For each dataset, I want to do a two-tailed F test between the age categories to check if the slope of the curve is different from zero, which would indicate an association between expression signal and age.
I've been trying to understand the method here: https://www.biostars.org/p/222041/. Dr. Devon Ryan is so kindly trying to help me. I'm really struggling and I just don't understand specifically where in the code I am meant to change; I didn't want to keep bothering Dr. Ryan, so I emailed Prof. Smyth who said to ask here. The problem is I can get the code to run, but I want to make sure I am using the software correctly.
If someone had the time to glance at the code below and edit the line that I should change, so that the code reads: I input the pData (different age categories; sometimes two, sometimes more than 2 age categories) and gene expression data, and the output is a two-tailed F test with adjusted F test P values that tells me if the slope of the curve differs from 0, indicating a relationship between expression and age.I'm also not sure if I can use the same script for two age categories and more than two age categories.
The code:
The gene expression data (Table.tabbed) looks same for both 2-age, and more-than-2-age cases:
Gene Sample1 Sample2 Sample3 Sample4 Sample5 Sample6…. Gene1 5.2 5.05 5.01 4.96 5.31 4.834 Gene2 4.85 4.59 4.91 5.06 4.75 4.86 Gene3 7.68 7.30 7.58 7.64 7.44 7.69
and the pData file looks like this (Two-Age case):
Sample Age Sample1 Age2.5 Sample2 Age2.5 Sample3 Age2.5 Sample4 Age20 Sample5 Age20 Sample6 Age20
pData file: (More than 2 age category case):
Sample Age Sample1 3.0 Sample2 3.0 Sample3 10.0 Sample4 10.0 Sample5 50.0 Sample6 50.0
###########
This is the code that I use for both the two-age, and more-than-two-age category data sets:
library(limma) library(Biobase)
#Make an expression data set
expr <-as.matrix(read.table("Table",header=TRUE,sep="\t",row.names=1,as.is=TRUE))minimalSet <-ExpressionSet(assayData=expr) pData <-read.table("pData",row.names=1,header=TRUE,sep="\t") metadata <-data.frame(labelDescription=c("Age"),row.names=c("Age")) phenoData <-new("AnnotatedDataFrame",data=pData,varMetadata=metadata) exampleSet <-ExpressionSet(assayData=expr,phenoData=phenoData,annotation="Mouse430_2")
#Do limma
f <-factor(as.character(exampleSet$Age)) design <-model.matrix(~f) #I think this line is the first problem fit = eBayes(lmFit(exampleSet,design)) topTable(fit) #And I think this line is the second problem
############
The output (2-age case):
Removing intercept from test coefficients logFC AveExpr t P.Value adj.P.Val Gene1 -19.8 13.2 -19.6 9.3e-09 3.3e-05 Gene2 18.6 14.7 13.8 2.0e-07 3.6e-04 Gene3 13.0 12.4 11.0 1.4e-06 1.6e-03
The output (more than 2 age case):
Removing intercept from test coefficients fAge12.0 fAge23.0 fAge6.0 fAge9.0 AveExpr F PVal AdjPVal Gene1 5.3 10.08 2.4 3.2 2.0 5.7 3.5e-17 1.3e-13 Gene2 4.2 1.4 3.03 2.4 2.3 4.3 1.e-16 1.3e-12 Gene3 1.8 3.6 7.9 9.9 8.2 3.6 1.e-12 1.5e-11
If someone could tell me specifically what I should write/how I should edit the code so that I input the pData (different age categories; sometimes two, sometimes more than 2 age categories) and gene expression data, and the output is a two-tailed F test with adjusted F test P values that tells me if the slope of the curve differs from 0, indicating a relationship between expression and age. I have to stress, I can get the code to run, and I can get it to run if I modify certain things, it's just wanting to make sure that the code i'm running is actually doing what I want it to do. I appreciate it, thank you.
Thank you so much. Can I ask, regarding the F test, you can see from my above code that for the two-age test, I get a T test p value and for the more-than-two age category, I get a two-tailed F test p value. I understand that the T and F test are related; I'm wondering if it's possible to obtain a two-tailed F test score/p value from a two-age-category data set?
If you square the t-statistic for a two-tailed t-test, you'll get a F-statistic for the corresponding F-test. They'll give the same p-values, but the t-statistic is more direct and easier to interpret. Note that the F-statistic is not directional in this setting, so it doesn't make sense to talk about a "two-tailed" F-test. While the F-test will pick up changes between groups in any direction, the tails refer to the number of rejection regions and we're only interested in extreme F-statistics, i.e., the upper tail.