I have gene expression data (not image files) obtained from affymetrix chip and I used limma to analyze it. The row names of top table does not match the probe id. I saw this thread for annotating results from celfiles (https://support.bioconductor.org/p/63834/).
I wonder how can I annotate the results of toptable().
Thanks for any insights and help.
Code:
dat<-read.csv("Rat_Gene_Expression_Array.csv") ### importing data
names(dat)
Heart_dat<-dat[c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
"HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1")] ## subsetting data
library(data.table) ### load the library data.table
setnames(Heart_dat, old=c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
"HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1"),
new=c("Probe_ID","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND",
"HF.DM","HF.DM","HF.DM","HF.DM","HF.DM","Gene.name")) ## renaming columns
names(Heart_dat)
head(Heart_dat$Gene.name)
test<-Heart_dat[!(Heart_dat$Gene.name=='---'),] ## removing probe ids which does not have gene name
dim(test)
test<-test[,c(-1,-15) ] ### removing first and last column in data
dim(test)
### now fitting linear model
library(limma)
Treat<-c("CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","HF.DM","HF.DM","HF.DM","HF.DM","HF.DM")
f<-factor(Treat, levels=c("CD.ND","HF.DM"))
design<-model.matrix(~f)
fit<-lmFit(test, design)
fit
fit2<-eBayes(fit)
fit2
tt<-topTable(fit2, coef=2, adjust="BH")
> tt
logFC AveExpr t P.Value adj.P.Val B
14352 ...........................................................................
3210 ...........................................................................
10120 ............................................................................
Hi Steve,
Thank you for help. Yes, expression values are log2 transformed. Now I suppose I can merge annotation table and results from toptable().
For me the preferred way to annotate data is to put the annotation information into the 'genes' list item of your MArrayLM object, so topTable() will automagically output the annotation for you.
You say above that this is an Affy rat chip of some sort, so let's say for the sake of argument that it is a Rat Gene 1.0 ST array. In addition, let's assume that whomever summarized these data did so at the transcript level.
you will get a warning that there are some one-to-many mappings, which you have to deal with. Here I just assume you are a dolt like me, and just want to do the most naive thing possible.
Thanks, Jim.
Hi Jim,
How can I remove duplicate probe ids before fitting in lmfit ( not in MArrayLM object)? Your code removes the duplicates only after fitting the model.
Thanks !!!