Analyzing gene expression data( not celfiles) in limma and annotating results
1
0
Entering edit mode
@kaushal-raj-chaudhary-6165
Last seen 9.5 years ago
United States

I have gene expression data (not image files) obtained from affymetrix chip and I used limma to analyze it.  The row names of top table does not match the probe id. I saw this thread for annotating results from celfiles (https://support.bioconductor.org/p/63834/).

 I wonder how can I annotate the results of toptable().  

Thanks for any insights and help.

 

Code:


dat<-read.csv("Rat_Gene_Expression_Array.csv")   ### importing data 
names(dat)
Heart_dat<-dat[c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
             "HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1")]  ## subsetting data 


library(data.table)        ### load the library data.table 
setnames(Heart_dat, old=c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
                          "HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1"),
                    new=c("Probe_ID","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND",
                          "HF.DM","HF.DM","HF.DM","HF.DM","HF.DM","Gene.name"))   ## renaming columns
names(Heart_dat)
head(Heart_dat$Gene.name)

test<-Heart_dat[!(Heart_dat$Gene.name=='---'),]   ## removing probe ids which does not have gene name 
dim(test)
test<-test[,c(-1,-15) ]         ### removing first and last column in  data
dim(test)

### now fitting linear model 

library(limma)

Treat<-c("CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","HF.DM","HF.DM","HF.DM","HF.DM","HF.DM")
f<-factor(Treat, levels=c("CD.ND","HF.DM"))
design<-model.matrix(~f)
fit<-lmFit(test, design)
fit
fit2<-eBayes(fit)
fit2
tt<-topTable(fit2, coef=2, adjust="BH")


> tt
           logFC   AveExpr         t      P.Value adj.P.Val         B
14352   ...........................................................................
3210     ...........................................................................
10120  ............................................................................

limma annotation • 1.5k views
ADD COMMENT
2
Entering edit mode
@steve-lianoglou-2771
Last seen 20 months ago
United States

You want test to be a numeric matrix and set its rownames to be the ids/names you want.

You have the following:

test<-test[,c(-1,-15)]

Where it seems the first column and column 15 are the probe ID and genenames, respectively.

Instead of that, what you might want to do is something like:

e <- as.matrix(test[, c(-1,-15)]
rownames(e) <- test[[1]] ## set rownames to probeIDs
stopifnot(is.numeric(e)) ## just making sure we have legit data here
## ... stuff ...
fit <- lmFit(e, design)
fit2 <- eBayes(fit)
tt <- topTable(fit2, coef=2, adjust="BH")

Also, you didn't say what types of numbers are in the (now called) e matrix, but make sure these data have been log2 transformed before you shoot it into lmFit

 

 

ADD COMMENT
0
Entering edit mode

Hi Steve,

Thank you for help. Yes, expression values are log2 transformed.  Now I suppose I can merge annotation table and  results from toptable().

ADD REPLY
2
Entering edit mode

For me the preferred way to annotate data is to put the annotation information into the 'genes' list item of your MArrayLM object, so topTable() will automagically output the annotation for you.

You say above that this is an Affy rat chip of some sort, so let's say for the sake of argument that it is a Rat Gene 1.0 ST array. In addition, let's assume that whomever summarized these data did so at the transcript level.

library(ragene10sttranscriptcluster.db)

gns <- select(ragene10sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","GENENAME"))

you will get a warning that there are some one-to-many mappings, which you have to deal with. Here I just assume you are a dolt like me, and just want to do the most naive thing possible.

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)

ADD REPLY
0
Entering edit mode

Thanks, Jim.

ADD REPLY
0
Entering edit mode

Hi Jim,

How can I remove duplicate probe ids before fitting in lmfit ( not in MArrayLM object)? Your code removes the duplicates only after fitting the model.   

Thanks !!!

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)
ADD REPLY

Login before adding your answer.

Traffic: 642 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6