Question

Need help annotating Affymetrix Gene 1.1 ST microarrays

mat149 (United States)

Hello,

I am asking how to obtain Entrez/Ensembl/gene symbol annotations for RMA-normalized transcript cluster IDs generated from a cohort of Affymetrix Zebrafish Gene 1.1 ST array strips. I have made the data frame, but the annotation package I am using to map gene identifiers to probes does not seem to contain annotation for these IDs (every annotation column comes back NA). Additionally, when I write the table out, the column headers are shifted one column to the left of where they should be, and I am not sure how to fix it (I have only been using R for about 3 months). Could anyone offer insight into how I can obtain Entrez IDs, gene symbols, etc. for this type of microarray? The project goal revolves around GO/KEGG enrichment. Thanks in advance,

-Matthew

Scripts:

setwd("C:\\Users\\mat149\\Desktop\\CEL_liu")
library(oligo)

Liu_list = list.celfiles("C:\\Users\\mat149\\Desktop\\CEL_liu",full.names=TRUE)
Liu_data = read.celfiles(filenames = Liu_list,experimentData=TRUE)

# phenoData holds the sample annotation. Note that ph is a copy, so the
# edits below are not stored back into Liu_data.
ph = Liu_data@phenoData
ph@data
ph@data[ ,1] = c("Control1","Control2","Control3","Control4","pcd17_MO1","pcd17_MO2","pcd17_MO3","pcd17_MO4")
ph@data[ ,2] = c("control","control","control","control","morphant","morphant","morphant","morphant")
colnames(ph@data)[2]="source"
colnames(ph@data)[1]="sample"
groups = ph@data$source
f = factor(groups,levels=c("control","morphant"))

# RMA: background correction, quantile normalization, summarization to transcript clusters
eset = rma(Liu_data)
data.matrix = exprs(eset)
write.exprs(eset,file="pcd17_RMA.txt")
my_frame <- data.frame(exprs(eset))

# One row per zebrafish.db probe ID; multiple mappings collapsed with ","
library(zebrafish.db)
Annotated_frame <- data.frame(
  SYMBOL    = sapply(contents(zebrafishSYMBOL),   paste, collapse=","),
  DESC      = sapply(contents(zebrafishGENENAME), paste, collapse=","),
  ENTREZID  = sapply(contents(zebrafishENTREZID), paste, collapse=","),
  ENSEMBLID = sapply(contents(zebrafishENSEMBL),  paste, collapse=","))
# Join on row names (by = 0); all = TRUE keeps unmatched rows from both tables
merged <- merge(Annotated_frame, my_frame, by.x=0, by.y=0, all=TRUE)
write.table(merged,file="annotated_RMA_data.txt",sep="\t")

results in this:

Row.names	SYMBOL	DESC	ENTREZID	ENSEMBLID	control1.cel	control2.cel	control3.cel	control4.cel	p17_MO1.cel	p17_MO2.cel	p17_MO3.cel	p17_MO4.cel
1	12916001	NA	NA	NA	NA	6.740321	6.498941	5.297617	6.144367	6.623846	6.615331	6.728196	6.324374
2	12916003	NA	NA	NA	NA	4.601893	4.078181	4.033448	4.41123	4.984537	4.86664	5.099081	4.518134
3	12916005	NA	NA	NA	NA	6.258967	6.826439	6.082199	5.07959	6.466049	5.701788	6.176568	5.318805

oligo annotation affymetrix microarrays zebrafish

written by mat149 • updated by Guido Hooiveld

score 0 · Answer 1 · 2016-09-12

Your merge() step is probably not doing what you'd like, presumably because the row names of your expression set are not among the row names you produce in the annotation data.frame.

My advice would be to look at a few IDs from each table and make sure you know what type of ID each one uses. For example, you can look at the first 10 with:

rownames(my_frame)[1:10]
rownames(Annotated_frame)[1:10]

and you could do a quick sanity check for whether any of the IDs in one are present in the other with:

table( rownames(my_frame) %in% rownames(Annotated_frame) )
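To make the diagnosis concrete, here is a base-R toy version of both checks. The expression values are rounded from the output in the question; the annotation-side IDs (`Dr.00001.1.A1_at` and so on) are hypothetical placeholders, chosen on the assumption that zebrafish.db is keyed by probe set IDs of the older Zebrafish Genome (3') array rather than by Gene 1.1 ST transcript cluster IDs. When the two ID types differ, `%in%` returns only FALSE and `merge(..., all = TRUE)` keeps the expression rows but leaves their annotation columns as NA.

```r
# Toy stand-ins: expression rows keyed by transcript cluster ID (values rounded
# from the question) and an annotation table keyed by hypothetical probe set IDs
my_frame <- data.frame(
  control1  = c(6.74, 4.60, 6.26),
  morphant1 = c(6.62, 4.98, 6.47),
  row.names = c("12916001", "12916003", "12916005")
)
Annotated_frame <- data.frame(
  SYMBOL    = c("geneA", "geneB"),
  row.names = c("Dr.00001.1.A1_at", "Dr.00002.1.S1_at")  # hypothetical IDs
)

# Eyeball a few IDs from each table
print(head(rownames(my_frame)))
print(head(rownames(Annotated_frame)))

# How many expression IDs have an annotation row?
overlap <- table(rownames(my_frame) %in% rownames(Annotated_frame))
print(overlap)

# With zero overlap, all = TRUE keeps every row from both tables, and the
# expression rows get NA in the annotation columns
merged <- merge(Annotated_frame, my_frame, by = 0, all = TRUE)
print(merged)
```

If the `%in%` table shows no TRUE values on the real data, no amount of merging will help; the annotation has to come from a source keyed by transcript cluster IDs.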

To answer your other question: by default, write.table() includes the row names in the output but doesn't give them a header, so the header line has one field fewer than the data lines, which is why your table looks shifted. You can prevent that by specifying row.names = FALSE, e.g.

write.table(merged, file = "annotated_RMA_data.txt", sep="\t", row.names = FALSE)
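A small base-R check makes the shift visible. This sketch writes a hypothetical two-row table to a temporary file and counts the tab-separated fields on each line. With the defaults, the header has one field fewer than the data lines; `row.names = FALSE` drops the row names, while `col.names = NA` (documented in `?write.table`) keeps them and writes a blank header cell above them instead.

```r
# Hypothetical two-row table with row names, standing in for `merged`
df <- data.frame(SYMBOL = c("a", "b"), value = c(1.5, 2.5),
                 row.names = c("12916001", "12916003"))
tmp <- tempfile(fileext = ".txt")

# Default: row names become an extra, unnamed first field on data lines
write.table(df, file = tmp, sep = "\t")
default_lines  <- readLines(tmp)
default_fields <- lengths(strsplit(default_lines, "\t"))
print(default_lines)

# Option 1: drop the row names entirely
write.table(df, file = tmp, sep = "\t", row.names = FALSE)
norownames_fields <- lengths(strsplit(readLines(tmp), "\t"))

# Option 2: keep the row names and add an empty header cell for them
write.table(df, file = tmp, sep = "\t", col.names = NA)
blankcell_fields <- lengths(strsplit(readLines(tmp), "\t"))

unlink(tmp)
```

For the merged table in the question, row.names = FALSE is the natural choice, because merge(by = 0) has already copied the IDs into the Row.names column.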

score 0 · Answer 2 · 2016-09-12

Since you processed your data using oligo, the Affymetrix annotation should in principle already be available to your normalized data object, through the corresponding pdInfo package pd.zebgene.1.1.st. To annotate your results easily, please have a look at the affycoretools package, specifically the function annotateEset(). See also the remarks of James (author of affycoretools) in his answer to "matching probes to genes from pd.porgene.1.1.st array". Note: since May 2016 the release version is sufficient; the development version mentioned in that post is no longer needed.
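As I understand the affycoretools documentation, the real call is along the lines of `eset <- annotateEset(eset, pd.zebgene.1.1.st)` (assuming both packages are installed), which stores the annotation in the ExpressionSet's feature data, in the same row order as the expression values. The base-R sketch below only illustrates that row-aligned idea with a hypothetical lookup table; it is not how affycoretools is implemented. Using match() instead of merge() keeps every expression row, in its original order, with NA where no annotation exists.

```r
# Hypothetical RMA values keyed by transcript cluster ID
expr <- matrix(c(6.74, 4.60, 6.26,
                 6.62, 4.98, 6.47),
               ncol = 2,
               dimnames = list(c("12916001", "12916003", "12916005"),
                               c("control1", "morphant1")))

# Hypothetical annotation keyed by the SAME ID type (order deliberately differs)
annot <- data.frame(
  PROBEID  = c("12916005", "12916001"),
  SYMBOL   = c("geneX", "geneY"),
  ENTREZID = c("100001", "100002"),
  stringsAsFactors = FALSE
)

# Position of each expression row's ID in the annotation table (NA if absent)
idx <- match(rownames(expr), annot$PROBEID)
feature_data <- data.frame(
  PROBEID  = rownames(expr),
  SYMBOL   = annot$SYMBOL[idx],
  ENTREZID = annot$ENTREZID[idx],
  stringsAsFactors = FALSE
)

# Row order and count are unchanged, so the two can be bound side by side
annotated <- cbind(feature_data, expr)
print(annotated)
```

The key column must hold transcript cluster IDs (the row names of exprs(eset)); annotation keyed on any other ID type yields all NA, as in the question.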

Also, FYI: in the Bioconductor channel at Faculty of 1000 (F1000), a very nice, complete workflow for the analysis of Affymetrix arrays has been published. Although not all of it may be directly relevant to you, it is an excellent read, especially if you are new to the whole process. Find it here!