Need help annotating Affymetrix Zebrafish Gene 1.1 ST microarrays
mat149
United States

Hello,

I am inquiring about how to obtain Entrez/Ensembl/gene symbol annotations for RMA-normalized transcript cluster IDs generated from a cohort of Affymetrix Zebrafish Gene 1.1 ST array strips. I have made the data frame; however, the library I am using to map gene identifiers to probes does not seem to contain the annotation data (every annotation column comes out NA). Additionally, when I write the table out, the column headers are shifted one column to the left of where they should be, and I am not sure how to fix it (I have only been using R for about 3 months). Is anyone willing to provide insight into how I can obtain Entrez IDs, gene symbols, etc. for this type of microarray? The project goal revolves around GO/KEGG enrichment. Thanks in advance,

-Matthew

 

 


Scripts:

setwd("C:\\Users\\mat149\\Desktop\\CEL_liu")
library(oligo)

# Read all CEL files in the folder
Liu_list = list.celfiles("C:\\Users\\mat149\\Desktop\\CEL_liu", full.names=TRUE)
Liu_data = read.celfiles(filenames = Liu_list, experimentData=TRUE)

# Sample information (ph is a copy, so these changes are not written back to Liu_data)
ph = Liu_data@phenoData
ph@data
ph@data[ ,1] = c("Control1","Control2","Control3","Control4","pcd17_MO1","pcd17_MO2","pcd17_MO3","pcd17_MO4")
ph@data[ ,2] = c("control","control","control","control","morphant","morphant","morphant","morphant")
colnames(ph@data)[2]="source"
colnames(ph@data)[1]="sample"
groups = ph@data$source
f = factor(groups,levels=c("control","morphant"))

# Background-correct, normalize and summarize with RMA
eset = rma(Liu_data)
data.matrix = exprs(eset)
write.exprs(eset, file="pcd17_RMA.txt")
my_frame <- data.frame(exprs(eset))

# Collapse each zebrafish.db mapping to one comma-separated string per ID
library(zebrafish.db)
Annotated_frame <- data.frame(
  SYMBOL    = sapply(contents(zebrafishSYMBOL), paste, collapse=","),
  DESC      = sapply(contents(zebrafishGENENAME), paste, collapse=","),
  ENTREZID  = sapply(contents(zebrafishENTREZID), paste, collapse=","),
  ENSEMBLID = sapply(contents(zebrafishENSEMBL), paste, collapse=",")
)

# Join annotation and expression values on row names (by = 0)
merged <- merge(Annotated_frame, my_frame, by.x=0, by.y=0, all=TRUE)
write.table(merged, file="annotated_RMA_data.txt", sep="\t")

 

results in this:

Row.names SYMBOL DESC ENTREZID ENSEMBLID control1.cel control2.cel control3.cel control4.cel p17_MO1.cel p17_MO2.cel p17_MO3.cel p17_MO4.cel
1 12916001 NA NA NA NA 6.740321 6.498941 5.297617 6.144367 6.623846 6.615331 6.728196 6.324374
2 12916003 NA NA NA NA 4.601893 4.078181 4.033448 4.41123 4.984537 4.86664 5.099081 4.518134
3 12916005 NA NA NA NA 6.258967 6.826439 6.082199 5.07959 6.466049 5.701788 6.176568 5.318805
Tags: oligo, annotation, affymetrix, microarrays, zebrafish

Mike Smith
EMBL Heidelberg

It looks like your merge() step is probably not doing what you'd like, presumably because the row names you get in your expression set are not a subset of what you produce in the annotation data.frame.

My advice would be to look at a few IDs from each and make sure you know what type of ID each one uses. For example, you can look at the first 10 with:

rownames(my_frame)[1:10]
rownames(Annotated_frame)[1:10]

and you could do a quick sanity check for whether any of the IDs in one are present in the other with:

table( rownames(my_frame) %in% rownames(Annotated_frame) )
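The symptom in the question (every annotation column NA) is what merge() produces when the two sets of row names share no IDs at all. Below is a minimal base-R sketch with made-up toy data: the numeric "transcript cluster" IDs, the "Dr.*_at" probe-set IDs, and the gene names are illustrative placeholders, not real annotation.

```r
# Toy expression data keyed by transcript-cluster-style IDs (illustrative values)
my_frame <- data.frame(
  control1  = c(6.74, 4.60, 6.26),
  morphant1 = c(6.62, 4.98, 6.47),
  row.names = c("12916001", "12916003", "12916005")
)

# Toy annotation keyed by a different ID type (hypothetical probe-set IDs)
Annotated_frame <- data.frame(
  SYMBOL    = c("geneA", "geneB"),
  row.names = c("Dr.1.1.A1_at", "Dr.2.1.A1_at")
)

# Sanity check: how many expression IDs appear in the annotation?
overlap <- table(rownames(my_frame) %in% rownames(Annotated_frame))
print(overlap)

# by = 0 joins on row names; all = TRUE keeps unmatched rows from both
# sides and fills the columns from the other side with NA
merged <- merge(Annotated_frame, my_frame, by = 0, all = TRUE)
print(merged)
```

On the real objects, an all-FALSE table points to an ID-type mismatch rather than a problem with merge() itself. As far as I know, zebrafish.db is keyed by probe set IDs of the older 3' Zebrafish Genome Array, whereas oligo's rma() summarizes Gene 1.1 ST data to transcript cluster IDs by default, which would explain the NAs.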

To answer your other question: by default, write.table() includes row names in the output but doesn't give them a header, which is why your table looks like everything is shifted. You can stop that from happening by specifying row.names = FALSE, e.g.

write.table(merged, file = "annotated_RMA_data.txt", sep="\t", row.names = FALSE)
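To see the shift concretely, here is a small base-R sketch on a toy data.frame (not the real data) that writes it with default settings and counts the tab-separated fields per line. It also shows col.names = NA, which ?write.table documents as adding a blank column name when row names are written; that is an alternative if you want to keep the row names rather than drop them.

```r
df <- data.frame(a = 1:2, b = 3:4, row.names = c("r1", "r2"))

f_default <- tempfile(fileext = ".txt")
f_blank   <- tempfile(fileext = ".txt")

# Default: row names are written, but the header has no cell for them,
# so the header line has one field fewer than each data line
write.table(df, file = f_default, sep = "\t", quote = FALSE)

# col.names = NA (with the default row.names = TRUE) adds an empty
# header cell, so the columns line up when opened in a spreadsheet
write.table(df, file = f_blank, sep = "\t", quote = FALSE, col.names = NA)

default_lines <- readLines(f_default)
blank_lines   <- readLines(f_blank)

# Count tab-separated fields on each line
fields <- function(x) lengths(strsplit(x, "\t", fixed = TRUE))
print(fields(default_lines))
print(fields(blank_lines))
```

Dropping row names is simplest when, as after merge(..., by = 0), the IDs are already stored in a Row.names column.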
Guido Hooiveld
Wageningen University, Wageningen, the …
Since you processed your data using oligo, the Affymetrix-based annotation info should in principle be available for your normalized data object (through the corresponding PdInfo package, pd.zebgene.1.1.st). To annotate your results easily, please have a look at the affycoretools package, specifically the function annotateEset(). See also the remarks of James (the author of affycoretools) in the answer "matching probes to genes from pd.porgene.1.1.st array". Note: since May 2016 the release version is sufficient; the development version mentioned in that post is no longer needed.

Also, FYI: in the Bioconductor channel of the Faculty of 1000 (F1000), a very nice, complete workflow on the analysis of Affymetrix arrays has been published. Although not all of the workflow may be directly relevant to you, it is an excellent read, especially if you are new to the whole process. Find it here!
