Some questions about gene annotation
2
0
Entering edit mode
llkxiaolan ▴ 10
@llkxiaolan-13767
Last seen 6.7 years ago

I use the limma package for the differential gene analysis, and then use the annotate package for the notes, but why some probe annotation results after the display is not the name of the gene, but NA, what does this mean? Are multiple probes corresponding to one gene? 

I am a Chinese student, English is not very good, so the sentence may have a problem, I hope you can understand my expression of the problem .thank you .

 

logFC

AveExpr t P.Value adj.P.Val B symbols

EntrezID 

1416006_at -1.0874 8.773366 -11.1644 1.85E-13 8.35E-09 19.8322 Mdk 17242
1415904_at -0.88263 12.51717 -9.9003 5.45E-12 6.62E-08 16.79884 Lpl 16956
1454865_at -1.18566 9.195649 -9.87341 5.87E-12 6.62E-08 16.73164 Slc9a8 77031
1434456_at -0.63818 5.957798 -9.17706 4.14E-11 3.11E-07 14.9534 Rundc3b 242819
1422230_s_at -0.66954 11.42731 -7.77882 2.51E-09 4.10E-06 11.17045 NA NA
1436101_at -1.25105 9.58079 -7.27419 1.16E-08 1.28E-05 9.743425 NA NA
limma • 1.3k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 14 hours ago
United States

Your English is fine, but you should try to give more information in future. If you don't say what package you used to get the data, then others have to try to figure that out, which I am sure is not your intent.

It appears you are using the Affymetrix MOE430a array, and perhaps you are using an old version of R and Bioconductor. You don't show your code, so I can't see how you got those results, but do note that there are multiple ways you can do that, and some are less useful.

There is an old-style way that  you shouldn't really use any more, which is to use mget, or getSYMBOL. If you do that, using the current release versions, you get this:

> getSYMBOL(prbs, "moe430a.db")
  1416006_at   1415904_at   1454865_at   1434456_at 1422230_s_at   1436101_at
       "Mdk"        "Lpl"     "Slc9a8"           NA           NA           NA

You should instead use either select or mapIds:

> mapIds(moe430a.db, prbs, "SYMBOL","PROBEID")
'select()' returned 1:many mapping between keys and columns
  1416006_at   1415904_at   1454865_at   1434456_at 1422230_s_at   1436101_at
       "Mdk"        "Lpl"     "Slc9a8"           NA     "Cyp2a5"           NA

> select(moe430a.db, prbs, "SYMBOL","PROBEID")
'select()' returned 1:many mapping between keys and columns
       PROBEID SYMBOL
1   1416006_at    Mdk
2   1415904_at    Lpl
3   1454865_at Slc9a8
4   1434456_at   <NA>
5 1422230_s_at Cyp2a5
6 1422230_s_at Cyp2a4
7   1436101_at   <NA>

The difference being that by default mapIds will only give you the first mapping for any 1:many mapped probes (1422230_s_at), whereas select gives you all of them. For the old style methods like getSYMBOL, any 1:many mapping probe will return NA by default, which is why you shouldn't use those methods any more.

So one of your probesets returns NA because it's a 1:many mapped probe. But what about the other two?

> select(moe430b.db, prbs, "SYMBOL","PROBEID")
'select()' returned 1:many mapping between keys and columns
       PROBEID  SYMBOL
1   1416006_at    <NA>
2   1415904_at    <NA>
3   1454865_at    <NA>
4   1434456_at Rundc3b
5 1422230_s_at    <NA>
6   1436101_at   Pank2
7   1436101_at   Rnf24

Those probesets are on the MOE430b array (and probably all of these probes are on the HT-MOE430 array, which included both arrays, but I don't think we have an annotation package for that). So it appears that you are just missing annotation for two probesets because they are 1:many and you are using getSYMBOL to annotate. So switching to select or mapIds will fix that.

But do note that there are probesets on most (if not all) Affy arrays that don't measure things that have a gene symbol or Entrez Gene ID! No matter what, those probesets will always return NA because they are measuring speculative content that NCBI hasn't decided is 'real' enough to get an Entrez Gene ID.

 

ADD COMMENT
0
Entering edit mode

Thank you very much for your reply. I think I didn't make it clear when I was describing the problem. I used the annotate annotation package to annotate it. 

Here's the code I'm using. Can you help me see if there's any problem? The R version I use is 3.4.1.Sorry, I don't know how to upload photos, so only the copy and paste code. 

library(affy)
AffyRawData<-ReadAffy(widget=TRUE)
data.class(AffyRawData)

par(mfrow=c(2,3))
image(AffyRawData[,c(1,2,3,4,5,6)])

library(simpleaffy)
Data.qc<-qc(AffyRawData)
plot(Data.qc)library(RColorBrewer)
library(affyPLM)
Pset<-fitPLM(AffyRawData)
colors<-brewer.pal(12,"Set3")
Mbox(Pset,ylim=c(-1,1),col=colors,main="RLE",las=3)
boxplot(Pset,ylim=c(0.95,1.22),col=colors,main="NUSE",las=3)

data.deg<-AffyRNAdeg(AffyRawData)
plotAffyRNAdeg(data.deg,col=colors)
legend("topleft",rownames(pData(AffyRawData)),col=colors,lwd=1,inset=0.05,cex=0.5)

AffyRawDatarma<-rma(AffyRawData)

boxplot(AffyRawDatarma,col=colors,las=3,main="RMA")

hist(AffyRawDatarma,main="RMA",col=colors)
legend("topright",rownames(pData(AffyRawData)),col=colors,lwd=1,inset=0.05,cex=0.5,ncol=3)

library(limma)

eset<-exprs(AffyRawDatarma)

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))
colnames(design) <- c("group1", "group2")
contrast.matrix <- makeContrasts(contrasts="group2-group1",levels=design)
design
fit <- lmFit(eset, design)
fit1<- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit1)
dif<-topTable(fit2,coef="group2-group1",n=nrow(fit2),lfc=log2(1.5))
dif<-dif[dif[,"P.Value"]<0.01,]
head(dif)

library(annotate)
affydb<-annPkgName(AffyRawData@annotation,type="db")
affydb
source("http://Bioconductor.org/biocLite.R") 
biocLite("mouse4302.db")
library(affydb,character.only=TRUE)
dif$symbols<-getSYMBOL(rownames(dif),affydb)
dif$EntrezID<-getEG(rownames(dif),affydb)
head(dif)
write.csv(dif, 'dif.csv')

ADD REPLY
0
Entering edit mode
llkxiaolan ▴ 10
@llkxiaolan-13767
Last seen 6.7 years ago

Thank you very much for your reply. I think I didn't make it clear when I was describing the problem. I used the annotate annotation package to annotate it. 

Here's the code I'm using. Can you help me see if there's any problem? The R version I use is 3.4.1.Sorry, I don't know how to upload photos, so only the copy and paste code. 

library(affy)
AffyRawData<-ReadAffy(widget=TRUE)
data.class(AffyRawData)

par(mfrow=c(2,3))
image(AffyRawData[,c(1,2,3,4,5,6)])

library(simpleaffy)
Data.qc<-qc(AffyRawData)
plot(Data.qc)library(RColorBrewer)
library(affyPLM)
Pset<-fitPLM(AffyRawData)
colors<-brewer.pal(12,"Set3")
Mbox(Pset,ylim=c(-1,1),col=colors,main="RLE",las=3)
boxplot(Pset,ylim=c(0.95,1.22),col=colors,main="NUSE",las=3)

data.deg<-AffyRNAdeg(AffyRawData)
plotAffyRNAdeg(data.deg,col=colors)
legend("topleft",rownames(pData(AffyRawData)),col=colors,lwd=1,inset=0.05,cex=0.5)

AffyRawDatarma<-rma(AffyRawData)

boxplot(AffyRawDatarma,col=colors,las=3,main="RMA")

hist(AffyRawDatarma,main="RMA",col=colors)
legend("topright",rownames(pData(AffyRawData)),col=colors,lwd=1,inset=0.05,cex=0.5,ncol=3)

library(limma)

eset<-exprs(AffyRawDatarma)

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))
colnames(design) <- c("group1", "group2")
contrast.matrix <- makeContrasts(contrasts="group2-group1",levels=design)
design
fit <- lmFit(eset, design)
fit1<- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit1)
dif<-topTable(fit2,coef="group2-group1",n=nrow(fit2),lfc=log2(1.5))
dif<-dif[dif[,"P.Value"]<0.01,]
head(dif)

library(annotate)
affydb<-annPkgName(AffyRawData@annotation,type="db")
affydb
source("http://Bioconductor.org/biocLite.R") 
biocLite("mouse4302.db")
library(affydb,character.only=TRUE)
dif$symbols<-getSYMBOL(rownames(dif),affydb)
dif$EntrezID<-getEG(rownames(dif),affydb)
head(dif)
write.csv(dif, 'dif.csv')

ADD COMMENT

Login before adding your answer.

Traffic: 725 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6