What is the difference in annotation between the Bioconductor packages pd.hugene.2.1.st and hugene21sttranscriptcluster.db?
I have Affymetrix data from the HuGene 2.1 ST (sense target) 16-Array Plate. Before annotation, I performed RMA preprocessing (RMA background correction, quantile normalization and summarization with target = "core") using the Bioconductor package oligo (Bioconductor version 3.7). When annotating with the package pd.hugene.2.1.st, 38598 of the 53617 features have a valid geneassignment. With hugene21sttranscriptcluster.db, an Entrez ID can be assigned to 29224 of the 53617 features; of these, 1810 features have no valid geneassignment when annotating with pd.hugene.2.1.st.
What causes this discrepancy?
Which package would you recommend for annotation?
Is it possible to get the Entrez IDs from the variable "geneassignment" of the pd.hugene.2.1.st package?
(See the R code below.)
Thanks in advance!
Best regards,
Irene
R code:
```r
library(oligo)
library(pd.hugene.2.1.st)
library(hugene21sttranscriptcluster.db)

# data import (celFiles: character vector of paths to the 16 CEL files)
rawData <- read.celfiles(celFiles)
dim(exprs(rawData))
# [1] 1416100      16

# RMA preprocessing: background correction, quantile normalization,
# summarization at the core (transcript cluster) level
ppData <- rma(rawData, target = "core")
# Background correcting
# Normalizing
# Calculating Expression
dim(ppData)
# Features  Samples
#    53617       16

# annotation with the package pd.hugene.2.1.st
annotation_pd <- getNetAffx(ppData, "transcript")
table(is.na(annotation_pd$geneassignment))
# FALSE  TRUE
# 38598 15019

# annotation with the package hugene21sttranscriptcluster.db
x <- hugene21sttranscriptclusterENTREZID
# get the probe identifiers that are mapped to an Entrez Gene ID
mapped_probes <- mappedkeys(x)
# convert to a list
xx <- as.list(x[mapped_probes])
table(row.names(exprs(ppData)) %in% names(xx))
# FALSE  TRUE
# 24393 29224

# features with an Entrez ID in hugene21sttranscriptcluster.db
# but no geneassignment in pd.hugene.2.1.st
table(row.names(exprs(ppData)) %in% names(xx) & is.na(annotation_pd$geneassignment))
# FALSE  TRUE
# 51807  1810
```
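Regarding the last question, a minimal base-R sketch of how Entrez IDs might be pulled out of geneassignment is shown here. It assumes (not confirmed by the package documentation quoted above) that each entry follows the NetAffx layout `accession // gene symbol // gene title // cytoband // Entrez Gene ID`, with multiple assignments per feature separated by ` /// `, and that missing values appear as NA or as the NetAffx placeholder `---`. `geneassignmentToEntrez` is a hypothetical helper, not a function of oligo or pd.hugene.2.1.st.

```r
# Hypothetical helper (not part of oligo / pd.hugene.2.1.st):
# extract Entrez Gene IDs from NetAffx-style geneassignment strings.
# Assumed format per assignment:
#   "accession // symbol // title // cytoband // EntrezID"
# with several assignments joined by " /// ".
# Returns one string per input element; multiple distinct IDs are
# joined with ";", and features without an ID give NA.
geneassignmentToEntrez <- function(ga) {
  vapply(ga, function(s) {
    # treat NA and the "---" placeholder as "no assignment"
    if (is.na(s) || s == "---") return(NA_character_)
    # split into the individual gene assignments
    assignments <- strsplit(s, " /// ", fixed = TRUE)[[1]]
    # take the fifth " // "-separated field (assumed Entrez ID)
    ids <- vapply(strsplit(assignments, " // ", fixed = TRUE),
                  function(f) if (length(f) >= 5) trimws(f[5]) else NA_character_,
                  character(1))
    # drop missing or placeholder values and duplicates
    ids <- unique(ids[!is.na(ids) & ids != "---" & ids != ""])
    if (length(ids) == 0) NA_character_ else paste(ids, collapse = ";")
  }, character(1), USE.NAMES = FALSE)
}

# made-up example strings in the assumed format
geneassignmentToEntrez(c(
  "NM_000001 // GENEA // gene A // 1p36 // 1111 /// ENST0001 // GENEA // gene A // 1p36 // 1111",
  NA
))
```

If the assumed format holds, `geneassignmentToEntrez(annotation_pd$geneassignment)` would give one Entrez ID string per feature, which could then be compared with the IDs in `xx` for the same feature names. Gene titles that themselves contain " // " would break this simple field split.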