Hey,
I am fairly sure that the PrimeView array does not have an official annotation package in Bioconductor; it is certainly not the most commonly used of the Affymetrix chips.
You could, though, retrieve the current NetAffx annotation CSV from Affymetrix's website (PrimeView™ Human Gene Expression Array Plate - Support Materials) and build your own package:
1. Create the SQLite database and build the new package
require(AnnotationForge)
makeDBPackage('HUMANCHIP_DB',
  affy = TRUE,
  prefix = 'primeview',
  fileName = 'PrimeView.na36.annot.csv',
  baseMapType = 'eg',
  outputDir = '.',
  author = 'Bioconductor',
  version = '0.99.1',
  manufacturer = 'Affymetrix',
  manufacturerUrl = 'http://www.affymetrix.com')
2. Install and load it
install.packages('primeview.db', repos = NULL, type = 'source')
require(primeview.db)
3. Perform various lookup operations
probes <- c('11715262_at', '1715107_s_at',
  '11715112_at', '11715113_x_at', '11715115_s_at',
  '11715116_s_at', '11715247_s_at', '11715248_s_at',
  '11715258_s_at', '11715269_x_at', '11715264_s_at')
keytypes(primeview.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "MAP" "OMIM"
[16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
[21] "PROBEID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIGENE" "UNIPROT"
mapIds(primeview.db, keys = probes,
column = c('ENSEMBL'), keytype = 'PROBEID')
'select()' returned 1:1 mapping between keys and columns
11715262_at 1715107_s_at 11715112_at 11715113_x_at
"ENSG00000168412" NA "ENSG00000178965" "ENSG00000158483"
11715115_s_at 11715116_s_at 11715247_s_at 11715248_s_at
"ENSG00000278588" "ENSG00000276966" "ENSG00000100150" "ENSG00000124490"
11715258_s_at 11715269_x_at 11715264_s_at
"ENSG00000159335" "ENSG00000189366" "ENSG00000181408"
mapIds(primeview.db, keys = probes,
column = c('SYMBOL'), keytype = 'PROBEID')
'select()' returned 1:1 mapping between keys and columns
11715262_at 1715107_s_at 11715112_at 11715113_x_at 11715115_s_at
"MTNR1A" NA "ERICH3" "FAM86C1" "H2BC10"
11715116_s_at 11715247_s_at 11715248_s_at 11715258_s_at 11715269_x_at
"H4C5" "DEPDC5" "CRISP2" "PTMS" "ALG1L"
11715264_s_at
"UTS2R"
head(select(primeview.db, keys = probes,
columns = c('PROBEID', 'SYMBOL', 'GENENAME', 'ENSEMBL', 'GO')))
PROBEID SYMBOL GENENAME ENSEMBL GO EVIDENCE
1 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0004930 IBA
2 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0005515 IPI
3 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0005886 IBA
4 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0005886 TAS
5 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0005887 IDA
6 11715262_at MTNR1A melatonin receptor 1A ENSG00000168412 GO:0007186 TAS
ONTOLOGY
1 MF
2 MF
3 CC
4 CC
5 CC
6 BP
annotate::getSYMBOL(probes, 'primeview.db')
11715262_at 1715107_s_at 11715112_at 11715113_x_at 11715115_s_at
"MTNR1A" NA "ERICH3" "FAM86C1" "H2BC10"
11715116_s_at 11715247_s_at 11715248_s_at 11715258_s_at 11715269_x_at
"H4C5" "DEPDC5" "CRISP2" "PTMS" "ALG1L"
11715264_s_at
"UTS2R"
There's a lot of useful information here:
AnnotationDbi: Introduction To Bioconductor Annotation Packages
Kevin
This fixed it. Thank you so much.
Some questions. Do the author, version number, manufacturer and manufacturer URL have any relevance to the code? By that I mean, if I put something else there when creating the SQLite database, would it change the result?
Second, what are the lookup operations for? (Or, as I understand it, are you trying the annotation on a small subset of the genes to see if it works, i.e. basically for testing purposes?)
Final question: what exactly is the schema? For example, both HumanDB and Humanchip db are listed as available schemas. How do you select one?
As the package is just created locally on your computer, I do not think that the values for author, version number, etc. are too important. If anything they are just for your own benefit so that you can keep track of (and version control) your annotation databases. You also need to be wary about the package name, because you would not want it to clash with a pre-existing official Bioconductor or CRAN package.
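As a rough way to check for such a name clash, something like the following could be used. This is only a sketch: the helper name `is_name_taken` is made up for illustration, and the default lookup assumes that BiocManager is installed and that the repositories are reachable.

```r
# Hypothetical helper (not part of any package): is 'pkg' already the name
# of a package in the CRAN / Bioconductor repositories?
# 'known' defaults to the names returned by available.packages(); it is only
# evaluated when no explicit vector of names is supplied.
is_name_taken <- function(pkg,
                          known = rownames(utils::available.packages(
                            repos = BiocManager::repositories()))) {
  pkg %in% known
}

# Offline illustration with a made-up list of existing package names
is_name_taken('primeview.db', known = c('hgu133plus2.db', 'org.Hs.eg.db'))
```

Calling `is_name_taken('primeview.db')` without `known` queries the repositories directly.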
By showing the different lookup operations, I was merely providing some examples for you in order to see how you can use the information contained in the database. Hope that they helped!
Regarding the schema, I am fairly sure that it just pre-defines a template of, e.g., the expected columns in your input data / file and, ultimately, the columns that will be stored in the SQLite database that is created. There are 3 types of DBs that can be created:
Perhaps reading the first few sections here will help:
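For what it is worth, the schemas that AnnotationForge knows about can be listed from within R, and the one to use is chosen via the first argument of makeDBPackage() (it was 'HUMANCHIP_DB' in the code marked '1'). A small sketch, assuming AnnotationForge is installed; the wrapper name `list_schemas` is made up for illustration:

```r
# Hypothetical wrapper: list the schema names that AnnotationForge offers.
# available.dbschemas() lists all schemas; available.chipdbschemas() lists
# only the chip-level ones (e.g. HUMANCHIP_DB) that makeDBPackage() is used with.
list_schemas <- function(chip_only = TRUE) {
  if (!requireNamespace('AnnotationForge', quietly = TRUE)) {
    stop('AnnotationForge is not installed')
  }
  if (chip_only) {
    AnnotationForge::available.chipdbschemas()
  } else {
    AnnotationForge::available.dbschemas()
  }
}

# Usage (requires AnnotationForge):
# list_schemas()
# list_schemas(chip_only = FALSE)
```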
Hi Kevin, I ran your code to create the primeview.db package:
but when I try to do this:
gives the error:
Do you know what I am doing wrong?
Hello again. Can you show how you ran the first section of code (marked '1')? In this example, 'primeview.db' should be a directory in your current working directory.
With this code, we are essentially creating our own local copy of a new package called 'primeview.db'
Now I did:
It created a package automatically in the directory where the other annotation packages are. I think the package is installed correctly, because
library(primeview.db)
doesn't throw an error. But when I try to read the CEL files, it gives the error:
I thought it could be because the name of the package, 'primeview.db', is different from what is expected,
'pd.primeview'
so I set the package name when reading the CEL files:
but that gave another error:
You are mixing up 2 things. The primeview.db package you generated is indeed an annotation package. However, oligo requires a PlatformDesign (PdInfo) package for this array (to map the probes to probesets), called pd.primeview. In principle, you should be able to generate such a PdInfo package using the pdInfoBuilder library. This is what is reported in the error message (2nd block of code in your post).
A complicating factor may be that, because of multi-mapped probes, this may not be possible... At least, that was the status a couple of years ago, as you can read in this thread here. Pay special attention to James MacDonald's answer/post here: https://support.bioconductor.org/p/53302/#53318 (this is the last one in the thread).
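For reference, building a PdInfo package with pdInfoBuilder might look roughly like the sketch below. It is not tested against the PrimeView files: the wrapper name `build_pd_primeview`, the file names in the usage comments, and the author / e-mail values are placeholders, and it assumes that you have the array's CDF file, one CEL file and the probe sequence (probe_tab) file from the manufacturer.

```r
# Hypothetical wrapper around pdInfoBuilder for a CDF-based expression array.
# All file paths are supplied by the caller; nothing is downloaded here.
build_pd_primeview <- function(cdf_file, cel_file, probe_tab_file,
                               dest_dir = '.') {
  # Attach pdInfoBuilder so that its seed class is visible to new()
  suppressPackageStartupMessages(library(pdInfoBuilder))
  seed <- new('AffyExpressionPDInfoPkgSeed',
              cdfFile = cdf_file,
              celFile = cel_file,
              tabSeqFile = probe_tab_file,
              author = 'Your Name',          # placeholder
              email = 'you@example.org',     # placeholder
              biocViews = 'AnnotationData',
              organism = 'Human',
              species = 'Homo sapiens')
  # Writes the package source directory into dest_dir
  makePdInfoPackage(seed, destDir = dest_dir)
}

# Usage (file names are assumptions; use the ones you actually downloaded):
# build_pd_primeview('PrimeView.cdf', 'sample1.CEL', 'PrimeView.probe_tab')
# install.packages('pd.primeview', repos = NULL, type = 'source')
# raw <- oligo::read.celfiles(list.files(pattern = '\\.CEL$'),
#                             pkgname = 'pd.primeview')
```

The package name that pdInfoBuilder generates is derived from the chip type stored in the files, so check the directory it actually creates before installing.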
The issue in the thread you link to had to do with the makecdfenv package, which in general doesn't like multi-mapping probes. IIRC, Ben Bolstad added some code to allow that, but the original Affy arrays never had such things, so they ended up getting ignored.
The pdInfoBuilder package has no such constraints though, as from the Exon series onward Affy has made use of many multi-mapping probes (the miRNA arrays being a hilarious example, given that an miRNA is shorter than the usual 25-mer, and Affy stuck with the multiple probes per probeset idea).
So yeah, the OP just needs to generate a pd.primeview package using pdInfoBuilder.
Where can we find the PrimeView .cdf file to use with pdInfoBuilder? Do you know?
I'd use the AffyCompatible package. It's a pain to find things on Fisher's website. You will need to have a username/password for Affy though. If you don't have one, go to netaffx.com and try to get something; it will send you to a login page.
Note that it looks like there was a problem:
But maybe there wasn't? I built the cdfenv using makecdfenv, and it looks like some of the multiply-mapped probes got dropped by makecdfenv but not by pdInfoBuilder? Note that what I am calling a 'fid' is the field ID for the CEL file. When you read in a CEL file, you get a column of numbers, where the FID is the row of the read-in data. So for example
means that the probes for that probeset are in rows 107740, 182874, etc. And if there are multiply-mapping probes, they will show up more than once in the cdfenv or the pdInfoPackage, indicating that we use the same data more than once, for different probesets.
As an example
So the pdInfoPackage has some weird duplications, but I think overall either that or the cdf package should be fine.
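To make the idea of a multiply-mapped probe concrete, here is a toy illustration in base R. The probeset names and fid values are invented for the example and are not real PrimeView data; it only shows how one could spot fids that are shared between probesets once a probeset-to-fid mapping is in hand.

```r
# Toy probeset -> fid mapping (made-up values)
probesets <- list(ps_A = c(101L, 205L, 309L),
                  ps_B = c(205L, 410L),
                  ps_C = c(512L, 309L, 618L))

# Long format: one row per (fid, probeset) pair
long <- stack(probesets)
names(long) <- c('fid', 'probeset')

# Number of distinct probesets that each fid belongs to
n_sets <- tapply(long$probeset, long$fid, function(x) length(unique(x)))

# fids used by more than one probeset, i.e. multiply-mapped probes
multi <- as.integer(names(n_sets)[n_sets > 1])
multi

# The rows involved, to see which probesets share them
long[long$fid %in% multi, ]
```

In this toy mapping, fids 205 and 309 each appear in two probesets, so the same CEL file row would contribute to both.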