Hello,
I've been working with hugene20sttranscriptcluster.db_2.14.0 (most
recent release version) for the last couple of days, and noticed that
some of our usual marker genes appear to not be present in the
annotation package. These genes are present in current and previous
versions of the Affymetrix probe -> gene mappings from NETAFFX.
For example, transcript cluster 16966809 should correspond to gene
symbol PDGFRA and Entrez ID 5156 (which is included in the NA34
annotation release for the platform) but
any(mappedkeys(hugene20sttranscriptclusterSYMBOL) == "16966809") turns
up FALSE. Picking a random transcript cluster, 16748695 (PDE6H), turns
up TRUE and will return the symbol. I'm not sure if there are other
genes missing as well, since I happened to stumble across this one.
For now I can try to build an annotation database from the affy
annotation. Am I missing something or can someone else confirm that
things are missing?
Quick copy-paste example:
library(hugene20sttranscriptcluster.db)
library(annotate)
any(mappedkeys(hugene20sttranscriptclusterSYMBOL) == "16966809")
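To see whether other transcript clusters are affected, a rough sketch along these lines might help. The helper name unmapped_ids is made up for illustration; keys() and mappedkeys() are the AnnotationDbi accessors for the SYMBOL map. Many transcript clusters (controls, unannotated regions) presumably have no symbol by design, so the full list is only a starting point for comparison against the NetAffx file.

```r
# Hypothetical helper (not part of any package): IDs that have no entry
# among the mapped keys.
unmapped_ids <- function(ids, mapped) setdiff(ids, mapped)

# Only runs if the annotation package is installed.
if (requireNamespace("hugene20sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene20sttranscriptcluster.db)
  mapped <- mappedkeys(hugene20sttranscriptclusterSYMBOL)

  # Check the two transcript clusters mentioned above.
  print(unmapped_ids(c("16966809", "16748695"), mapped))

  # Count every key in the map that has no symbol assigned.
  all_ids <- keys(hugene20sttranscriptclusterSYMBOL)
  print(length(unmapped_ids(all_ids, mapped)))
}
```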
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] splines parallel stats graphics grDevices utils
datasets methods base
other attached packages:
[1] hugene20sttranscriptcluster.db_2.14.0 OrderedList_1.36.0
twilight_1.40.0 BiocInstaller_1.14.2
[5] doParallel_1.0.8 iterators_1.0.7
limma_3.20.4 gplots_2.13.0
[9] xlsx_0.5.5 xlsxjars_0.6.0
rJava_0.9-6 annotate_1.42.0
[13] SCAN.UPC_2.6.0 sva_3.10.0
mgcv_1.7-29 nlme_3.1-117
[17] corpcor_1.6.6 foreach_1.4.2
affyio_1.32.0 affy_1.42.2
[21] GEOquery_2.30.0 oligo_1.28.2
Biostrings_2.32.0 XVector_0.4.0
[25] IRanges_1.22.7 oligoClasses_1.26.0
org.Hs.eg.db_2.14.0 RSQLite_0.11.4
[29] DBI_0.2-7 AnnotationDbi_1.26.0
GenomeInfoDb_1.0.2 Biobase_2.24.0
[33] BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] affxparser_1.36.0 bit_1.1-12 bitops_1.0-6
caTools_1.17 codetools_0.2-8 ff_2.2-13
gdata_2.13.3
[8] GenomicRanges_1.16.3 grid_3.1.0 gtools_3.4.0
KernSmooth_2.23-12 lattice_0.20-29 MASS_7.3-31
Matrix_1.1-3
[15] preprocessCore_1.26.1 RCurl_1.95-4.1 stats4_3.1.0
tools_3.1.0 XML_3.98-1.1 xtable_1.7-3
zlibbioc_1.10.0
Adam Cornwell
Programmer/Analyst
Hi James,
I'm currently working with hugene20sttranscriptcluster.db as well. Do you know whether the order of the symbols means anything (for example, "PDGFRA" being listed before "FIP1L1"), or is it just arbitrary? I'm trying to include probes that map to multiple genes, so I was wondering whether you would recommend using only the first symbol for each probe or collapsing all possible symbols for a probe into one entry. The latter approach will be more troublesome, though, when I merge the probes down to the single-gene level.
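To make the two options concrete, here is a minimal base-R sketch. The data frame ann is toy data typed in by hand for illustration, not actual output from the package, and the ";" separator is an arbitrary choice. It does not say anything about whether the symbol order is meaningful.

```r
# Toy long-format annotation: one row per probe/symbol pair.
# Rows are illustrative only, not real package output.
ann <- data.frame(
  PROBEID = c("16966809", "16966809", "16748695"),
  SYMBOL  = c("PDGFRA", "FIP1L1", "PDE6H"),
  stringsAsFactors = FALSE
)

# Option 1: keep only the first symbol listed for each probe.
first_symbol <- ann[!duplicated(ann$PROBEID), ]

# Option 2: collapse all symbols for a probe into one string,
# keeping the order in which they were listed.
collapsed <- vapply(
  split(ann$SYMBOL, ann$PROBEID),
  function(s) paste(unique(s), collapse = ";"),
  character(1)
)

print(first_symbol)
print(collapsed)
```

With option 2, a probe such as the one above would end up in a combined "PDGFRA;FIP1L1" group when summarizing to gene level, which is the extra complication I mentioned.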
Best,
Sylvia