While using oligo/limma to analyse Mouse Gene 2.0 ST arrays, I noticed a few results that lack any gene annotation. When I looked some of these IDs up on NetAffx, they turned out to be "exon" probeset IDs rather than transcript cluster IDs. For some genes, both the transcript_cluster_id and its individual probeset IDs (exons?) are present in my output:
| ID | symbol | name | Ensembl | adj.P | logFC |
|---|---|---|---|---|---|
| 17219448 | Copa | coatomer... | ENSMUSG00000026553 | 0.2239127307 | 0.1570203949 |
| 17219450 | NA | NA | NA | 0.3382192058 | 0.148945971 |
| 17219451 | NA | NA | NA | 0.2129264336 | 0.2089252306 |
| 17219452 | NA | NA | NA | 0.3294590243 | 0.1762376171 |
| 17219453 | NA | NA | NA | 0.0730023042 | 0.2049750825 |
| 17219454 | NA | NA | NA | 0.4965835277 | 0.0907620885 |
Looking into pd.mogene.2.0.st, it does appear that this happens at times: the transcript_cluster_id and the individual exon IDs are both in the "core" set. Is this an Affymetrix annotation issue or a feature, and does anyone have a suggestion on what to do?
    con <- db(pd.mogene.2.0.st)
    head(dbGetQuery(con, "select * from pmfeature inner join
      core_mps on pmfeature.fsetid=core_mps.fsetid where
      core_mps.meta_fsetid='17219448';"))
          fid   fsetid  atom    x    y meta_fsetid transcript_cluster_id   fsetid
    1  216932 17219449 16055  923  134    17219448              17219448 17219449
    2 1015902 17219450 16056  341  630    17219448              17219448 17219450
    3 2307453 17219451 16057  680 1431    17219448              17219448 17219451
    4  164045 17219452 16058 1232  101    17219448              17219448 17219452
    5 1692866 17219453 16059  265 1050    17219448              17219448 17219453
    6  811295 17219454 16060  458  503    17219448              17219448 17219454
Each fsetid under meta_fsetid 17219448 (which I presume is an exon probeset) also appears in core_mps as a meta_fsetid in its own right:
    dbGetQuery(con, "select * from pmfeature inner join
      core_mps on pmfeature.fsetid=core_mps.fsetid where
      core_mps.meta_fsetid='17219449';")
         fid   fsetid  atom   x   y meta_fsetid transcript_cluster_id   fsetid
    1 216932 17219449 16055 923 134    17219449              17219449 17219449
    > sessionInfo()
    R version 3.1.2 (2014-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4 parallel stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] pd.mogene.2.0.st_2.14.1 oligo_1.30.0 Biobase_2.26.0 oligoClasses_1.28.0 RSQLite_1.0.0
     [6] DBI_0.3.1 Biostrings_2.34.1 XVector_0.6.0 IRanges_2.0.1 S4Vectors_0.4.0
    [11] BiocGenerics_0.12.1

    loaded via a namespace (and not attached):
     [1] affxparser_1.38.0 affyio_1.34.0 BiocInstaller_1.16.1 bit_1.1-12 codetools_0.2-9 ff_2.2-13
     [7] foreach_1.4.2 GenomeInfoDb_1.2.3 GenomicRanges_1.18.3 iterators_1.0.7 preprocessCore_1.28.0 splines_3.1.2
    [13] tools_3.1.2 zlibbioc_1.12.0
Jim, thanks for the insight. It looks like I can do a little cleanup on my own now that I know what to look for. Yes, the new arrays are very fun!
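A minimal sketch of one possible cleanup, in base R only, assuming the goal is to drop result rows whose ID is listed in core_mps as a probeset under a different meta_fsetid. The data frames below are small stand-ins typed in from the output above (values rounded); the self-mapped rows for 17219450 to 17219454 are assumed from the description in the post, and `drop_nested_probesets` is a hypothetical helper name, not an oligo or limma function. With the real package, the mapping could presumably be pulled with `dbGetQuery(con, "select fsetid, meta_fsetid from core_mps")`.

```r
# Stand-in for the limma results table shown in the post (values rounded).
results <- data.frame(
  ID     = c("17219448", "17219450", "17219451",
             "17219452", "17219453", "17219454"),
  symbol = c("Copa", NA, NA, NA, NA, NA),
  adj.P  = c(0.2239, 0.3382, 0.2129, 0.3295, 0.0730, 0.4966),
  logFC  = c(0.1570, 0.1489, 0.2089, 0.1762, 0.2050, 0.0908),
  stringsAsFactors = FALSE
)

# Stand-in for core_mps: each probeset is listed under the transcript
# cluster 17219448 and (assumed, per the post) also under itself.
core_mps <- data.frame(
  fsetid      = c(17219449:17219454, 17219449:17219454),
  meta_fsetid = c(rep(17219448L, 6), 17219449:17219454)
)

# Hypothetical helper: remove rows whose ID is a probeset nested under
# some other meta_fsetid, keeping the transcript-cluster-level rows.
drop_nested_probesets <- function(results, core_mps) {
  nested <- unique(core_mps$fsetid[core_mps$fsetid != core_mps$meta_fsetid])
  results[!(results$ID %in% as.character(nested)), , drop = FALSE]
}

cleaned <- drop_nested_probesets(results, core_mps)
print(cleaned)
```

On the stand-in data this keeps only the Copa transcript cluster row. Filtering on the mapping rather than on `is.na(symbol)` is a deliberate choice here: it avoids discarding genuine transcript clusters that simply have no gene symbol.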