Question

affycoretools annotateEset problem using Clariom D arrays

willj · France

I'm getting an annotation mismatch error for probeset-level data when I call annotateEset from the affycoretools package after running rma from the oligo package. The following commands work fine:

> rma.genes <- oligo::rma(rawData, target="core")
Background correcting
Normalizing
Calculating Expression
> rma.genes <- annotateEset(rma.genes, annotation(rma.genes), type='core')
> featureData(rma.genes)
An object of class 'AnnotatedDataFrame'
  rowNames: AFFX-BkGr-GC03_st AFFX-BkGr-GC04_st ... TSUnmapped00001002.hg.1 (138745 total)
  varLabels: PROBEID ID SYMBOL GENENAME
  varMetadata: labelDescription

However, the probeset-level equivalent fails with a mismatch error, and the featureData remains empty:

> rma.probesets <- oligo::rma(rawData, target="probeset")
Background correcting
Normalizing
Calculating Expression

> rma.probesets <- annotateEset(rma.probesets, annotation(rma.probesets), type='probeset')
Error: There appears to be a mismatch between the ExpressionSet and the annotation data.
Please ensure that the summarization level for the ExpressionSet and the 'type' argument are the same.
See ?annotateEset for more information on the type argument.

> featureData(rma.probesets)
An object of class 'AnnotatedDataFrame': none

Am I right that this should work? I think I'm correctly following the advice given in "Alternate expression of splice isoforms on Affy Clariom D assay" (and also some here: https://support.bioconductor.org/p/93272/).

These are Clariom D arrays:

> rawData <- read.celfiles(celFiles)
Loading required package: pd.clariom.d.human

Many many thanks for any help,

Will

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.clariom.d.human_3.14.1 DBI_0.5-1                 RSQLite_1.1-2             affycoretools_1.46.5     
 [5] oligo_1.38.0              Biostrings_2.42.1         XVector_0.14.0            IRanges_2.8.1            
 [9] S4Vectors_0.12.1          Biobase_2.34.0            oligoClasses_1.36.0       BiocGenerics_0.20.0      

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2              hwriter_1.3.2                 biovizBase_1.22.0            
  [4] htmlTable_1.8                 GenomicRanges_1.26.2          base64enc_0.1-3              
  [7] dichromat_2.0-0               affyio_1.44.0                 interactiveDisplayBase_1.12.0
 [10] AnnotationDbi_1.36.2          codetools_0.2-15              splines_3.3.2                
 [13] R.methodsS3_1.7.1             ggbio_1.22.4                  geneplotter_1.52.0           
 [16] knitr_1.15.1                  Formula_1.2-1                 Rsamtools_1.26.1             
 [19] annotate_1.52.1               cluster_2.0.5                 GO.db_3.4.0                  
 [22] R.oo_1.21.0                   graph_1.52.0                  shiny_1.0.0                  
 [25] httr_1.2.1                    GOstats_2.40.0                backports_1.0.4              
 [28] assertthat_0.1                Matrix_1.2-7.1                lazyeval_0.2.0               
 [31] limma_3.30.8                  acepack_1.4.1                 htmltools_0.3.5              
 [34] tools_3.3.2                   gtable_0.2.0                  affy_1.52.0                  
 [37] Category_2.40.0               reshape2_1.4.2                affxparser_1.46.0            
 [40] Rcpp_0.12.9                   gdata_2.18.0                  preprocessCore_1.36.0        
 [43] rtracklayer_1.34.1            iterators_1.0.8               stringr_1.1.0                
 [46] mime_0.5                      ensembldb_1.6.2               gtools_3.5.0                 
 [49] XML_3.98-1.5                  AnnotationHub_2.6.4           edgeR_3.16.5                 
 [52] zlibbioc_1.20.0               scales_0.4.1                  BSgenome_1.42.0              
 [55] VariantAnnotation_1.20.2      BiocInstaller_1.24.0          SummarizedExperiment_1.4.0   
 [58] RBGL_1.50.0                   RColorBrewer_1.1-2            yaml_2.1.14                  
 [61] memoise_1.0.0                 gridExtra_2.2.1               ggplot2_2.2.1                
 [64] biomaRt_2.30.0                rpart_4.1-10                  reshape_0.8.6                
 [67] latticeExtra_0.6-28           stringi_1.1.2                 gcrma_2.46.0                 
 [70] genefilter_1.56.0             foreach_1.4.3                 checkmate_1.8.2              
 [73] caTools_1.17.1                GenomicFeatures_1.26.2        BiocParallel_1.8.1           
 [76] GenomeInfoDb_1.10.2           ReportingTools_2.14.0         bitops_1.0-6                 
 [79] lattice_0.20-34               GenomicAlignments_1.10.0      bit_1.1-12                   
 [82] GSEABase_1.36.0               AnnotationForge_1.16.1        GGally_1.3.2                 
 [85] plyr_1.8.4                    magrittr_1.5                  DESeq2_1.14.1                
 [88] R6_2.2.0                      gplots_3.0.1                  Hmisc_4.0-2                  
 [91] foreign_0.8-67                survival_2.40-1               RCurl_1.95-4.8               
 [94] nnet_7.3-12                   tibble_1.2                    KernSmooth_2.23-15           
 [97] OrganismDbi_1.16.0            PFAM.db_3.4.0                 locfit_1.5-9.1               
[100] grid_3.3.2                    data.table_1.10.0             digest_0.6.11                
[103] xtable_1.8-2                  ff_2.2-13                     httpuv_1.3.3                 
[106] R.utils_2.5.0                 munsell_0.4.3


score 1 · Answer 1 · 2017-09-06

James W. MacDonald · United States

When you use annotateEset like that, you are reading in the raw annotation data (from the annotation csv) that comes packaged with the pdInfo package. Apparently that annotation data is broken somehow: the test requires at least 95% overlap between the probeset IDs in the annotation csv and the probeset IDs in the ExpressionSet you are trying to annotate, and here the overlap is 0%. It appears that the probeset annotation file for this package is actually (again) the transcript annotation file, which is why you get the error you see.
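The kind of overlap test described above can be sketched in base R. This is a minimal illustration, not the affycoretools code: the helper name `id_overlap` and the choice to measure overlap as the share of ExpressionSet row names found among the annotation IDs are assumptions; only the 95% threshold comes from the answer. The IDs are made up and simply mimic probeset-level rows being checked against transcript-level annotation.

```r
# Hypothetical helper (not the affycoretools implementation): fraction of
# ExpressionSet row names that also appear among the annotation IDs.
id_overlap <- function(eset_ids, annot_ids) {
  mean(eset_ids %in% annot_ids)
}

# Made-up IDs: probeset-level rows vs. an annotation table that actually
# holds transcript-level IDs, as in the situation described above.
probeset_ids   <- c("PSR_0001.hg.1", "PSR_0002.hg.1", "JUC_0001.hg.1")
transcript_ids <- c("TC_0001.hg.1", "TC_0002.hg.1", "TC_0003.hg.1")

overlap <- id_overlap(probeset_ids, transcript_ids)
if (overlap < 0.95) {
  message(sprintf("Only %.0f%% of IDs matched; annotation and summarization level differ.",
                  100 * overlap))
}
```

Checking the transcript IDs against themselves (`id_overlap(transcript_ids, transcript_ids)`) returns 1, which is the matched case the threshold is meant to accept.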

There is an alternative (and IMO the 'main') way to annotate your data, which is to use the ChipDb package that we supply:

> library(clariomdhumanprobeset.db)
Loading required package: AnnotationDbi
Loading required package: org.Hs.eg.db

> eset <- annotateEset(eset, clariomdhumanprobeset.db)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> featureData(eset)
An object of class 'AnnotatedDataFrame'
  rowNames: 24561160 24561161 ... rat-RPTR-XXU09476-1_st (1562457 total)
  varLabels: PROBEID ENTREZID SYMBOL GENENAME
  varMetadata: labelDescription

> apply(fData(eset), 2, function(x) sum(!is.na(x))/length(x))
  PROBEID  ENTREZID    SYMBOL  GENENAME
1.0000000 0.4449582 0.4449582 0.4449582
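The `apply()` call above reports, for each annotation column, the fraction of rows that are not `NA` (here about 44% of probesets get an Entrez ID, symbol and gene name). Since `fData()` returns a data.frame, the same numbers can be computed with `colMeans(!is.na(...))`. The small data frame below is made up and only stands in for `fData(eset)`.

```r
# Made-up stand-in for fData(eset): some probesets lack annotation.
fd <- data.frame(
  PROBEID  = c("p1", "p2", "p3", "p4"),
  ENTREZID = c("7157", NA, "672", NA),
  SYMBOL   = c("TP53", NA, "BRCA1", NA),
  stringsAsFactors = FALSE
)

# Form used in the answer: apply over columns (MARGIN = 2).
cov_apply <- apply(fd, 2, function(x) sum(!is.na(x)) / length(x))

# Vectorised equivalent: is.na() on a data.frame returns a logical matrix,
# and colMeans() of a logical matrix gives the per-column proportion TRUE.
cov_colmeans <- colMeans(!is.na(fd))

print(cov_colmeans)
```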


Thanks a lot, James. By the way, is there a resource or document that would have pointed me to your ChipDb package without my having to post a question here, i.e. something I should keep up to date with for future reference?

Reply by willj

You mean other than the help page? Here is the first section:

annotateEset           package:affycoretools           R Documentation

Method to annotate ExpressionSets automatically

Description:

     This function fills the featureData slot of the ExpressionSet
     automatically, which is then available to downstream methods to
     provide annotated output. Annotating results is tedious, and can
     be surprisingly difficult to get right. By annotating the data
     automatically, we remove the tedium and add an extra layer of
     security since the resulting ExpressionSet will be tested for
     validity automatically (e.g., annotation data match up correctly
     with the expression data). Current choices for the annotation data
     are a ChipDb object (e.g., hugene10sttranscriptcluster.db) or an
     AffyGenePDInfo object (e.g., pd.hugene.1.0.st.v1). In the latter
     case, we use the parsed Affymetrix annotation csv file to get
     data. This is only intended for those situations where the ChipDb
     package is not available.
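The help text's point that annotation data must "match up correctly with the expression data" comes down to aligning annotation rows to expression rows by ID and then checking that the row names agree. Below is a minimal base-R sketch of that idea using made-up IDs and gene names; it is not how affycoretools implements the check, and names such as `expr`, `annot` and `fdat` are purely illustrative.

```r
# Made-up expression matrix: rows are feature IDs, columns are samples.
expr <- matrix(c(5.1, 6.2, 7.3, 5.0, 6.1, 7.4), nrow = 3,
               dimnames = list(c("id2", "id1", "id3"), c("s1", "s2")))

# Made-up annotation table in a different order, with one extra ID.
annot <- data.frame(PROBEID = c("id1", "id3", "id2", "id9"),
                    SYMBOL  = c("GENEA", "GENEC", "GENEB", "GENEZ"),
                    stringsAsFactors = FALSE)

# Reorder annotation rows to follow the expression rows, matching by ID.
idx <- match(rownames(expr), annot$PROBEID)
stopifnot(!anyNA(idx))             # every expression row has an annotation row
fdat <- annot[idx, , drop = FALSE]
rownames(fdat) <- fdat$PROBEID

# Validity check of the kind the help page describes: row names agree.
stopifnot(identical(rownames(fdat), rownames(expr)))
```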

Reply by James W. MacDonald

Great, thanks; it should have been obvious to me to check there. For anyone else struggling with this, there is also some explanation of ChipDb packages in an AnnotationDbi vignette: https://www.bioconductor.org/packages/devel/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf

Reply by willj