How to annotate mir4.1 arrays?
@richardallenfriedmanbrooklyn-24118
United States

Dear list.

I am analyzing an Affymetrix miRNA 4.1 dataset using the pd.mirna.4.1 package, installed by following the instructions in this post:

Affymetrix miRNA4.1 / oligo package / pd.mirna.4.1

library(devtools)
install_github("soumyabrataghosh/pd.mirna.4.1")

I am getting only probeset IDs, not miRNA names or Entrez Gene IDs. Here is my session:

> library(oligo)
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, aperm, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
    mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: oligoClasses
Welcome to oligoClasses version 1.60.0
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

Loading required package: IRanges
Loading required package: XVector
Loading required package: GenomeInfoDb

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

===================================================================================================================
Welcome to oligo version 1.62.2
===================================================================================================================
> library(affycoretools)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

> library(limma)

Attaching package: 'limma'

The following object is masked from 'package:oligo':

    backgroundCorrect

The following object is masked from 'package:BiocGenerics':

    plotMA
> library(pd.mirna.4.1)
Loading required package: RSQLite
Loading required package: DBI
> celfiles  <-  list.celfiles("data",full.names=TRUE)
> raw<-  read.celfiles(celfiles,pkgname="pd.mirna.4.1")
Platform design info loaded.
Reading in : data/a1.ctr.exo.fadu.CEL
.

Reading in : data/e3.tgfb.exo.fadu.CEL
> probeset.eset<-annotateEset(probeset.eset, pd.mirna.4.1, columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"))
Error: There is no annotation object provided with the pd.mirna.4.1 package.

> sessionInfo( )
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.mirna.4.1_0.1     DBI_1.1.3            RSQLite_2.3.1        limma_3.54.2         affycoretools_1.70.0
 [6] oligo_1.62.2         Biostrings_2.66.0    GenomeInfoDb_1.34.9  XVector_0.38.0       IRanges_2.32.0      
[11] S4Vectors_0.36.2     Biobase_2.58.0       oligoClasses_1.60.0  BiocGenerics_0.44.0 

loaded via a namespace (and not attached):
  [1] backports_1.4.1             GOstats_2.64.0              Hmisc_5.0-1                
  [4] BiocFileCache_2.6.1         plyr_1.8.8                  lazyeval_0.2.2             
  [7] GSEABase_1.60.0             splines_4.2.3               BiocParallel_1.32.6        
 [10] ggplot2_3.4.2               digest_0.6.31               foreach_1.5.2              
 [13] ensembldb_2.22.0            htmltools_0.5.5             GO.db_3.16.0               
 [16] fansi_1.0.4                 magrittr_2.0.3              checkmate_2.2.0            
 [19] memoise_2.0.1               BSgenome_1.66.3             cluster_2.1.4              
 [22] gcrma_2.70.0                annotate_1.76.0             matrixStats_0.63.0         
 [25] R.utils_2.12.2              ggbio_1.46.0                prettyunits_1.1.1          
 [28] colorspace_2.1-0            blob_1.2.4                  rappdirs_0.3.3             
 [31] xfun_0.39                   dplyr_1.1.2                 crayon_1.5.2               
 [34] RCurl_1.98-1.12             jsonlite_1.8.4              graph_1.76.0               
 [37] genefilter_1.80.3           survival_3.5-5              VariantAnnotation_1.44.1   
 [40] iterators_1.0.14            glue_1.6.2                  gtable_0.3.3               
 [43] zlibbioc_1.44.0             DelayedArray_0.24.0         Rgraphviz_2.42.0           
 [46] scales_1.2.1                GGally_2.1.2                edgeR_3.40.2               
 [49] Rcpp_1.0.10                 xtable_1.8-4                progress_1.2.2             
 [52] htmlTable_2.4.1             foreign_0.8-84              bit_4.0.5                  
 [55] OrganismDbi_1.40.0          preprocessCore_1.60.2       Formula_1.2-5              
 [58] AnnotationForge_1.40.2      htmlwidgets_1.6.2           httr_1.4.5                 
 [61] gplots_3.1.3                RColorBrewer_1.1-3          ff_4.0.9                   
 [64] R.methodsS3_1.8.2           pkgconfig_2.0.3             reshape_0.8.9              
 [67] XML_3.99-0.14               nnet_7.3-19                 dbplyr_2.3.2               
 [70] locfit_1.5-9.7              utf8_1.2.3                  tidyselect_1.2.0           
 [73] rlang_1.1.1                 reshape2_1.4.4              AnnotationDbi_1.60.2       
 [76] munsell_0.5.0               tools_4.2.3                 cachem_1.0.8               
 [79] cli_3.6.1                   generics_0.1.3              evaluate_0.20              
 [82] stringr_1.5.0               fastmap_1.1.1               yaml_2.3.7                 
 [85] knitr_1.42                  bit64_4.0.5                 caTools_1.18.2             
 [88] KEGGREST_1.38.0             AnnotationFilter_1.22.0     RBGL_1.74.0                
 [91] R.oo_1.25.0                 xml2_1.3.4                  biomaRt_2.54.1             
 [94] compiler_4.2.3              rstudioapi_0.14             filelock_1.0.2             
 [97] curl_5.0.0                  png_0.1-8                   affyio_1.68.0              
[100] PFAM.db_3.16.0              tibble_3.2.1                geneplotter_1.76.0         
[103] stringi_1.7.12              Glimma_2.8.0                GenomicFeatures_1.50.4     
[106] lattice_0.21-8              ProtGenerics_1.30.0         Matrix_1.5-4               
[109] vctrs_0.6.2                 pillar_1.9.0                lifecycle_1.0.3            
[112] BiocManager_1.30.20         data.table_1.14.8           bitops_1.0-7               
[115] rtracklayer_1.58.0          GenomicRanges_1.50.2        affy_1.76.0                
[118] hwriter_1.3.2.1             R6_2.5.1                    BiocIO_1.8.0               
[121] KernSmooth_2.23-21          gridExtra_2.3               affxparser_1.70.0          
[124] codetools_0.2-19            dichromat_2.0-0.1           gtools_3.9.4               
[127] SummarizedExperiment_1.28.0 DESeq2_1.38.3               Category_2.64.0            
[130] rjson_0.2.21                ReportingTools_2.38.0       GenomicAlignments_1.34.1   
[133] Rsamtools_2.14.0            GenomeInfoDbData_1.2.9      parallel_4.2.3             
[136] hms_1.1.3                   grid_4.2.3                  rpart_4.1.19               
[139] rmarkdown_2.21              MatrixGenerics_1.10.0       biovizBase_1.46.0          
[142] base64enc_0.1-3             restfulr_0.0.15            
>

How do I get the miRNA symbols and Entrez Gene IDs corresponding to the probeset IDs?

Thanks and best wishes,

Richard Friedman.

Columbia University Cancer Center


Dear List,

I ended up reading in the annotation CSV file from Affymetrix, subsetting it, and merging it with the topTable output from limma.
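For anyone who wants to follow the same route, here is a minimal base-R sketch. It assumes the topTable output carries the probeset IDs as row names, and it uses small made-up data frames in place of the real topTable result and the ThermoFisher CSV. The column names `ProbeSetID` and `Name` and all IDs are hypothetical, so substitute the real headers from your file.

```r
# Hypothetical stand-in for limma::topTable() output, with probeset IDs
# as row names (made-up values).
tt <- data.frame(
  logFC     = c(2.1, -1.4, 0.3),
  P.Value   = c(0.001, 0.01, 0.4),
  row.names = c("probeA_st", "probeB_st", "probeC_st")
)

# Hypothetical stand-in for the annotation CSV; in practice something like
# read.csv(<annotation file>, comment.char = "#"). Column names are made up.
anno <- data.frame(
  ProbeSetID = c("probeA_st", "probeB_st", "probeD_st"),
  Name       = c("miR-A", "miR-B", "miR-D"),
  Species    = c("Homo sapiens", "Homo sapiens", "Mus musculus")
)

# Turn the row names into a join key, keep only the annotation columns
# of interest, and left-join so unannotated probesets are retained (as NA).
tt$ProbeSetID <- rownames(tt)
annotated <- merge(tt, anno[, c("ProbeSetID", "Name")],
                   by = "ProbeSetID", all.x = TRUE, sort = FALSE)

# merge() does not promise to preserve row order, so restore the ranking.
annotated <- annotated[order(annotated$P.Value), ]
print(annotated)
```

Using `all.x = TRUE` keeps every row of the results table even when a probeset has no annotation entry, which is usually what you want for a ranked gene list.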

Best wishes,

Rich

@james-w-macdonald-5106
United States

You haven't run rma on your data yet, so you cannot annotate it yet. Once you have run rma, you can annotate using the CSV file from ThermoFisher (you will need a login to download it).

> eset <- rma(raw)
## note that you need to specify comment.char, to skip the '#' header lines!
> anno <- read.csv("TFS-Assets_LSG_Support-Files_miRNA-4_1-st-v1-annotations-20160922-csv/miRNA-4_1-st-v1.annotations.20160922.csv", comment.char = "#")
## keep column 2 (probeset IDs) and columns 3:4 (annotation)
> anno <- anno[,2:4]
> eset <- annotateEset(eset, anno, 1, 2:3)
## et voila!

As an aside, this is all documented in the help page for annotateEset (`?annotateEset`):

Usage:

     annotateEset(object, x, ...)

     ## S4 method for signature 'ExpressionSet,ChipDb'
     annotateEset(
       object,
       x,
       columns = c("PROBEID", "ENTREZID", "SYMBOL", "GENENAME"),
       multivals = "first"
     )

     ## S4 method for signature 'ExpressionSet,AffyGenePDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyHTAPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyExonPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,AffyExpressionPDInfo'
     annotateEset(object, x, type = "core", ...)

     ## S4 method for signature 'ExpressionSet,character'
     annotateEset(object, x, ...)

     ## S4 method for signature 'ExpressionSet,data.frame'
     annotateEset(object, x, probecol = NULL, annocols = NULL, ...) <------------- This part here

Arguments:

  object: An ExpressionSet to which we want to add annotation.

       x: Either a ChipDb package (e.g.,
          hugene10sttranscriptcluster.db), or a pdInfoPackage object
          (e.g., pd.hugene.1.0.st.v1).

     ...: Allow users to pass in arbitrary arguments. Particularly
          useful for passing in columns, multivals, and type arguments
          for methods.

 columns: For ChipDb method; what annotation data to add. Use the
          'columns' function to see what choices you have. By default
          we get the ENTREZID, SYMBOL and GENENAME.

multivals: For ChipDb method; this is passed to 'mapIds' to control how
          1:many mappings are handled. The default is 'first', which
          takes just the first result. Other valid values are 'list'
          and 'CharacterList', which return all mapped results.

    type: For pdInfoPackages; either 'core' or 'probeset',
          corresponding to the 'target' argument used in the call to
          'rma'.

probecol: Column of the data.frame that contains the probeset IDs. Can <---------------- As well as this entry and the following one
          be either numeric (the column number) or character (the
          column header).

annocols: Column(s) of the data.frame to use for annotating. Can be a
          vector of numbers (which column numbers to use) or a
          character vector (vector of column names).
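To make `probecol` and `annocols` concrete, here is a base-R sketch of the idea behind them: look up each expression row's probeset ID in the annotation table and keep the chosen columns, in the same row order as the expression matrix. This only illustrates the alignment step; it is not affycoretools' actual implementation, and all IDs and column names are made up.

```r
# Hypothetical expression matrix: rows are probeset IDs, as in exprs(eset).
expr <- matrix(c(5.2, 7.9, 3.1, 5.0, 8.1, 2.9), nrow = 3,
               dimnames = list(c("p1_st", "p2_st", "p3_st"), c("s1", "s2")))

# Hypothetical annotation table: different row order plus an extra row.
anno <- data.frame(
  Probe   = c("p3_st", "p9_st", "p1_st", "p2_st"),
  Name    = c("miR-3", "miR-9", "miR-1", "miR-2"),
  Species = c("Homo sapiens", "Mus musculus", "Homo sapiens", "Homo sapiens")
)

probecol <- "Probe"               # column holding the probeset IDs
annocols <- c("Name", "Species")  # columns to attach as feature data

# For each expression row, find the index of its annotation row.
idx <- match(rownames(expr), anno[[probecol]])
fdata <- anno[idx, annocols, drop = FALSE]
rownames(fdata) <- rownames(expr)  # feature data must line up with exprs rows
print(fdata)
```

A probeset that is missing from the annotation table gets `NA` from `match()` and therefore an all-`NA` annotation row, so the feature data still lines up with the expression values.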

A useful thing to do is to include the species as well.

> anno <- read.csv("TFS-Assets_LSG_Support-Files_miRNA-4_1-st-v1-annotations-20160922-csv/miRNA-4_1-st-v1.annotations.20160922.csv", comment.char = "#")
> eset <- annotateEset(eset, anno, 2, c(3,4,6))
> head(fData(eset))
            Accession Transcript.ID.Array.Design. Species.Scientific.Name
14q0_st          14q0                        14q0            Homo sapiens
14qI-1_st      14qI-1                      14qI-1            Homo sapiens
14qI-1_x_st    14qI-1                      14qI-1            Homo sapiens
14qI-2_st      14qI-2                      14qI-2            Homo sapiens
14qI-3_x_st    14qI-3                      14qI-3            Homo sapiens
14qI-4_st      14qI-4                      14qI-4            Homo sapiens

## assuming you care only about Homo sapiens
> esetsmall <- eset[fData(eset)[,3] %in% "Homo sapiens",]
> esetsmall
ExpressionSet (storageMode: lockedEnvironment)
assayData: 6631 features, 4 samples 
  element names: exprs 
protocolData
  rowNames: GSM4509143_CAMC_V_Replicate1.CEL.gz
    GSM4509144_CAMC_V_Replicate2.CEL.gz
    GSM4509145_CAMC_C_Replicate1.CEL.gz
    GSM4509146_CAMC_C_Replicate2.CEL.gz
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM4509143_CAMC_V_Replicate1.CEL.gz
    GSM4509144_CAMC_V_Replicate2.CEL.gz
    GSM4509145_CAMC_C_Replicate1.CEL.gz
    GSM4509146_CAMC_C_Replicate2.CEL.gz
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames: 14q0_st 14qI-1_st ... Z17B_st (6631 total)
  fvarLabels: Accession Transcript.ID.Array.Design.
    Species.Scientific.Name
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.mirna.4.0
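A note on the species filter above: `%in%` returns `FALSE` for a missing value, whereas `==` returns `NA`, and an `NA` in a logical row index produces an all-`NA` row instead of dropping it. The small made-up data frame below (hypothetical column names) shows the difference if the species column ever contains missing entries.

```r
# Hypothetical feature data with one missing species entry.
fd <- data.frame(
  Accession = c("p1", "p2", "p3", "p4"),
  Species   = c("Homo sapiens", "Mus musculus", NA, "Homo sapiens")
)

keep_eq <- fd$Species == "Homo sapiens"    # TRUE FALSE NA TRUE
keep_in <- fd$Species %in% "Homo sapiens"  # TRUE FALSE FALSE TRUE

nrow(fd[keep_eq, ])  # 3: the NA index yields an extra all-NA row
nrow(fd[keep_in, ])  # 2: only the Homo sapiens rows
```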

Jim,

I just saw this.

Thanks as always,

Rich
