Question

Seeking Advice on HTA 2.0 Microarray Data Processing with Oligo Package

Yao Lipu • 0

Dear Bioconductor Community,

I am working with HTA 2.0 microarray data (GEO platform GPL17586) and using the oligo package for RMA normalization. I have a few questions and would appreciate any guidance.

(1) After performing RMA normalization with oligo::rma(), I noticed that the extracted matrix contains multiple types of probe IDs. I would like to conduct differential expression analysis using gene symbols. Is it appropriate to directly convert the probe IDs to gene symbols, or is there a risk of introducing errors in the process?

The row names contain two kinds of probe ID, for example 2924323_st and TC10002874.hg.1.

(2) The HTA 2.0 microarray includes both "gene" and "exon" level data, but I am only interested in gene-level expression. How can I properly distinguish and extract gene-level information while ensuring the integrity of my analysis?

hta20transcriptcluster.db org.Sc.sgd.db HTA2.0 MicroRNAArrayData

score 0 · Answer 1 · 2025-03-06

As an example, using a public dataset (GSE54143) and annotating with hta20transcriptcluster.db:

> library(GEOquery)
> library(oligo)
> library(limma)
> library(affycoretools)
## you need this package to annotate
> library(hta20transcriptcluster.db)
## some example data
> getGEOSuppFiles("GSE54143")
> setwd("GSE54143/")
> untar("GSE54143_RAW.tar")
> dat <- read.celfiles(dir(".", "gz$"))
> eset <- rma(dat)
## this function is in my affycoretools package
> eset <- annotateEset(eset, hta20transcriptcluster.db)
> head(fData(eset))
              PROBEID ENTREZID SYMBOL GENENAME
2824546_st 2824546_st     <NA>   <NA>     <NA>
2824549_st 2824549_st     <NA>   <NA>     <NA>
2824551_st 2824551_st     <NA>   <NA>     <NA>
2824554_st 2824554_st     <NA>   <NA>     <NA>
2827992_st 2827992_st     <NA>   <NA>     <NA>
2827995_st 2827995_st     <NA>   <NA>     <NA>
> tab <- table(fData(eset)$SYMBOL)
> table(tab)
tab
    1     2     3     4     5     6 
19246  4122   284    47    27    33 
    7     8     9    10    11    12 
   62    42     7     5     3     9 
   13    14    15    16    18    22 
   10     5     5     4     1     1 
> tab[tab > 14L]

       BTNL2        DHX16 
          16           16 
      DUX4L2        HCG23 
          16           15 
LOC100507547         LST1 
          15           15 
         LTB    POLR1HASP 
          22           15 
   PSMB8-AS1          TNF 
          15           16 
        TNXB 
          18 
## remove unannotated stuff
> eset2 <- eset[!is.na(fData(eset)$SYMBOL),]
## average duplicates
> avg <- avereps(exprs(eset2), ID = fData(eset2)$SYMBOL)
> head(avg[, 1, drop = FALSE])
             GSM1308838_1.CEL.gz
DDX11L1                 5.220588
OR4F5                   1.977769
LINC01001               8.980209
LINC01061               9.255836
OR4F29                  1.773047
LOC101928626            2.862830

> any(duplicated(rownames(avg)))
[1] FALSE
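
To make the last two steps concrete, here is a minimal base-R sketch on a toy matrix. Everything in it is made up for illustration: the expression values, the symbols (GENEA, GENEB, GENEC) and all of the IDs except 2824546_st, which is taken from the output above. It only counts the two ID styles seen in the question, drops rows without a symbol, and averages rows that share a symbol. It illustrates the idea behind the avereps() step; it is not limma's implementation.

```r
# Toy expression matrix: 5 probes x 2 samples (values are invented)
expr <- matrix(
  c(5.0, 7.0, 2.0, 9.0, 4.0,
    6.0, 8.0, 3.0, 9.5, 4.5),
  nrow = 5,
  dimnames = list(
    c("TC01000001.hg.1", "TC01000002.hg.1", "TC01000003.hg.1",
      "2824546_st", "TC01000004.hg.1"),
    c("sample1", "sample2")
  )
)

# Hypothetical annotation: two rows map to GENEA, one row has no symbol
symbol <- c("GENEA", "GENEA", "GENEB", NA, "GENEC")

# Count the two ID styles by pattern (assumes IDs look like those in the question)
ids <- rownames(expr)
id_style <- ifelse(grepl("^TC", ids), "TC-style",
                   ifelse(grepl("_st$", ids), "_st-style", "other"))
table(id_style)

# Remove rows without a gene symbol
keep <- !is.na(symbol)
expr_annot <- expr[keep, , drop = FALSE]
sym <- symbol[keep]

# Average rows sharing a symbol: per-symbol column sums divided by row counts
sums <- rowsum(expr_annot, group = sym, reorder = FALSE)
counts <- as.vector(table(factor(sym, levels = unique(sym))))
avg_toy <- sums / counts

avg_toy
```

Here the two GENEA rows collapse into one row holding their per-sample mean, while GENEB and GENEC pass through unchanged. On real data it is worth looking at which symbols have many rows (as in the tab[tab > 14L] output above) before averaging them.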