Seeking Advice on HTA 2.0 Microarray Data Processing with Oligo Package
Yao Lipu • 0
@19873cbb
United States

Dear Bioconductor Community,

I am currently working with HTA 2.0 microarray data (GEO platform GPL17586) and using the oligo package for RMA normalization. I have a few questions and would appreciate any guidance.

(1) After performing RMA normalization with oligo::rma(), I noticed that the extracted matrix contains multiple types of probe IDs. I would like to conduct differential expression analysis using gene symbols. Is it appropriate to directly convert the probe IDs to gene symbols, or is there a risk of introducing errors in the process?


The matrix contains two kinds of probe IDs, for example 2924323_st and TC10002874.hg.1.

(2) The HTA 2.0 microarray includes both "gene" and "exon" level data, but I am only interested in gene-level expression. How can I properly distinguish and extract gene-level information while ensuring the integrity of my analysis?

@james-w-macdonald-5106
United States

As an example

> library(GEOquery)
> library(oligo)
> library(limma)
> library(affycoretools)
## you need this package to annotate
> library(hta20transcriptcluster.db)
## some example data
> getGEOSuppFiles("GSE54143")
> setwd("GSE54143/")
> untar("GSE54143_RAW.tar")
> dat <- read.celfiles(dir(".", "gz$"))
> eset <- rma(dat)
## this function is in my affycoretools package
> eset <- annotateEset(eset, hta20transcriptcluster.db)
> head(fData(eset))
              PROBEID ENTREZID
2824546_st 2824546_st     <NA>
2824549_st 2824549_st     <NA>
2824551_st 2824551_st     <NA>
2824554_st 2824554_st     <NA>
2827992_st 2827992_st     <NA>
2827995_st 2827995_st     <NA>
           SYMBOL GENENAME
2824546_st   <NA>     <NA>
2824549_st   <NA>     <NA>
2824551_st   <NA>     <NA>
2824554_st   <NA>     <NA>
2827992_st   <NA>     <NA>
2827995_st   <NA>     <NA>
> tab <- table(fData(eset)$SYMBOL)
> table(tab)
tab
    1     2     3     4     5     6 
19246  4122   284    47    27    33 
    7     8     9    10    11    12 
   62    42     7     5     3     9 
   13    14    15    16    18    22 
   10     5     5     4     1     1 
> tab[tab > 14L]

       BTNL2        DHX16 
          16           16 
      DUX4L2        HCG23 
          16           15 
LOC100507547         LST1 
          15           15 
         LTB    POLR1HASP 
          22           15 
   PSMB8-AS1          TNF 
          15           16 
        TNXB 
          18 
## remove unannotated stuff
> eset2 <- eset[!is.na(fData(eset)$SYMBOL),]
## average duplicates
> avg <- avereps(exprs(eset2), ID = fData(eset2)$SYMBOL)
> head(avg[,1])
             GSM1308838_1.CEL.gz
DDX11L1                 5.220588
OR4F5                   1.977769
LINC01001               8.980209
LINC01061               9.255836
OR4F29                  1.773047
LOC101928626            2.862830

> any(duplicated(rownames(avg)))
[1] FALSE
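
To make the last two steps concrete, here is a minimal base-R sketch on made-up numbers of what "drop the unannotated rows, then average rows that share a symbol" amounts to. The matrix, the probeset names (ps1 to ps4) and the symbol assignments are invented for illustration, and this is not how avereps is implemented internally; it only shows the same idea on a toy case.

```r
# Toy expression matrix: 4 probesets x 2 samples (values are made up).
# matrix() fills column by column, so sampleA = 5, 7, 2, 9.
expr <- matrix(c(5, 7, 2, 9,
                 6, 8, 4, 1),
               nrow = 4,
               dimnames = list(c("ps1", "ps2", "ps3", "ps4"),
                               c("sampleA", "sampleB")))

# Hypothetical annotation: ps1 and ps3 map to the same symbol,
# ps4 has no symbol at all.
symbols <- c("TNF", "LTB", "TNF", NA)

# Step 1: remove rows without a symbol.
keep      <- !is.na(symbols)
expr_kept <- expr[keep, , drop = FALSE]
sym_kept  <- symbols[keep]

# Step 2: sum rows per symbol (order of first appearance),
# then divide by the number of rows that went into each sum.
sums   <- rowsum(expr_kept, group = sym_kept, reorder = FALSE)
counts <- as.vector(table(sym_kept)[rownames(sums)])
avg    <- sums / counts

avg
#     sampleA sampleB
# TNF     3.5       5
# LTB     7.0       8
```

Each symbol now appears exactly once as a row name, which is what the duplicated() check above confirms for the real data.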

Thank you for your help! Your explanation was very clear and really helped me solve my issue. I truly appreciate your time and effort!
