Dear all,
I am quite new to Bioconductor and am trying to process data from the Affymetrix GeneChip Rat Gene 2.0 ST Array. This is my first time working with data from this array, and I have a few questions. Thank you in advance for any advice.
1) I have realized that the "affy" package does not work for data from this array, and found that the "oligo" package does. My normalization steps are as follows:
library(oligo)
CEL_list <- list.celfiles("./data", full.names=TRUE)
celfiles <- read.celfiles(CEL_list)
celfiles.rma <- rma(celfiles, target="core")
celfiles.rma
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 36685 features, 10 samples
.......
# Use getMainProbes to remove control probesets from ST arrays
library(affycoretools)
celfiles.main <- getMainProbes(celfiles.rma)
celfiles.main
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 30472 features, 10 samples
.......
I believe this removed about 6,000 control probesets (36685 - 30472 = 6213). Is this the right way to do it?
2) I used "ragene20sttranscriptcluster.db" for the GeneChip Rat Gene 2.0 ST Array. Is this the right annotation package?
3) My annotation steps with "ragene20sttranscriptcluster.db" are as follows:
library(ragene20sttranscriptcluster.db)
eset.main <- exprs(celfiles.main)
dim(eset.main)
## [1] 30472 10
affyid <- rownames(eset.main)
egids2 <- ragene20sttranscriptclusterENTREZID[affyid]
annots <- toTable(egids2)
str(annots)
## 'data.frame': 15058 obs. of 2 variables:
## $ probe_id: chr "17610314" "17610335" "17610349" "17610410" ...
## $ gene_id : chr "100910373" "502213" "308257" "308265" ...
nrow(annots)
## [1] 15058
eset.main.affyid <- eset.main[annots$probe_id, ]
dim(eset.main.affyid)
## [1] 15058 10
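At this stage, I have discarded more than 50% of the probesets that were left after removing the controls (only 15058 of 30472 have an Entrez Gene ID). Is this the right approach, or have I made a mistake?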
Thank you very much for your help.
Kohkichi
You might try the SCAN function in the SCAN.UPC package. It's designed to handle this type of array (uses the oligo package behind the scenes).
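A minimal sketch of what that might look like, assuming the CEL files sit in ./data as in the question and that the platform design package oligo needs for this array is installed. The object name scan.eset and the file pattern are only illustrative, and the default arguments may not be the best choice for your data:

```r
library(SCAN.UPC)

# SCAN() takes a file path or wildcard pattern rather than a list of files.
# The "./data/*.CEL" pattern is an assumption based on the directory used above.
scan.eset <- SCAN("./data/*.CEL")

# The result is an ExpressionSet, so the usual accessors apply.
scan.eset
dim(exprs(scan.eset))
```

The resulting ExpressionSet can then be filtered and annotated in the same way as the output of rma().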