I'm trying to write a function that can summarize intensity matrices from Affymetrix array types that require the affy package alongside those that require the oligo package to read (incidentally, it's not clear to my why two separate packages are required to accomplish this).
The most straightforward way that I can think to do this is to pass the function three arguments: 1) the matrix of non-normalized probe intensities extracted from the ExpressionFeatureSet object using exprs(), 2) a vector of probe IDs (fids?) and 3) a vector of another identifier (e.g., probeset IDs) that correspond, element-for-element to the vector described in 2.
In the process of writing this function, I have noticed that the dimensions of the data.frame I get using `getProbeInfo()` doesn't correspond to the dimensions of the matrix of probe intensities I acquired using `exprs()`.
Why are these dimensions different, and how do I access probe and probeset (fset_id?)-level identifiers for each entry of my matrix of probe intensities?
To instantiate, I downloaded the CEL files for the chicken genome array [here](http://www.affymetrix.com/support/downloads/demo_data/Demo_Data_Chicken.zip).
path2CELfiles2 = "/Users/fishema/Desktop/CEL_files_for_testing/Demo_Data_Chicken//" #This would obviously be different for you.
microarrayProcessing_ls = processCELfiles(path2CELfiles2)
test1_mat = microarrayProcessing_ls$NonNormalizedMatrix
probeset_df = getProbeInfo(microarrayProcessing_ls$MicroarrayObject,field=c('fid','fsetid'),probeType = "pm",target = "probeset")
dim(test1_mat)
[1] 968256 2
dim(probeset_df)
[1] 424097 3
#The processCELfiles function
processCELfiles = function(pathToCELfiles){
#' Read in CEL files from a directory argument, using the oligo package first and then, failing that, trying the affy package instead. Requires the relevent array package to be installed from BioConductor before run. Not sure of a good strategy for installing these if they haven't already been. Also want more sophistocated error handling here, but SO recommends against nested tryCatches(), because they don't work as expected:
#' http://stackoverflow.com/questions/35716394/nested-try-catch-in-r
#'
#' @param pathToCELfiles a string containing the path to the directory containing
#' @return A list with two elements. The first element is the affy or oligo object (GeneFeatureSet). The second is a matrix with rownames and colnames that contains non-normalized intensities for each probe in a given array
#' @examples
#' path2CELfiles1 = "/Users/fishema/Desktop/CEL_files_for_testing/HTA/"
#' test1_ls = convertCELtoNonNormMat(path2CELfiles1)
#' path2CELfiles2 = "/Users/fishema/Desktop/CEL_files_for_testing/Rhesus/"
#' test2_ls = convertCELtoNonNormMat(path2CELfiles2)
#' path2CELfiles3 = "/Users/fishema/Desktop/CEL_files_for_testing/SNP6/"
#' test3_ls = convertCELtoNonNormMat(path2CELfiles3)
#' test3_mat = test3_ls[[2]]
require(oligo)
require(affy)
tryCatch({
eCELs = list.celfiles(pathToCELfiles,full.names=T)
Data_obj = read.celfiles(eCELs)
return_val = list(Data_obj, exprs(Data_obj))
names(return_val) = c("MicroarrayObject", "NonNormalizedMatrix")
}, warning = function(war) {
print(paste("Warning in convertCELtoNonNormMat function: ",war))
return_val = NULL
#return(return_val)
}, error = function(err) {
# Might be affy then
print(paste("Array type not suitable for oligoClasses package. Trying affy package: ",err))
Data_obj=ReadAffy(celfile.path=pathToCELfiles)
non_norm_mat = exprs(Data_obj)
return_val = list(Data_obj, non_norm_mat)
names(return_val) = c("MicroarrayObject", "NonNormalizedMatrix")
#return(return_val)
}, finally = {
print(paste("Done with convertCELtoNonNormMat"))
return(return_val)
}) # END tryCatch
} #END processCELfiles
Thanks so much, @james!
Do you think it makes sense to apply summarize() to just those probes that A) have probes in the then and B) are PM probes?
I am not sure what you mean by
summarize
in this context. But if you were to read in using oligo and then dorma
, you would only use the PM probes, and only those positions on the array that have probes tiled down. Actually, the same is true of the affy package, and xps as well.Our team is hoping to apply other types of normalization (e.g., loess, lowess, qspline) in addition to quantile, which is why we're needing to break the oligo::summarize() function out separately from rma.