Hello everybody,
I was wondering, if it is possible/advised to use the fRMA algorithm to also analyse HT_MG-430_PM arrays.
The Affymetrix website states that: "The GeneChip HT MG-430 PM Array Plate enables high-throughput expression profiling of 16, 24, or 96 samples at a time using the same industry-standard content as the GeneChip Mouse Genome 430 2.0 Array".
Hence I hope, that I can still apply the fRMA algorithm also to this type of array. However if I read in the AffyBatch in a regular manner and just provide the input.vecs to fRMA, I end up with this type of error:
#library("mouse4302frmavecs") #data(mouse4302frmavecs) myEset <- frma(myAffyBatch,input.vecs = mouse4302frmavecs) Error in frmaAffyBatch(object, background, normalize, summarize, input.vecs, : Mismatch between pmindex(object) and names of input.vecs and unable to create unique mapping.
It turns out, that the probe names do not match:
names(pmindex(myAffyBatch))[1:3]
[1] "1415670_PM_at" "1415671_PM_at" "1415672_PM_at"
names(mouse4302frmavecs[["probesetSD"]])[1:3]
[1] "1415670_at" "1415671_at" "1415672_at"
The initial approach, which I considered was a little gsub manimulation of the names, but I have doubts to mess with the probe assignments too much and end up with wrongly normalized expression calls, since also the length of the vectors does not match (difference of 40 probes).
Since the ReadAffy function offers a possibility to modify the cdfname, I rather tested this (unsuccessful) approach:
myAffyBatch <- ReadAffy(celfile.path="./data/CELfiles/", cdfname="mouse4302")
myEset <- frma(myAffyBatch,input.vecs = mouse4302frmavecs)
Error in exprs(object)[index, , drop = FALSE] : subscript out of bounds In addition: Warning messages: 1: replacing previous import by ‘utils::tail’ when loading ‘mouse4302cdf’ 2: replacing previous import by ‘utils::head’ when loading ‘mouse4302cdf’
So I would kindly ask for help, if and how I can reasonably analyse HT_MG-430_PM Arrays with fRMA.
Thanks a lot
Matthias
> sessionInfo() R version 3.2.3 (2015-12-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] htmg430pmcdf_2.18.0 mouse4302cdf_2.18.0 mouse4302frmavecs_1.5.0 [4] mouse430a2frmavecs_1.3.0 frma_1.22.0 limma_3.26.5 [7] affy_1.48.0 Biobase_2.30.0 BiocGenerics_0.16.1 [10] BiocInstaller_1.20.1 loaded via a namespace (and not attached): [1] AnnotationDbi_1.32.3 affxparser_1.42.0 XVector_0.10.0 [4] splines_3.2.3 GenomicRanges_1.22.3 zlibbioc_1.16.0 [7] MASS_7.3-45 IRanges_2.4.6 bit_1.1-12 [10] foreach_1.4.3 GenomeInfoDb_1.6.1 tools_3.2.3 [13] SummarizedExperiment_1.0.2 ff_2.2-13 DBI_0.3.1 [16] iterators_1.0.8 oligoClasses_1.32.0 preprocessCore_1.32.0 [19] oligo_1.34.2 affyio_1.40.0 S4Vectors_0.8.7 [22] codetools_0.2-14 RSQLite_1.0.0 Biostrings_2.38.3 [25] stats4_3.2.3
Matthias,
I would start by creating a map from the htmg430pm CDF to the mouse4302 CDF. To do this you will likely need to look at the specific probe sequences using the htmg430pmprobe and mouse4302probe packages. These have the actual sequence of each probe. This is the safe way to (1) check that the probes are actually identical between platforms, and (2) create a mapping between probe ids.
Here is a function I wrote a while back to map probes between platforms:
And here is a function that uses the makeMaps function to convert between platforms:
Those functions may need to be modified to suit your specific situation, but hopefully they provide a starting point.
It also appears that the 40 probes that are specific to the htmg430pm array are control probes (begin with AFFX). It's probably safe to remove those.
Not to dissuade you from exploring this further, but is there a specific reason you are not using RMA? With a small data set like the one you describe, there isn't a huge benefit to using fRMA. And the results from fRMA and RMA will likely be very similar.
Best, Matt
Hello Matt,
Thanks a lot for your help. Your
convertPlatform
function was exactly, what I was looking for and solved my technical problems. In the meantime I realized, that you also included it as part of frmatools package - sorry for bothering you with that issue.Frankly speaking, the differences between fRMA and RMA are a closed book to me. I have some NGS experience, but none with arrays. We are interested single array results and the absolute expression of a particular gene set and hence planned to apply the Barcode algorithm to the data. To me it seemed that the fRMA analysis is a prerequisite to correctly apply Barcode. In the NGS field, there are lots of aligners available, but I nevertheless usually stick to the one used by the original authors of the pipeline, even though I might have other personal preferences.
Thanks again for your help
Matthias
You are correct -- to use the barcode algorithm, you need to preprocess your data with frma. I didn't realize this was the eventual goal.