Hi,
I am trying to analyze data from the HTA platform. I am able to use the Affy CDF, and I am now trying to use the Brainarray CDF. I installed the packages hta20hsentrezgcdf, hta20hsentrezgprobe and hta20hsentrezg.db in R. When I call the function read.celfiles, it throws an error.
affyGeneFS <- read.celfiles(geneCELs, pkgname = "hta20hsentrezgcdf")
The error is
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘kind’ for signature ‘"environment"’.
I also tried
cdf="hta_Hs_ENTREZG"
affyGeneFS <- read.celfiles(geneCELs, cdfname=cdf)
which threw the following error.
Error: These do not exist:
"hta_Hs_ENTREZG".
I believe the ReadAffy function takes a cdfname argument. I am not sure how to use the Brainarray CDF with the oligo package.
Thanks,
Prat
I am unable to use the affy package for this array type since it is an HTA platform. I have used the oligo package and its read.celfiles function, and I can map the annotations successfully. As part of our pipeline we also use a custom CDF (Brainarray CDF), so I am trying to use "hta20hsentrezgcdf". As you mentioned, this isn't a pdInfoPackage. I did some research and found the following script.
library(pdInfoBuilder)
library(ff)
library(doMC)
registerDoMC(10)
download.file("http://mbni.org/customcdf/21.0.0/ense.download/hta20_Hs_ENSE_21.0.0.zip","tmp.zip")
unzip("tmp.zip")
dir()
z <- cdf2table("hta20_Hs_ENSE.cdf")
When I run this script, R crashes with the following message.
RStudio quit unexpectedly.
I am not sure whether it is a memory issue. I tried the same on an Amazon cloud server, and it fails the same way there. Is there any way to use the Brainarray CDF with the oligo package?
Prat
I recently analyzed an HTA 2.0 array data set. Indeed, remapped PdInfo objects are not available for this (and some other) arrays. According to Manhong (maintainer of the Custom CDFs): "It is very slow to use BioC's built-in method to build those pdInfo packages, some of them even give me coredump. This is why some Pd packages are missing".
You are therefore limited to using the 'old-fashioned' affy package to do this. However, since people are discouraged from using affy for this generation of arrays, a warning (for the Gene ST 1.x) or an error (for the Gene ST 2.x, HTA, MTA, Clariom, etc.) is returned when trying to load these arrays using ReadAffy(). This warning/error has been disabled in the affy library available at the MBNI site (check for "Modified 'affy_1.52.0' Package", available as source or binary). Only use this package if you know what you are doing!
To analyze the data, first download and install the required files (for EntrezGene-based), and then proceed like James said above (with a small modification: note that the 'cdf' part is omitted from the name when specifying the cdf [hta20hsentrezg]).
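The naming detail above can be sketched in R as follows. This is only a minimal sketch, not the code originally posted in this thread: the helper names cdfname_from_pkg and read_hta_brainarray are hypothetical, the CEL directory is a placeholder, and it assumes that the modified affy package and hta20hsentrezgcdf are already installed.

```r
# Hypothetical helper: derive the cdfname expected by ReadAffy() from the
# Brainarray package name by dropping the trailing "cdf".
cdfname_from_pkg <- function(pkg) sub("cdf$", "", pkg)

# Hypothetical wrapper: read CEL files with a custom CDF and run RMA.
# Assumption: the modified 'affy' package from MBNI is installed, because
# the stock affy package returns an error for HTA arrays.
read_hta_brainarray <- function(cel_dir, pkg = "hta20hsentrezgcdf") {
  abatch <- affy::ReadAffy(celfile.path = cel_dir,
                           cdfname = cdfname_from_pkg(pkg))
  affy::rma(abatch)  # ExpressionSet with RMA-normalized log2 values
}

cdfname_from_pkg("hta20hsentrezgcdf")  # "hta20hsentrezg"
```

It would be called as eset <- read_hta_brainarray("path/to/CEL/files"), where the path is whatever directory holds your CEL files.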
To add to what Guido has said and what you have noted: when you get a core dump from cdf2table (which is what Manhong Dai notes as well), what is happening is that the C code, which is based on Affy's Calvin software, is segfaulting! So code that comes directly from the manufacturer of these arrays cannot read in the cdf that Manhong has produced.
I also tried to read in the cdf using the code in affyio, which is not to my knowledge based on Calvin, but is code that Ben Bolstad wrote before Affy released their Calvin software. And this software segfaults as well! As a matter of fact, read.cdffile in the makecdfenv package segfaults when trying to read this cdf, so it's not clear to me how Manhong was able to generate any cdf packages for this array.
It's possible that this is just some little thing in the cdf file that the C code underlying all these functions doesn't like, but then again maybe not. It's not clear why the cdf is causing segfaults, and I don't care enough to try to track this down. But there is some unexplained problem with this cdf, and you are assuming that it is not a big deal when you use the MBNI cdf package.
Hi James,
That's a good observation with some potentially serious implications! Thanks for making me/us aware of this.
Just to add my experience: first of all, I also don't know the exact procedure Manhong uses to generate these custom CDFs, or how he creates the CDF files themselves. However, my preferred way of creating a CDF file from an environment is to use the functionality available in aroma.affymetrix (env2Cdf()). NB: for this you also need a CEL file.
If I do this for the HTA 2.0 array, I can make a CDF file, which in turn can be used by cdf2table() without crashing R, and ultimately a corresponding PdInfo package can be created... I realize, though, that I use as 'source' the custom CDF library that *may* have some problem...
Now create a PdInfo package from the CDF file generated above (recycling code from James, available in "A: How to use brainarray custom cdf with oligo package?"). Remarkably, the code James posted also dealt with the HTA 2.0 array, and it worked nicely then (the difference is that v19 of the custom CDF was used as the source at that time...).
Thanks very much, Guido Hooiveld. It worked perfectly. My primary goal in using the Brainarray CDF is differential gene expression. I have one last question. As part of probe filtering, I plot a histogram of my probe intensities, choose a threshold below which most of the low-intensity probes are located, and filter those out. But with this array type I could only use "rma" normalization, and my histograms have many plateaus. I am wondering whether there is a more stringent way to perform probe filtering for this array type. I am trying to use the genefilter function now.
Prat
If I filter my data set, which I usually do not, then most of the time I do this based on the IQR (interquartile range).
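An IQR-based filter of this kind can be sketched in base R as below. This is only an illustration under stated assumptions: the expression matrix is simulated (it stands in for exprs() of an RMA-normalized ExpressionSet), and the median cutoff is an arbitrary choice, not a value used in this thread.

```r
# Simulated log2 expression matrix (rows = probesets, columns = samples);
# a stand-in for exprs(eset) from an RMA-normalized data set.
set.seed(1)
expr <- matrix(rnorm(1000 * 6, mean = 6, sd = 1), nrow = 1000,
               dimnames = list(paste0("ps", 1:1000), paste0("s", 1:6)))

# Interquartile range of each probeset across samples.
iqr <- apply(expr, 1, IQR)

# Keep probesets whose IQR exceeds the cutoff. The median is used here
# purely for illustration; choose a cutoff that suits your data.
cutoff <- median(iqr)
expr_filtered <- expr[iqr > cutoff, , drop = FALSE]
dim(expr_filtered)
```

For an ExpressionSet, the varFilter() function in the genefilter package applies the same idea (it uses the IQR as its default variance function).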
See below for the signal distribution of my HTA2.0 data set (RMA normalized; RNA was from cell lines).
http://imgur.com/BfcLLIl