Missing probesets when creating Affymetrix GeneChip miRNA 4.0 CDF package using makecdfenv package
Lei Huang [guest] <guest at ...> writes:

> Dear all,
>
> I am working on a set of Affymetrix GeneChip miRNA 4.0 microarray data and
> would like to perform differential expression analysis using Bioconductor
> packages. Since this is a fairly new platform, no CDF or annotation packages
> are available in the Bioconductor repository at the moment. The Affymetrix
> folks kindly provided me with the miRNA 4.0 CDF file as well as sample CEL
> data, so I decided to create a CDF package on my own using
> make.cdf.package() from the makecdfenv package. I was able to build the
> package and install it without trouble. However, after I read the raw CEL
> files and normalized the AffyBatch with vsnrma()/rma(), I found that the
> number of probesets is only 25065, while the original Affymetrix miRNA 4.0
> CDF file contains 36249. I am aware that, starting with version 4,
> Affymetrix changed the naming convention for probeset IDs, but this
> shouldn't cause probesets to go missing. What did I do wrong? I would really
> appreciate any hints or advice on solving this problem.
>
> -Lei
>
> --
> Lei Huang
> Center for Research Informatics
> Biological Science Division
> University of Chicago
> http://cri.uchicago.edu
>
> P.S. The following are the code and output from my R session:
>
> > setwd("~/Documents/Project/mirna/GeneChip 4-0 Array Sample Data")
> > library(affy)
> > library(makecdfenv)
> Loading required package: affyio
> > pkgpath <- tempdir()
> > pname <- cleancdfname(whatcdf("20131118_Human-Brain-AM7962-130ng_rep1_(miRNA-4_0).CEL"))
> > make.cdf.package("miRNA-4_0-st-v1.cdf", cdf.path="~/Documents/Project/mirna/miRNA-4_0-st-v1_CDF",
> +     compress=FALSE, species = "", packagename=pname, package.path = pkgpath)
> Reading CDF file.
> Creating CDF environment
> Wait for about 251 dots............................................................
> Creating package in /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf
>
> README PLEASE:
> A source package has now been produced in
> /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf.
> Before using this package it must be installed via 'R CMD INSTALL'
> at a terminal prompt (or DOS command shell).
> If you are using Windows, you will need to get set up to install packages.
> See the 'R Installation and Administration' manual, specifically
> Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset'
> for more information.
>
> Alternatively, you could use make.cdf.env(), which will not require you to install a package.
> However, this environment will only persist for the current R session
> unless you save() it.
>
> ## install the cdf package from shell
> ## cd to mirna40cdf location
> ## R CMD INSTALL mirna40cdf
>
> > library(limma)
> > library(vsn)
> > library(mirna40cdf)
> >
> > affybatch <- ReadAffy(filenames=list.files())
> > affybatch@cdfName
> [1] "miRNA-4_0"
> > ## normalization
> > eset.norm <- vsnrma(affybatch)
> vsn2: 292681 x 8 matrix (1 stratum).
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
> > ## only 25,065 probesets, the original Affymetrix cdf file contains 36,249 probesets
> > dim(eset.norm)
> Features  Samples
>    25065        8
>
> -- output of sessionInfo():
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] mirna40cdf_1.38.0    AnnotationDbi_1.24.0 vsn_3.30.0
> [4] limma_3.18.9         makecdfenv_1.38.0    affyio_1.30.0
> [7] affy_1.40.0          Biobase_2.22.0       BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>  [1] BiocInstaller_1.12.0  compiler_3.0.2        DBI_0.2-7
>  [4] grid_3.0.2            IRanges_1.20.6        lattice_0.20-24
>  [7] preprocessCore_1.24.0 RSQLite_0.11.4        stats4_3.0.2
> [10] tools_3.0.2           zlibbioc_1.8.0

I came across a similar problem with a brainCDF, where makecdfenv was
producing a package with fewer probesets than the CDF file. I believe the
problem is in the C code that parses ASCII CDF files, since I was able to
correct it by converting the text CDF into a binary CDF and then reading
that with the makecdfenv package:

library("affxparser")
library(makecdfenv)

# Convert the text (ASCII) CDF into a binary (version 4) CDF
convertCdf("HGU133PLUS2_HS_REFSEQ.CDF", "hgu133plus2hsrefseqcdf", version=4, verbose=TRUE)

# Build the CDF package from the converted binary file
make.cdf.package("hgu133plus2hsrefseqcdf",
                 version = packageDescription("makecdfenv", field = "Version"),
                 species = "H. sapiens", unlink = TRUE)

I hope this helps.

Isaac
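Before rebuilding the package, it may be worth confirming how many probesets the text CDF itself declares and contains, so you know whether they are lost at parse time. Below is a minimal base-R sketch for that. The function name count_cdf_units is made up for illustration, and it assumes the CDF is in the ASCII format, with a NumberOfUnits= line in the [Chip] section and one [UnitN] section header per unit. It is not meant for binary CDF files.

```r
# Hypothetical helper: count units (probesets) in an ASCII-format CDF.
# Assumes a "NumberOfUnits=<n>" line in the [Chip] section and one
# "[Unit<n>]" section header per unit.
count_cdf_units <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Strip surrounding whitespace (gsub is used so this also runs on old R versions)
  lines <- gsub("^\\s+|\\s+$", "", lines)

  # Number of units declared in the header, e.g. "NumberOfUnits=36249"
  declared_line <- grep("^NumberOfUnits=", lines, value = TRUE)
  declared <- if (length(declared_line) > 0) {
    as.integer(sub("^NumberOfUnits=", "", declared_line[1]))
  } else {
    NA_integer_
  }

  # Unit headers look like "[Unit12]"; block headers such as
  # "[Unit12_Block1]" do not match this pattern
  found <- sum(grepl("^\\[Unit[0-9]+\\]$", lines))

  c(declared = declared, found = found)
}
```

For example, count_cdf_units("miRNA-4_0-st-v1.cdf") (with the path adjusted to wherever the file lives) returns both counts. If both agree with the expected 36249 while the built package yields only 25065 probesets, that would be consistent with the probesets being dropped while the CDF is parsed or the environment is built, although it would not by itself show why.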
