Mouse Gene ST v1 CDF Issues (MoGene10stv1): Failure of affyPLM and pdfInfoBuilder

Peter White ▴ 130

@peter-white-3162

Last seen 10.6 years ago

I am having some issues with the Affymetrix Mouse Gene ST 1.0 array (MoGene10stv1) and Bioconductor. I can see that there are known issues with this array and the unsupported CDF that can be downloaded from Affymetrix, but I was able to create the mogene10stv1cdf library as outlined in this thread:

https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html

I have processed the data using both Bioconductor's affy package and the aroma.affymetrix package, but I get different results. I believe the difference arises because aroma is using the affyPLM model. I wanted to check this using the Bioconductor affyPLM package, but it will not work.

Method 1 - works fine:

    library(affy)
    AffyRaw <- ReadAffy()
    AffyEset <- rma(AffyRaw)
    data.affy <- exprs(AffyEset)

Method 2 - fails:

    library(affyPLM)
    AffyRaw <- ReadAffy()
    fit <- fitPLM(AffyRaw, verbos=9)

    Background correcting PM
    Normalizing PM
    Fitting models
    Error in fitPLM(AffyRaw, verbos = 9) :
      Realloc could not re-allocate (size 1150530304) memory

I also tried the following, but it still would not run:

    fit <- fitPLM(AffyRaw, output.param=list(weights=FALSE, residuals=FALSE, varcov="none", resid.SE=FALSE))

Finally, I dropped the number of arrays from 16 to 6, then down to 2, but still no luck.

From piecing together different threads, I wondered if the issue lay with the unsupported CDF. So I attempted to use the pdInfoBuilder / oligo pipeline as outlined in this thread:

http://article.gmane.org/gmane.science.biology.informatics.conductor/18963/match=mogene

Again, I ran into problems:

    > pgfFile <- "MoGene-1_0-st-v1.r3.pgf"
    > clfFile <- "MoGene-1_0-st-v1.r3.clf"
    > transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.txt"
    > probeFile <- "MoGene-1_0-st-v1.probe.tab"
    > pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White", email="peter.white at nationwidechildrens.org", version="0.1.3", genomebuild="UCSC mm9, July 2007", biocViews="AnnotationData", pgfFile=pgfFile, clfFile=clfFile, transFile=transFile, probeFile=probeFile)
    > makePdInfoPackage(pkg, destDir=".")
    Creating package in ./pd.mogene.1.0.st.v1
    loadUnitsByBatch took 54.44 sec
    loadAffyCsv took 53.58 sec
    loadAffySeqCsv took 80.68 sec
    DB sort, index creation took 90.24 sec
    [1] TRUE
    Warning messages:
    1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
    2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

I then closed R, started the command prompt, and navigated to the directory with the package:

    R CMD INSTALL pd.mogene.1.0.st.v1\

    installing to 'c:/PROGRA~2/R/R-28~1.0/library'

    ---------- Making package pd.mogene.1.0.st.v1 ------------
    adding build stamp to DESCRIPTION
    installing NAMESPACE file and metadata
    installing R files
    installing inst files
    FIND: Parameter format not correct
    make[2]: *** [c:/PROGRA~2/R/R-28~1.0/library/pd.mogene.1.0.st.v1/inst] Error 2
    make[1]: *** [all] Error 2
    make: *** [pkg-pd.mogene.1.0.st.v1] Error 2
    *** Installation of pd.mogene.1.0.st.v1 failed ***

    Removing 'c:/PROGRA~2/R/R-28~1.0/library/pd.mogene.1.0.st.v1'

So the installation fails and I cannot work out why (I have RTools and Cygwin installed). I did notice some inconsistencies in the annotation files for these arrays that can be downloaded from the Affymetrix site, and wondered if these could be the source of the problem:

1. The file MoGene-1_0-st-v1.probe.tab contains 35,605 distinct transcript IDs.

2. The file MoGene-1_0-st-v1.na26.mm9.transcript.csv contains 35,567 transcript IDs, so 38 transcript IDs are missing from this file. What are they and why were they not included? (10412488, 10412495, 10412500, 10412503, 10412520, 10417226, 10417239, 10417269, 10417286, 10441511, 10468907, 10490232, 10501544, 10535342, 10536010, 10536044, 10536095, 10536114, 10536118, 10536163, 10550163, 10550775, 10560746, 10577361, 10598118, 10598141, 10598159, 10598207, 10598220, 10598603, 10599086, 10606573, 10608226, 10608440, 10608551, 10608554, 10608603, 10608606)

3. The file MoGene-1_0-st-v1.r3.cdf contains 35,512 transcript IDs. So we are now missing an additional 93 probe sets, all of which can be found in the transcript file: 10338002, 10338005, 10338006, 10338007, 10338008, 10338009, 10338010, 10338011, 10338012, 10338013, 10338014, 10338015, 10338016, 10338018, 10338019, 10338020, 10338021, 10338022, 10338023, 10338024, 10338027, 10338028, 10338030, 10338031, 10338032, 10338033, 10338034, 10338038, 10338039, 10338040, 10338043, 10338045, 10338046, 10338048, 10338049, 10338050, 10338051, 10338052, 10338053, 10338054, 10338055, 10338057, 10338058, 10338061, 10338062, 10349381, 10350469, 10354866, 10361826, 10362430, 10362438, 10362444, 10362452, 10362872, 10369759, 10374030, 10391748, 10395778, 10411504, 10422960, 10436496, 10436660, 10446349, 10453719, 10457089, 10458079, 10460144, 10461932, 10481652, 10482786, 10487009, 10498317, 10501216, 10502040, 10502768, 10503414, 10513713, 10521665, 10532622, 10535929, 10546555, 10552810, 10553535, 10560364, 10582560, 10582566, 10582570, 10582576, 10585872, 10586931, 10592453, 10601614, 10602194. Again, why were they not included?

BTW: I am using R 2.8.0 and the latest release of Bioconductor (2.3) on a Windows XP 64-bit machine.

Any help out there would be greatly appreciated.

Thanks,

Peter

Peter White, Ph.D.
Director, Biomedical Genomics Core
Research Assistant Professor of Pediatrics
The Research Institute at Nationwide Children's Hospital and The Ohio State University

Mailing Address:
The Research Institute at Nationwide Children's Hospital
700 Children's Drive, W510
Columbus, OH 43205

Office: (614) 355-2671
Lab: (614) 355-5252
Fax: (614) 722-2818
Web: http://genomics.nchresearch.org/
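The file-to-file comparison in the post comes down to set differences between vectors of transcript IDs. The following is only a minimal sketch of that idea in R: the IDs are made-up placeholders rather than real file contents, and `report_missing` is a hypothetical helper, not a function from any Bioconductor package.

```r
# Placeholder transcript cluster IDs (made up for illustration). In practice
# these vectors would be read from the probe.tab, transcript annotation and
# CDF files.
probe_tab_ids  <- c("id_a", "id_b", "id_c", "id_d", "id_e")
transcript_ids <- c("id_a", "id_b", "id_c", "id_d")
cdf_ids        <- c("id_a", "id_b", "id_e")

# Hypothetical helper: IDs present in `reference` but absent from `other`.
report_missing <- function(reference, other) {
  sort(setdiff(unique(reference), unique(other)))
}

# IDs in probe.tab that the transcript annotation lacks.
missing_from_transcript <- report_missing(probe_tab_ids, transcript_ids)

# IDs in the transcript annotation that the CDF lacks.
missing_from_cdf <- report_missing(transcript_ids, cdf_ids)

print(missing_from_transcript)
print(missing_from_cdf)
```

Running the comparison in both directions matters, because one file can lack IDs that another has while also containing IDs the other lacks.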

Annotation cdf probe affy affyPLM • 1.7k views

ADD COMMENT • link updated 16.4 years ago by James W. MacDonald 68k • written 16.4 years ago by Peter White ▴ 130

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 41 minutes ago

United States

Hi Peter,

I won't comment on aroma.affymetrix, nor on building cdf packages using makecdfenv, as the former has its own mailing list and the latter isn't really supported - the list archive you quote is Ben Bolstad showing that you _could_ use makecdfenv, but then raising several questions that have not been resolved to my knowledge.

As for building a pdInfoPackage, this works fine for me:

    > makePdInfoPackage(pkg, destDir=".")
    Creating package in ./pd.mogene.1.0.st.v1
    loadUnitsByBatch took 46.92 sec
    loadAffyCsv took 19.19 sec
    loadAffySeqCsv took 51.92 sec
    DB sort, index creation took 20.82 sec
    [1] TRUE
    Warning messages:
    1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
    2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
    > sessionInfo()
    R version 2.8.0 (2008-10-20)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] splines tools stats graphics grDevices datasets utils
    [8] methods base

    other attached packages:
    [1] pdInfoBuilder_1.6.0 oligo_1.6.0 oligoClasses_1.4.0
    [4] AnnotationDbi_1.4.0 preprocessCore_1.4.0 affxparser_1.14.0
    [7] RSQLite_0.7-1 DBI_0.2-4 Biobase_2.2.0

Note that it would have been helpful for you to give us your sessionInfo() as well.

The install went fine:

    ---------- Making package pd.mogene.1.0.st.v1 ------------
    adding build stamp to DESCRIPTION
    installing NAMESPACE file and metadata
    installing R files
    installing inst files
    preparing package pd.mogene.1.0.st.v1 for lazy loading
    Loading required package: RSQLite
    Loading required package: DBI
    Loading required package: oligoClasses
    Loading required package: Biobase
    Loading required package: tools

    Welcome to Bioconductor

    Vignettes contain introductory material. To view, type
    'openVignette()'. To cite Bioconductor, see
    'citation("Biobase")' and for packages 'citation(pkgname)'.

    no man files in this package
    installing indices
    installing help
    adding MD5 sums

    * DONE (pd.mogene.1.0.st.v1)

I would bet that your problem stems from having Cygwin installed as well as the Windows Toolset (Rtools). If you don't have your path set correctly, then you may find the wrong version of certain tools and things won't build correctly.

I have personally found that Cygwin is problematic when installed, and can make matters worse if you then uninstall because for whatever reason you then cannot find certain tools. Does the install directory of the Windows Toolset reside higher up in the PATH than Cygwin?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
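The question about PATH order can be checked programmatically. "FIND: Parameter format not correct" is the message printed by the Windows `find` command, which is consistent with a non-Rtools `find` being picked up first during the build. The sketch below is only an illustration: `first_path_index` and `rtools_first` are hypothetical helpers, and the two PATH strings are made up rather than taken from the thread.

```r
# Hypothetical helper: position of the first PATH entry matching `pattern`
# (case-insensitive), or NA if no entry matches.
first_path_index <- function(path, pattern, sep = ";") {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  hits <- grep(pattern, entries, ignore.case = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Hypothetical helper: TRUE if an Rtools entry exists and comes before any
# Cygwin entry (or no Cygwin entry is present at all).
rtools_first <- function(path) {
  rtools <- first_path_index(path, "rtools")
  cygwin <- first_path_index(path, "cygwin")
  !is.na(rtools) && (is.na(cygwin) || rtools < cygwin)
}

# Illustrative PATH strings (made up, not the poster's actual settings).
bad_path  <- "C:\\cygwin\\bin;C:\\Rtools\\bin;C:\\Windows\\system32"
good_path <- "C:\\Rtools\\bin;C:\\cygwin\\bin;C:\\Windows\\system32"

print(rtools_first(bad_path))
print(rtools_first(good_path))
```

On a real machine the same check could be applied to `Sys.getenv("PATH")`, and `Sys.which("find")` shows which `find` executable R would actually resolve.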

ADD COMMENT • link 16.4 years ago James W. MacDonald 68k

Dear Jim, Thanks for responding so quickly - it worked! I installed RTools version 2.9, installed Inno Setup (not sure if it was needed but I didn't have it), and updated Cygwin. Also I reordered the path so that RTools came first, and added as missing path statement for mitext and html help as outlined at http://cran.r-project.org/doc/manuals/R-admin.html#The-Windows- toolset. Not sure which of these steps was at fault but once that was done the package compiled fine and I was able to process my Gene ST cel files with Oligo. If anyone has any further thoughts on affyPLM and the Gene ST arrays, or the issues I highlighted with the Affy annotation files and inconsistent probe information, I would appreciate it. Using pdInfoBuilder the resulting mouse Gene ST annotation file has 35,557 ids (note this is 10 less than is found in the MoGene-1_0-st-v1.na26.mm9.transcript.csv file, which has 35,567??). Using the unsupported CDF with either makecdfenv or aroma.affymetrix gives 35,512. It's probably just control probesets that are being lost, but I am a little concerned that none of these methods returns the 35,605 distinct transcript IDs found in the MoGene-1_0-st-v1.probe.tab file from Affy. Thanks. Peter > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at med.umich.edu] > Sent: Tuesday, December 02, 2008 9:32 AM > To: White, Peter > Cc: bioconductor at stat.math.ethz.ch > Subject: Re: [BioC] Mouse Gene ST v1 CDF Issues (MoGene10stv1): Failure > of affyPLM and pdfInfoBuilder > > Hi Peter, > > I won't comment on aroma.affymetrix, nor building cdf packages using > makecdfenv as the former has its own mailing list and the latter isn't > really supported - the list archive you quote is Ben Bolstad showing > that you _could_ use makecdfenv, but then raising several questions > that > have not been resolved to my knowledge. 
> > As for building a pdInfoPackage, this works fine for me: > > > makePdInfoPackage(pkg, destDir=".") > Creating package in ./pd.mogene.1.0.st.v1 > loadUnitsByBatch took 46.92 sec > loadAffyCsv took 19.19 sec > loadAffySeqCsv took 51.92 sec > DB sort, index creation took 20.82 sec > [1] TRUE > Warning messages: > 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' > 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' > > sessionInfo() > R version 2.8.0 (2008-10-20) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] splines tools stats graphics grDevices datasets utils > [8] methods base > > other attached packages: > [1] pdInfoBuilder_1.6.0 oligo_1.6.0 oligoClasses_1.4.0 > [4] AnnotationDbi_1.4.0 preprocessCore_1.4.0 affxparser_1.14.0 > [7] RSQLite_0.7-1 DBI_0.2-4 Biobase_2.2.0 > > Note that it would have been helpful for you to give us your > sessionInfo() as well. > > The install went fine: > > ---------- Making package pd.mogene.1.0.st.v1 ------------ > adding build stamp to DESCRIPTION > installing NAMESPACE file and metadata > installing R files > installing inst files > preparing package pd.mogene.1.0.st.v1 for lazy loading > Loading required package: RSQLite > Loading required package: DBI > Loading required package: oligoClasses > Loading required package: Biobase > Loading required package: tools > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > > no man files in this package > installing indices > installing help > adding MD5 sums > > * DONE (pd.mogene.1.0.st.v1) > > I would bet that your problem stems from having Cygwin installed as > well > as the Windows Toolset (Rtools). 
If you don't have your path set > correctly, then you may find the wrong version of certain tools and > things won't build correctly. > > I have personally found that Cygwin is problematic when installed, and > can make matters worse if you then uninstall because for whatever > reason > you then cannot find certain tools. Does the install directory of the > Windows Toolset reside higher up in the PATH than Cygwin? > > Best, > > Jim > > > > Peter White wrote: > >> I am having some issues with the Affymetrix Mouse Gene ST 1.0 array > > (MoGene10stv1) and bioconductor. I can see that there are issues > regarding this > > array and the unsupported CDF that can be downloaded from Affy but I > was able > > to create the mogene10stv1cdf library as outlined in the thread: > > > > https://stat.ethz.ch/pipermail/bioc-devel/2007-October/001403.html > > > > I have processed the data using both Bioconductors Affy Package and > the > > aroma.Affymetrix package but get different results. I believe the > issue is that > > aroma is using the affyPLM model. I wanted to check this using the > bioconductor > > affyPLM package but it will not work: > > > > Method 1 - works fine: > > > > library(affy) > > AffyRaw <- ReadAffy() > > AffyEset <- rma(AffyRaw) > > data.affy <- exprs(AffyEset) > > > > Method 2 - fails: > > > > library(affyPLM) > > AffyRaw <- ReadAffy() > > fit <- fitPLM(AffyRaw, verbos=9) > > > > Background correcting PM > > Normalizing PM > > Fitting models > > Error in fitPLM(AffyRaw, verbos = 9) : > > Realloc could not re-allocate (size 1150530304) memory > > > > I also tried the following but it still could not run: > > > > fit <- fitPLM(AffyRaw, output.param=list(weights=FALSE, > residuals=FALSE, > > varcov="none", resid.SE=FALSE)) > > > > Finally, I dropped the number of arrays from 16 to 6, then down to 2, > but still > > no luck. > > > > So from piecing together different threads I wondered if the issue > lied with > > the unsupported CDF. 
So I attempted to use the pdfInfoBuilder / oligo > pipeline > > as outlined in this thread: > > > > > http://article.gmane.org/gmane.science.biology.informatics.conductor/1 8 > 963/matc > > h=mogene > > > > Again, I ran into problems: > > > >> pgfFile <- "MoGene-1_0-st-v1.r3.pgf" > >> clfFile <- "MoGene-1_0-st-v1.r3.clf" > >> transFile <- "MoGene-1_0-st-v1.na26.mm9.transcript.txt" > >> probeFile <- "MoGene-1_0-st-v1.probe.tab" > >> pkg <- new("AffyGenePDInfoPkgSeed", author="Peter White", > email="peter.white > > at nationwidechildrens.org", version="0.1.3", genomebuild="UCSC mm9, > July > > 2007", biocViews="AnnotationData", pgfFile=pgfFile, clfFile=clfFile, > transFile= > > transFile, probeFile=probeFile) > >> makePdInfoPackage(pkg, destDir=".") > > Creating package in ./pd.mogene.1.0.st.v1 > > loadUnitsByBatch took 54.44 sec > > loadAffyCsv took 53.58 sec > > loadAffySeqCsv took 80.68 sec > > DB sort, index creation took 90.24 sec > > [1] TRUE > > Warning messages: > > 1: In is.na(x) : is.na() applied to non-(list or vector) of type > 'NULL' > > 2: In is.na(x) : is.na() applied to non-(list or vector) of type > 'NULL' > > > > Close R and start the command prompt and navigate to the directory > with the > > package: > > > > R CMD INSTALL pd.mogene.1.0.st.v1\ > > > > installing to 'c:/PROGRA~2/R/R-28~1.0/library' > > > > ---------- Making package pd.mogene.1.0.st.v1 ------------ > > adding build stamp to DESCRIPTION > > installing NAMESPACE file and metadata > > installing R files > > installing inst files > > FIND: Parameter format not correct > > make[2]: *** [c:/PROGRA~2/R/R- > 28~1.0/library/pd.mogene.1.0.st.v1/inst] Error 2 > > make[1]: *** [all] Error 2 > > make: *** [pkg-pd.mogene.1.0.st.v1] Error 2 > > *** Installation of pd.mogene.1.0.st.v1 failed *** > > > > Removing 'c:/PROGRA~2/R/R-28~1.0/library/pd.mogene.1.0.st.v1' > > > > So the installation fails and I cannot work out why (I have RTools > and Cygwin > > installed). 
> > I did notice some inconsistencies in the annotation files for these
> > arrays that can be downloaded from the Affy site and wondered if these
> > could be the source of the problem:
> >
> > 1. From the file MoGene-1_0-st-v1.probe.tab there are 35,605 distinct
> > Transcript IDs.
> >
> > 2. From the file MoGene-1_0-st-v1.na26.mm9.transcript.csv there are
> > 35,567 transcript IDs. 38 transcript IDs are missing from this file.
> > What are they and why were they not included? (10412488, 10412495,
> > 10412500, 10412503, 10412520, 10417226, 10417239, 10417269, 10417286,
> > 10441511, 10468907, 10490232, 10501544, 10535342, 10536010, 10536044,
> > 10536095, 10536114, 10536118, 10536163, 10550163, 10550775, 10560746,
> > 10577361, 10598118, 10598141, 10598159, 10598207, 10598220, 10598603,
> > 10599086, 10606573, 10608226, 10608440, 10608551, 10608554, 10608603,
> > 10608606)
> >
> > 3. From the file MoGene-1_0-st-v1.r3.cdf there are 35,512 Transcript IDs.
> > So we are now missing an additional 93 probe sets (all of these can be
> > found in the transcript file: 10338002, 10338005, 10338006, 10338007,
> > 10338008, 10338009, 10338010, 10338011, 10338012, 10338013, 10338014,
> > 10338015, 10338016, 10338018, 10338019, 10338020, 10338021, 10338022,
> > 10338023, 10338024, 10338027, 10338028, 10338030, 10338031, 10338032,
> > 10338033, 10338034, 10338038, 10338039, 10338040, 10338043, 10338045,
> > 10338046, 10338048, 10338049, 10338050, 10338051, 10338052, 10338053,
> > 10338054, 10338055, 10338057, 10338058, 10338061, 10338062, 10349381,
> > 10350469, 10354866, 10361826, 10362430, 10362438, 10362444, 10362452,
> > 10362872, 10369759, 10374030, 10391748, 10395778, 10411504, 10422960,
> > 10436496, 10436660, 10446349, 10453719, 10457089, 10458079, 10460144,
> > 10461932, 10481652, 10482786, 10487009, 10498317, 10501216, 10502040,
> > 10502768, 10503414, 10513713, 10521665, 10532622, 10535929, 10546555,
> > 10552810, 10553535, 10560364, 10582560, 10582566, 10582570, 10582576,
> > 10585872, 10586931, 10592453, 10601614, 10602194). Again, why were they
> > not included?
> >
> > BTW: I am using R 2.8.0 and the latest release of Bioconductor (2.3) on
> > a Windows XP 64-bit machine.
> >
> > Any help out there would be greatly appreciated.
> >
> > Thanks,
> >
> > Peter
> >
> > Peter White, Ph.D.
> > Director, Biomedical Genomics Core
> > Research Assistant Professor of Pediatrics
> > The Research Institute at
> > Nationwide Children's Hospital and
> > The Ohio State University
> >
> > Mailing Address:
> >
> > The Research Institute at
> > Nationwide Children's Hospital
> > 700 Children's Drive, W510
> > Columbus, OH 43205
> >
> > Office: (614) 355-2671
> > Lab: (614) 355-5252
> > Fax: (614) 722-2818
> > Web: http://genomics.nchresearch.org/
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
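For anyone wanting to repeat the ID comparison described in the quoted question, a minimal sketch in R follows. It assumes the transcript cluster IDs from each file have already been read into plain vectors; the thread does not give the column names, so the loading step is left out. `compare_ids` and the placeholder IDs `TC_a` and `TC_b` are hypothetical names used only for illustration, and the toy vectors do not reflect the real file contents.

```r
# Compare transcript cluster IDs between three sources.
# Inputs are plain vectors of IDs; how they are read from probe.tab,
# the annotation CSV, and the CDF is deliberately left out here.
compare_ids <- function(probe_ids, annot_ids, cdf_ids) {
  probe_ids <- unique(as.character(probe_ids))
  annot_ids <- unique(as.character(annot_ids))
  cdf_ids   <- unique(as.character(cdf_ids))
  list(
    counts = c(probe_tab  = length(probe_ids),
               annotation = length(annot_ids),
               cdf        = length(cdf_ids)),
    # IDs in the probe file but absent from the annotation file
    in_probe_not_annot = setdiff(probe_ids, annot_ids),
    # IDs in the annotation file but absent from the CDF
    in_annot_not_cdf   = setdiff(annot_ids, cdf_ids)
  )
}

# Toy data: TC_a and TC_b are made-up placeholders; the two numeric IDs
# are borrowed from the lists in the question purely as labels.
probe_ids <- c("TC_a", "TC_b", "10412488", "10338002")
annot_ids <- c("TC_a", "TC_b", "10338002")
cdf_ids   <- c("TC_a", "TC_b", "10412488")

res <- compare_ids(probe_ids, annot_ids, cdf_ids)
print(res$counts)
print(res$in_probe_not_annot)
print(res$in_annot_not_cdf)
```

With the real files, the three toy vectors would be replaced by the ID column from each source; `setdiff` then returns the IDs that are missing from the second set.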

ADD REPLY • link 16.4 years ago Peter White ▴ 130

Here is the response I received from Affymetrix (thanks) regarding the differences in the CDF, probe.tab, and annotation.csv files for the Mouse Gene 1.0 ST V1 array:

Hello Dr White-

1) We no longer use a cdf file for our software. The unsupported one was made for use in third party software so there are no current plans to create a different version of the cdf file.

2) The cause for the missing 38 TCs in question #2 is that they were not mappable to the mm9 version of the mouse genome assembly (NCBI build 37). The probe tab file contains all probes that exist on the array as it was designed, and it was designed on the basis of the mm8 version of the mouse genome (NCBI build 36). So the probe.tab file is a design-time view of the probes, while the NetAffx annotation CSV is an annotation-time view of the transcript clusters. To provide the most accurate, biologically realistic annotations, we used the most recent version of the genome. When a given genomic region gets re-organized in the updated assembly, this can prevent or substantially change the way the probes of a transcript cluster map to the new genome version. These transcript clusters are removed from the NetAffx annotation analysis, where they could cause faulty RNA assignments. So these 38 TCs can be ignored. We have considered making available a mm9-version of the probe tab file in the future, which would avoid this sort of confusion.

3) There are 93 transcript_cluster_id's on the MoGene 1.0 ST chip that are listed in the csv annotation file, and searchable in the MoGene chip at NetAffx, but that are not present in the [unsupported] CDF file from netaffx. 45 of these ID's are present in the MoGene PGF file, and correspond to the antigenomic probesets, but the remaining 48 are not in the PGF file either. The remaining 48 transcript cluster IDs the customer identified as not in the PGF file are from what we call low-coverage transcript clusters: those having less than 4 probes. These tend to be very short, non-biologically interesting sequences and were excluded from the PGF with the intent that they should not be analyzed by users. So the advice is that the user can safely ignore them. In the NA27 release of the annotations (due out end of next week) those low-coverage transcript clusters should now be removed from the NetAffx annotation CSV file for all of the Gene arrays.

> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: Tuesday, December 02, 2008 9:32 AM
> To: White, Peter
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Mouse Gene ST v1 CDF Issues (MoGene10stv1): Failure of affyPLM and pdfInfoBuilder
>
> Hi Peter,
>
> I won't comment on aroma.affymetrix, nor building cdf packages using
> makecdfenv as the former has its own mailing list and the latter isn't
> really supported - the list archive you quote is Ben Bolstad showing
> that you _could_ use makecdfenv, but then raising several questions that
> have not been resolved to my knowledge.
>
> As for building a pdInfoPackage, this works fine for me:
>
> > makePdInfoPackage(pkg, destDir=".")
> Creating package in ./pd.mogene.1.0.st.v1
> loadUnitsByBatch took 46.92 sec
> loadAffyCsv took 19.19 sec
> loadAffySeqCsv took 51.92 sec
> DB sort, index creation took 20.82 sec
> [1] TRUE
> Warning messages:
> 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> > sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines tools stats graphics grDevices datasets utils
> [8] methods base
>
> other attached packages:
> [1] pdInfoBuilder_1.6.0 oligo_1.6.0 oligoClasses_1.4.0
> [4] AnnotationDbi_1.4.0 preprocessCore_1.4.0 affxparser_1.14.0
> [7] RSQLite_0.7-1 DBI_0.2-4 Biobase_2.2.0
>
> Note that it would have been helpful for you to give us your
> sessionInfo() as well.
>
> The install went fine:
>
> ---------- Making package pd.mogene.1.0.st.v1 ------------
> adding build stamp to DESCRIPTION
> installing NAMESPACE file and metadata
> installing R files
> installing inst files
> preparing package pd.mogene.1.0.st.v1 for lazy loading
> Loading required package: RSQLite
> Loading required package: DBI
> Loading required package: oligoClasses
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> no man files in this package
> installing indices
> installing help
> adding MD5 sums
>
> * DONE (pd.mogene.1.0.st.v1)
>
> I would bet that your problem stems from having Cygwin installed as well
> as the Windows Toolset (Rtools).
> If you don't have your path set correctly, then you may find the wrong
> version of certain tools and things won't build correctly.
>
> I have personally found that Cygwin is problematic when installed, and
> can make matters worse if you then uninstall because for whatever reason
> you then cannot find certain tools. Does the install directory of the
> Windows Toolset reside higher up in the PATH than Cygwin?
>
> Best,
>
> Jim
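One way to check the PATH question raised in Jim's quoted message from within R is sketched below. The message `FIND: Parameter format not correct` is what the Windows `find.exe` prints when it is handed Unix-style arguments, which is consistent with the wrong `find` being picked up first during the install. The patterns "rtools", "cygwin", and "system32" are assumptions about how the relevant directories are named on a given machine, and `first_match` is a hypothetical helper, not part of any package.

```r
# Split PATH into its entries, in the order the system searches them
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep,
                         fixed = TRUE)[[1]]

# Show which executable is found first for two tools seen in the install log
print(Sys.which(c("find", "make")))

# Hypothetical helper: index of the first PATH entry matching a pattern,
# or NA if no entry matches
first_match <- function(entries, pattern) {
  hits <- grep(pattern, entries, ignore.case = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Directory name patterns are assumptions; adjust to the local install paths
positions <- c(rtools   = first_match(path_entries, "rtools"),
               cygwin   = first_match(path_entries, "cygwin"),
               system32 = first_match(path_entries, "system32"))
print(positions)
```

A smaller position means the directory is searched earlier. If the Rtools entry comes after Cygwin or the Windows system directory, moving it toward the front of PATH is the change Jim's question points toward.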

ADD REPLY • link 16.4 years ago Peter White ▴ 130
