problem read.maimage("Agilent") -limma
@gordon-smyth
WEHI, Melbourne, Australia
> Date: Mon, 25 Jul 2005 12:22:22 -0400
> From: Naomi Altman <naomi at stat.psu.edu>
> Subject: [BioC] problem read.maimage("Agilent") -limma
> To: bioconductor at stat.math.ethz.ch
>
> I am having trouble reading the Agilent arabidopsis 22575 gene array using
> read.maimages in Limma under R 2.1.1 (I don't know the limma version, but I
> just downloaded it using the R packages interface, and also used the update,
> so I presume this is the most recent).

You should have limma 2.0.2.

> Under R 2.0.1, there was no problem reading all the data in the arrays using:
>
> RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt","4508.txt","4509.txt"),source="agilent")
> dim(RGf$R)
> 22575 6
>
> But under R 2.1.1, I get:
>
> RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt","4508.txt","4509.txt"),source="agilent")
> dim(RGf$R)
> 12956 6
>
> The last line of RGf$R is all NA.
>
> The problem might be in RGf$genes. When I try to print any row up to the
> last one, everything looks normal. Trying to print the last row kills
> R. The annotation for this gene appears to be exceptionally long.

I've just tried reading in some AgilentFE data and didn't have any problems, so I wasn't able to reproduce the error that you describe. Try isolating which input file is causing the problem. If you don't find a solution, you could zip up an example data file which causes the error and send it to me.

Gordon

> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
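One way to act on the suggestion to isolate the problem file is to read each file on its own and compare row counts. The sketch below is only an illustration: `rows_per_file` is a hypothetical helper (not part of limma), and it is demonstrated with base R's `read.delim()` on two small made-up files. For real Agilent data the reader would presumably be something like `function(f) read.maimages(f, source="agilent")$R`.

```r
# Hypothetical helper: read each file separately and report how many rows
# were obtained, so that a short or failing file stands out.
rows_per_file <- function(files, reader) {
  sapply(files, function(f) {
    # tryCatch turns a read error into NA instead of stopping the loop
    tryCatch(nrow(reader(f)), error = function(e) NA_integer_)
  })
}

# Demonstration with two small made-up tab-delimited files
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("ID\tSignal", "g1\t10", "g2\t20", "g3\t30"), f1)
writeLines(c("ID\tSignal", "g1\t11"), f2)

base_reader <- function(f) read.delim(f, quote = "", stringsAsFactors = FALSE)
counts <- rows_per_file(c(f1, f2), base_reader)
print(unname(counts))
```

A file whose count is smaller than the expected number of probes (or `NA`) is the one to inspect more closely.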
@sean-davis-490
United States
On Jul 26, 2005, at 8:13 AM, Gordon K Smyth wrote:

>> The problem might be in RGf$genes. When I try to print any row up to
>> the last one, everything looks normal. Trying to print the last row
>> kills R. The annotation for this gene appears to be exceptionally long.

I have had problems with Agilent annotation files containing "special" characters that cause similar "termination" of file reading. I would look at the annotation for quotation marks, single quotes, # symbols (no idea why this seems to affect things), and backslashes. I typically write a little Perl script to "clean" the files. I'm not sure why this should vary from one version to the next, though.

Sean
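For the quote and # characters mentioned above, a possible alternative to cleaning the files is to switch off the corresponding parsing in base R: by default `read.table()` treats both `"` and `'` as quote characters and `#` as the start of a comment. The following is a minimal sketch on a made-up annotation table (the probe names and descriptions are invented); it does not address the backslash problem and says nothing about how limma calls `read.table()` internally.

```r
# Made-up annotation table containing '#' and an unmatched single quote
lines <- c("ProbeName\tDescription",
           "A_1\tnormal gene",
           "A_2\tgene #2 homolog",
           "A_3\t5' UTR binding protein",
           "A_4\tlast gene")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# quote = ""        : no character is treated as a quote
# comment.char = "" : '#' is kept as ordinary text
genes <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
print(dim(genes))
```

With both options disabled, every line of the file becomes one row, and the descriptions keep their `#` and `'` characters.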
There are "\" and "#" before the offending line. I could not find any other unusual characters in the offending line.

--Naomi

At 09:59 AM 7/26/2005, Sean Davis wrote:
>I have had problems with Agilent annotation files containing "special"
>characters that cause similar "termination" of file reading. I would
>look at the annotation for quotation marks, single quotes, # symbols
>(no idea why this seems to affect things), and backslashes. I
>typically write a little perl script to "clean" the files. I'm not
>sure why this should vary from one version to the next, though.
>
>Sean
Recently we detected some problems with internal regexpr libraries in R v2.1.1. One of the symptoms was that R would crash on Windows, but also that the regular expression became corrupt in memory. This was partly fixed in R v2.1.1 patched (2005-07-20). Note that this was introduced when they went from R v2.1.0 to v2.1.1, so this might be related to your problem.

Cheers

Henrik

Naomi Altman wrote:
> There are "\" and "#" before the offending line. I could not find any
> other unusual characters in the offending line.
>
> --Naomi
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
This is caused by an R bug introduced in R 2.1.1, which persists in R 2.1.1 patched. The function read.table() is now interpreting backslashes as C-style special characters. This change was supposed to affect scan() only, but apparently has spilled over into read.table() as well.

The gene names in the AgilentFE export files contain strings such as \0, which is being matched as the null character. This is not only causing the file read to terminate prematurely, it is also causing a crash of R itself when the string is printed.

At this moment, I can see no good workaround apart from going back to an earlier version of R. I will take up the problem with R core for a fix. Martin?

Gordon

At 01:14 AM 27/07/2005, Henrik Bengtsson wrote:
>Recently we detected some problems with internal regexpr libraries in R
>v2.1.1. One of the symptoms was that R would crash on Windows, but also
>that the regular expression became corrupt in memory. This was partly
>fixed in the R v2.1.1 patched (2005-07-20). Note that this was introduced
>when the went from R v2.1.0 to v2.1.1, so this might be related to your
>problem.
>
>Cheers
>
>Henrik
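For readers on a current R release, `read.table()` documents an `allowEscapes` argument that controls whether C-style escapes such as `\0` are interpreted, and it defaults to `FALSE`. Whether that argument is available in the R versions discussed in this thread is not something the thread establishes, so the following is only a sketch for recent R, using an invented gene name that contains a backslash.

```r
tmp <- tempfile(fileext = ".txt")
# "\\0" in R source code is a literal backslash followed by the digit 0
# once written to the file; the gene name itself is made up.
writeLines(c("ProbeName\tGeneName",
             "A_1\tAT1G\\01010",
             "A_2\tplain"), tmp)

# allowEscapes = FALSE keeps backslashes as ordinary characters
genes <- read.table(tmp, header = TRUE, sep = "\t", quote = "",
                    comment.char = "", allowEscapes = FALSE,
                    stringsAsFactors = FALSE)
print(nchar(genes$GeneName[1]))
```

The first gene name is read back with its backslash intact rather than being cut off at an embedded null character.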
On Wed, 2005-27-07 at 10:29 +1000, Gordon Smyth wrote:
> The gene names in the AgilentFE export files contain strings such as \0,
> which is being matched as the null character. This is not only causing the
> file read to terminate prematurely, it is also causing a crash of R itself
> when the string is printed.

A quick and dirty solution would be to replace the \ with \\ using any text editor and load that into R.

It's not very nice, but it should keep things working until a real solution is found.

Francois
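The same substitution can be scripted instead of done by hand in a text editor. The sketch below uses a hypothetical helper, `double_backslashes` (not part of R or limma), and a made-up one-line file; it only illustrates the text replacement and assumes `readLines()` itself returns the backslashes unchanged, as it does in current R.

```r
# Hypothetical helper: write a copy of a text file with every backslash doubled
double_backslashes <- function(infile, outfile) {
  txt <- readLines(infile)
  # fixed = TRUE: pattern and replacement are taken literally, so the
  # pattern is one backslash and the replacement is two
  writeLines(gsub("\\", "\\\\", txt, fixed = TRUE), outfile)
  invisible(outfile)
}

src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeLines("A_1\tAT1G\\01010", src)   # made-up line containing one backslash
double_backslashes(src, dst)

original <- readLines(src)
cleaned  <- readLines(dst)
print(nchar(cleaned) - nchar(original))
```

Writing to a separate output file keeps the original export untouched, which matters if the doubled backslashes are only needed for one R version.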
Another quick and dirty solution is to keep R 2.0.x around, read the files in there, and then bring the saved .RData up in 2.1.1. Works fine for me.

--Naomi

At 08:51 PM 7/26/2005, Francois Pepin wrote:
>A quick and dirty solution would be to replace the \ with \\ using any
>text editor and load that into R.
>
>It's not very nice, but it should keep things working until a real
>solution is found.
>
>Francois
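This workaround amounts to a `save()` in the old R session and a `load()` in the new one. A minimal sketch follows, with a small stand-in list in place of the real `RGf` object (which would come from `read.maimages()` in the older R) and a temporary file name chosen only for illustration.

```r
# In the older R session, where the files read correctly:
#   RGf <- read.maimages(files, source = "agilent")
# Here a small stand-in object plays the role of RGf.
RGf <- list(R = matrix(1:6, nrow = 3),
            genes = data.frame(ID = c("a", "b", "c")))
rdata <- tempfile(fileext = ".RData")
save(RGf, file = rdata)

# In the newer R session:
rm(RGf)
load(rdata)            # restores the object under its original name, RGf
print(dim(RGf$R))
```

Saving the single object explicitly, rather than the whole workspace, makes it clear what is being carried between versions.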