problem reading genepix files using both marray and limma functions

michael watson IAH-C

Ditto. I'd be happy to look over a couple of your gpr files. I have had problems with gpr files before, but have often tracked the problem down to the user, who had:

1) opened the files in Excel and then saved them as tab-delimited text
2) (somehow) put a carriage return in the middle of a line
3) manually edited some of the gene names in only some of the files (DOH!)

Etc. etc. Things would be a lot easier if we bioinformaticians didn't have "users" ;-)

Mick

-----Original Message-----
From: Gordon Smyth [mailto:smyth@wehi.edu.au]
Sent: 17 August 2004 00:12
To: Bela Tiwari
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] problem reading genepix files using both marray and limma functions

The first things to try are:

1. Upgrade to the latest version of limma from http://bioinf.wehi.edu.au/limma
2. Check with your associate that the genepix gpr files really have not been edited. Emphasise that you really need files as they are straight out of GenePix.

You should also tell us what versions of R and the packages you are using. Type:

version
packageDescription("limma")

and include the output in an email.

If you take both steps above and still have a problem, then you can send me a couple of your gpr files. The gpr files should be in a zip or similar file to prevent any further conversions by the mailer.

Gordon

At 01:37 AM 17/08/2004, Bela Tiwari wrote:
>Hello,
>
>Last week I was sent GenePix data files from an associate. As far as I'm aware, these files have not been edited in any way before being sent to me.
>
>My aim was to load them up and run some marray and/or limma functions on the data.
>
>First I tried to load the files (16 of them) using read.GenePix(), but this failed with an error:
>
>  Error in "colnames<-"( `*tmp*`, value = fnames) :
>  length of dimnames [2] not equal to array extent
>
>Then I tried loading a file individually using read.GenePix() and that worked fine; however, subsets of files did not.
>
>I then read through some of the relevant Bioconductor mailing list posts that I could find, and decided to try the read.maimages function as an alternative.
>
>This I did, only to get errors such as:
>
>  line 35162 did not have 43 elements
>
>So, I tried loading the files individually, using read.maimages() to see if I could track down the "problem" files, and then look at them to see if there was an issue with certain lines within those files.
>
>I did this, and found that 5 of my 16 files would not load using read.maimages and gave errors like the one directly above.
>
>One file gave a different error:
>
>  "number of items read is not a multiple of the number of columns"
>
>giving me a total of 6 out of 16 files that won't load using read.maimages.
>
>Tackling the latter error first: I looked at the file, and saw an incomplete line at the bottom of the file. I got rid of that, and tried to load the file using read.GenePix(). I still received a warning message about the fact that the number of items read is not a multiple of the number of columns. I cannot spot the problem in the edited version of the file. The edited file does, however, now read in without error using read.maimages().
>
>I then tried loading the files that "failed" with the first error message above individually with read.GenePix() and this works.
>
>I did look at some of the files to try and see what the problem was (i.e. whether there was anything obviously strange at the lines indicated as problems by the read.maimages error message), but I can't see anything.
>
>I then took the "successful" subset of my files (those I could read in as individual files using read.maimages), and tried to read those in as a group. This didn't work either, but the error I got was:
>
>  Error in "[.data.frame"(obj, , columns$Rf) :
>  undefined columns selected
>
>So, I specified the columns explicitly in the read.maimages command, but I still got the same error.
>
>Thankfully, a recent posting to the mailing list (http://files.protsuggest.org/biocond/html/3512.html) mentioned issues related to this, and Dave Nelson gave a solution that could be implemented. I did this, and my "successful" files then read in just fine using this hacked version of read.maimages().
>
>I also tried using the read.GenePix() function to read in just the group of "successful" files and that gives the error:
>
>  Error in "colnames<-"( `*tmp*`, value = fnames) :
>  length of dimnames [2] not equal to array extent
>
>So, overall, my questions are:
>
>Is there anyone out there who would be willing to scan over one of my "successful" files and one of my "failed" files and see if they can spot the problem? The errors suggest that the problem should be easy to spot... but I can't see it. Even with all the gymnastics related above, I still have a situation where I have only managed to load about half of the files I have.
>
>Is there anyone else who has had these experiences of groups of GenePix files being so seemingly inconsistent as far as being able to read them using Bioconductor functions? And if so, do you have any advice on how to make life as easy as possible?
>
>Does anyone have any other comments about the internal workings/assumptions of functions such as read.maimages in comparison to, say, functions like read.GenePix, and which may be more forgiving, or have known issues, etc.?
>
>Sorry this is such a long mail!
>
>best wishes,
>
>Bela Tiwari
>
>*************************
>Dr. Bela Tiwari
>Lead Bioinformatician
>
>CEH Oxford
>Mansfield Road
>Oxford, OX1 3SR
>01865 281975

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor

limma marray

Written 20.7 years ago by michael watson IAH-C • updated 20.7 years ago by Bela Tiwari
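The read.maimages() messages quoted above ("line 35162 did not have 43 elements" and "number of items read is not a multiple of the number of columns") typically mean that some data lines do not have the same number of tab-separated fields as the column-header row, which is also what Michael's suspects (an Excel re-save, a stray carriage return mid-line) would produce. The R sketch below locates such lines. It assumes the spot table starts at a header row whose first field is Block, as in GenePix .gpr files; find_ragged_lines() is a hypothetical helper, not a limma or marray function, and because it counts tabs rather than parsing quotes, a tab inside a quoted field would be miscounted.

```r
# find_ragged_lines() is a hypothetical helper, not part of limma or marray.
# It reports data lines whose number of tab-separated fields differs from
# that of the column-header row (the row whose first field is "Block").
find_ragged_lines <- function(file) {
  # readLines() accepts LF, CRLF and CR as line endings, so a stray carriage
  # return in the middle of a record shows up here as two short lines.
  lines <- readLines(file, warn = FALSE)

  # Fields per line = number of tabs + 1. gregexpr() returns -1 when there
  # is no match, so only positive match positions are counted.
  n_fields <- vapply(gregexpr("\t", lines, fixed = TRUE),
                     function(m) sum(m > 0), integer(1)) + 1L

  # Locate the column-header row of the spot table.
  hdr <- grep('^"?Block"?\t', lines)[1]
  if (is.na(hdr)) stop("no column-header row starting with 'Block' in ", file)

  rows <- seq_along(lines)[-seq_len(hdr)]   # data lines after the header
  bad  <- rows[n_fields[rows] != n_fields[hdr]]
  data.frame(line = bad, fields = n_fields[bad],
             expected = rep(n_fields[hdr], length(bad)))
}

# Example on a small, made-up file with one truncated record (line 7)
tf <- tempfile(fileext = ".gpr")
writeLines(c('ATF\t1.0',
             '2\t5',
             '"Type=GenePix Results 3"',
             '"Creator=example"',
             '"Block"\t"Column"\t"Row"\t"Name"\t"ID"',
             '1\t1\t1\t"geneA"\t"A1"',
             '1\t2\t1\t"geneB"',
             '1\t3\t1\t"geneC"\t"C1"'), tf)
find_ragged_lines(tf)
```

On a real file, e.g. find_ragged_lines("A12797013.gpr"), the reported line numbers can then be inspected with readLines(). A file that passes this check can still differ from the rest of its batch in layout or row order, which field counts alone cannot detect.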

Bela Tiwari

Hello,

Thanks for the responses about my email and the offers of help.

I have now upgraded to the latest version of limma from the wehi site. I also wrote to the person who provided the gpr files I have, and she says they have not been touched/edited, etc. She copied new versions of the files for me from a CD she burned when she first received the files, and these are identical to the ones I have been using. (Same sizes, same errors.)

I tried to load all the gpr files as before, using read.maimages. This caused problems again, though the error message may be a bit more telling this time (see below).

I then tried loading just those files that I had been able to load individually without errors using the older version of read.maimages, and this worked! Hurrah! At least that's now 8 of my 16 files that I can load without problem.

Here is the error message that appeared when attempting to load all files:

> mynewLimmaData <-read.maimages(gprfiles, source = "genepix")
Read A12797013.gpr
Error in "[<-"(`*tmp*`, , i, value = as.integer(c(53, 46, 49, 227, 57, :
        number of items to replace is not a multiple of replacement length

Given this, I hope you don't mind if I take up your offer and send you a couple of the gpr files (offlist). I hope you can spot the errors that I am missing.

I should, of course, have remembered to find out what version of limma I was already using before I upgraded... oh well.
I am pretty sure (from another machine like mine) that this is the correct version info for both R and limma:

> version
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    9.0
year     2004
month    04
day      12
language R
> packageDescription("limma")
Package: limma
Version: 1.6.7
Date: 2004/05/14
Title: Linear Models for Microarray Data
Author: Gordon Smyth <smyth@wehi.edu.au>, Matt Ritchie <mritchie@wehi.edu.au>, James Wettenhall <wettenhall@wehi.edu.au>, Natalie Thorne <thorne@wehi.edu.au>
Maintainer: Gordon Smyth <smyth@wehi.edu.au>
Depends: R (>= 1.7.1), MASS, splines, statmod (>= 1.0.6), sma
Description: Data analysis, linear models and differential expression for microarray data.
License: LGPL
URL: http://bioinf.wehi.edu.au/limma/
Packaged: Wed May 26 14:31:46 2004; madman
Built: R 1.9.0; ; 2004-07-27 16:54:06; unix

cheers,
Bela

20.7 years ago by Bela Tiwari


Hi Bela,

On Tue, 17 Aug 2004, Bela Tiwari wrote:
> I have now upgraded to the latest version of limma from the wehi site.
<snip>
> > packageDescription("limma")
> Package: limma
> Version: 1.6.7
> Date: 2004/05/14

This is not the latest version of limma. This is the last Bioconductor Release version from May 14. The WEHI limma webpage, http://bioinf.wehi.edu.au/limma/, says:

The current version of LIMMA is 1.7.4 dated 21 July 2004.

Maybe you have multiple R library directories, and you have installed the latest limma in one library directory but are still loading R packages from another library directory. Check your .libPaths().

Hope this helps,
James

20.7 years ago by James Wettenhall
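James's diagnosis can be checked from inside R. The sketch below, written for current R rather than the R 1.9 used in this thread, lists every installed copy of a package across the directories in .libPaths() and reports the path that find.package() resolves to: the loaded namespace if there is one, otherwise the first library that contains the package. check_package_install() is a hypothetical helper name; .libPaths(), installed.packages() and find.package() are standard base/utils functions.

```r
# check_package_install() is a hypothetical helper, not part of limma.
# It shows where copies of a package live and which copy R will use.
check_package_install <- function(pkg) {
  libs <- .libPaths()                       # library directories, in search order
  ip <- installed.packages(lib.loc = libs)
  copies <- ip[ip[, "Package"] == pkg, c("LibPath", "Version"), drop = FALSE]
  list(
    search_path = libs,
    all_copies  = copies,                   # one row per installed copy
    # Path of the loaded namespace if there is one, otherwise the first
    # library in .libPaths() containing the package; character(0) if none.
    in_use      = find.package(pkg, quiet = TRUE)
  )
}

check_package_install("limma")
```

If all_copies has more than one row, its Version column shows whether a newer download (1.7.4 here) went into a different library from the 1.6.7 copy that in_use points to.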

michael watson IAH-C

> Both read.maimages() and read.GenePix() are designed to read batches of arrays all corresponding to the same GAL file. You cannot expect to read in data from different GAL files at one time.

I think this is a VERY important message for everyone who is a beginner with Bioconductor. All of the data-reading functions I have come across so far rely on the files being in the same row order. There is NO cross-referencing of Block/Col/Row or gene names across files; Bioconductor simply assumes that row 1 in file 1 corresponds to row 1 in file 2, row 2 in file 1 to row 2 in file 2, etc.

In the vast majority of cases this is fine, as most people deal with batches of files from the same array design which have been produced by computer, and so the assumption holds. However, I have come across cases where this has certainly not been true, and I think new users of Bioconductor may not be aware of it.

Mick

20.7 years ago by michael watson IAH-C
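Because the reading functions match spots by position only, one way to catch the mixed-GAL problem before analysis is to compare the spot layout of every file against the first. The R sketch below assumes each .gpr file has Block, Column, Row and ID columns in a spot table that begins at a header row whose first field is Block. read_gpr_spots() and check_spot_order() are hypothetical helpers, not limma or marray functions. read.delim() pads short lines by default (fill = TRUE), so this check does not detect ragged lines.

```r
# Hypothetical helpers, not part of limma or marray.

# Read the spot table of one GenePix .gpr file, starting at the
# column-header row whose first field is "Block".
read_gpr_spots <- function(file) {
  lines <- readLines(file, warn = FALSE)
  hdr <- grep('^"?Block"?\t', lines)[1]
  if (is.na(hdr)) stop("no column-header row starting with 'Block' in ", file)
  read.delim(file, skip = hdr - 1, check.names = FALSE,
             stringsAsFactors = FALSE)
}

# TRUE for each file whose spots appear in the same order, with the same
# identifiers, as in the first file; FALSE otherwise.
check_spot_order <- function(files, key = c("Block", "Column", "Row", "ID")) {
  spot_keys <- function(f) {
    spots <- read_gpr_spots(f)
    if (!all(key %in% names(spots))) return(NULL)  # layout lacks a key column
    do.call(paste, c(spots[key], sep = "|"))        # one string per spot
  }
  ref <- spot_keys(files[1])
  if (is.null(ref)) stop("first file lacks one of: ", paste(key, collapse = ", "))
  vapply(files, function(f) identical(spot_keys(f), ref), logical(1))
}

# Example with three small, made-up files; the third lists its spots in a
# different order.
make_gpr <- function(rows) {
  tf <- tempfile(fileext = ".gpr")
  writeLines(c('ATF\t1.0', '1\t5', '"Type=GenePix Results 3"',
               '"Block"\t"Column"\t"Row"\t"Name"\t"ID"', rows), tf)
  tf
}
spots_ab <- c('1\t1\t1\t"geneA"\t"A1"', '1\t2\t1\t"geneB"\t"B1"')
files <- c(make_gpr(spots_ab), make_gpr(spots_ab), make_gpr(rev(spots_ab)))
check_spot_order(files)
```

On real data, check_spot_order(gprfiles) (with gprfiles as in Bela's read.maimages() call) would flag files scanned against a different GAL file or with reordered records. Matching on ID as well as Block/Column/Row is a design choice that also catches grids with the same geometry but different probe assignments; adding "Name" to key would additionally flag hand-edited gene names.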

Bela Tiwari

Thank you to both Gordon and Michael for all their time and comments!

Wow, the horrors abound in this set of files! And they also point out the (many) assumptions I have been making about what has been going on to produce these files. For instance, I had assumed that as these files were all part of one experiment, they would all have one gal file... especially as these people are already analysing this data (or so I have been told) in GeneSpring.

The rest of this saga should be good..... :-)

It's no wonder I didn't find much out there in the mailing lists about the problem I was seeing... these files could reasonably be called a "fiasco".

Thank you for all the pointers. Off I go to right the world.

Bela

20.7 years ago by Bela Tiwari
