limma and marray data import problem
@piotr-stepniak-2827
Last seen 10.4 years ago
Hello Everyone,

I am Piotr Stępniak, B.Sc. in Biotechnology, currently pursuing an M.Sc. at Adam Mickiewicz University in Poznań, Poland. I work in the microarray experiments group at the Polish Academy of Sciences.

I'm a newbie in R and BioC, so please forgive me if my question is easy...

I'm having a problem importing data into RGList or marrayRaw objects. I use the following call:

bialkoRaw <- read.maimages(dir(pattern="gpr"),
    columns=list(G="F543 Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"),
    annotation=c("Block", "Column", "Row", "Name", "ID"), row.names=NULL)

The data seem to load, but the $genes table looks odd. I guess the column names are shifted right by one column:

$genes
  Block Column         Row Name   ID
1     1      1  ERG_Operon 2078 2647
2     2      1  ERG_Operon 2078 3102
3     3      1  ERG_Operon 2078 3549
4     4      1 FLT3_Operon 2322 3994
5     5      1 FLT3_Operon 2322 4444
2635 more rows ...

I think this causes the printer layout to be imported wrongly, and then any further attempt to process the data (e.g. quality tests) produces this error message:

Error in if (is.int(totalPlate)) { : argument is of length zero

The data come from the ScanArray Express software, so I have them as gpr or csv files. Both give similar errors, but loading the csv files also seems to fail to import the values for each channel and yields only the file name headers.

Marray import also fails; I will skip the details so as not to lengthen this mail unnecessarily.
My R session info is as follows:

> sessionInfo()
R version 2.6.2 (2008-02-08)
i486-pc-linux-gnu

locale:
C

attached base packages:
[1] grid      splines   tools     stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
 [1] arrayQuality_1.18.0  gridBase_0.4-3       hexbin_1.14.0
 [4] convert_1.16.0       RColorBrewer_1.0-2   cluster_1.11.10
 [7] arrayMagic_1.16.1    genefilter_1.16.0    survival_2.34-1
[10] marray_1.18.0        vsn_3.6.0            limma_2.14.1
[13] affy_1.16.0          preprocessCore_1.0.0 affyio_1.8.0
[16] Biobase_1.16.3       lattice_0.17-7

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.0.6 DBI_0.2-4           RSQLite_0.6-8
[4] annotate_1.18.0     rcompgen_0.1-17

I should also say that these data cause import problems in every other data analysis program I have tried :( I also tried to read the printer layout from the gal file, but all I got was a "Block, Row, Column, ID columns not found" error.

I'd greatly appreciate any help, please.

Yours faithfully,
Piotr Stępniak
@gordon-smyth
Last seen 3 hours ago
WEHI, Melbourne, Australia
Dear Piotr,

The file extension "gpr" is short for GenePix Results file. If ScanArray Express outputs a file with this extension, you should have every expectation that it is formatted exactly the same as a gpr file from GenePix, and therefore you should be able to read it using read.maimages(source="genepix"). If this is not true, then it is irresponsible of ScanArray to use this extension.

The same comments apply to the GAL file. It is obviously not a GAL file as defined by GenePix, otherwise it could be read using readGAL().

From your description, a possible explanation for the problem is that your files have an extra column with no corresponding heading, e.g., a column of row numbers. However, no one on this mailing list can tell that for sure without you showing us some lines from your file.

Questions:

1. Why have you set row.names=NULL? This prevents R from detecting a column of row numbers. What happens if you remove this?

2. Are these files exactly as output by ScanArray, or have they been further processed?

3. Can you post the first few lines of an example file?

Best wishes
Gordon

PS. You posted the same question to the BioC mailing list on three consecutive days during the weekend. Please post the question just once.

> Date: Sat, 31 May 2008 12:55:25 +0200
> From: Piotr Stępniak <piotrek.stepniak at gmail.com>
> Subject: [BioC] limma and marray data import problem
> To: bioconductor at stat.math.ethz.ch
>
> [...]
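The hypothesis in the reply above (data lines carrying one more field than the header line) can be reproduced with base R alone. What follows is a minimal sketch on a made-up three-column file, not on Piotr's actual data; the trailing tab is an assumption chosen only to create the extra field, and the thread does not confirm that this is what the ScanArray files contain. It shows why the default call fails with a duplicate row names error and why `row.names=NULL` reads the file but leaves every header name one column off.

```r
# Toy file (hypothetical): the header has 3 names, but every data line ends
# with a stray tab and therefore has 4 fields.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Block\tColumn\tRow",
             "1\t1\t1\t",
             "1\t2\t1\t"), tmp)

# Default: with one more data field than header names, read.table() uses the
# first column (Block) as row names and fails because its values repeat.
res <- try(read.table(tmp, header = TRUE, sep = "\t"), silent = TRUE)
failed <- inherits(res, "try-error")
print(failed)

# row.names = NULL suppresses that step, but the header names now sit one
# column to the right of the data they describe.
shifted <- read.table(tmp, header = TRUE, sep = "\t", row.names = NULL)
print(names(shifted))
print(shifted)
```

If the real files behave like this toy file, picking annotation columns by name would return the neighbouring column's values, which is consistent with the shifted $genes table in the question.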
Dear Gordon,

Thank you for your reply.

I tried using source="genepix"; it did not work any better than "scanarray". The following command gives:

> bialkoRaw <- read.maimages(dir(pattern="gpr"), source="genepix")
Error in read.table(file = file, header = TRUE, col.names = allcnames, :
  duplicate 'row.names' are not allowed

It turns out the format is not 100% valid GenePix, e.g. it does not have any index column, so I tried this:

> bialkoRaw <- read.maimages(dir(pattern="gpr"), source="genepix", row.names=NULL)
Error in RG[[a]][, i] <- obj[, columns[[a]]] :
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In getLayout(RG$genes, guessdups = FALSE) : NAs introduced by coercion

I tried different parameter combinations, which got me to the command you've seen in the previous messages (I'm sorry for sending it 3 times...).

The file is finally read, but wrongly, as described earlier.

The same happens with the gal file:

> gal <- readGAL("Bialko.gal")
Error in read.table(file = file, header = TRUE, col.names = allcnames, :
  duplicate 'row.names' are not allowed

> gal <- readGAL("Bialko.gal", row.names=NULL)
Error in if (is.int(totalPlate)) { : argument is of length zero

To answer your further questions briefly:

2. Yes, these are the files straight from the scanner software. ScanArray Express also offers csv export, but reading those files is another problem. They do have an Index column, and

> bialkoRaw <- read.maimages(dir(pattern="csv"), columns=list(G="Ch1\ Median", Gb="Ch1\ B\ Median", R="Ch2\ Median", Rb="Ch2\ B\ Median"), sep=",")

reads the file with the values under the correct columns, but I get no printer layout, and other functions that process the data give:

Error in if (is.int(totalPlate)) { : argument is of length zero

3.
Yes, I'd be happy if you would take a look at it.

Beginning of GPR file:

ATF 1.0
21 82
"Type=GenePix Results 2"
"DateTime=2008/03/28 10:30:03"
"Settings=Easy Quant"
"GalFile=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal"
"Scanner=Model: Express Serial No.: 432617"
"Comment=<f1>Alexa 555<f2>Alexa 647<f1 offset>0,0<f2 offset>0,0<comment>"
"PixelSize=10"
"Wavelengths=543 nm 633 nm"
"ImageFiles=D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif"
"PMTGain=65 60"
"NormalizationMethod=LOWESS"
"NormalizationFactors=0.000 0.000"
"JpegImage="
"RatioFormulations=W2/W1(633/543)"
"Barcode="
"ImageOrigin=1500 11600"
"JpegOrigin=0 0"
"Creator=ScanArray Express, Microarray Analysis System 3.0.0.16"
"Temperature=0.0"
"LaserPower=90 90 0 0"
"LaserOnTime=0 0 0 0"
"Block" "Column" "Row" "Name" "ID" "X" "Y" "Dia." "F543 Median" "F543 Mean" "F543 SD" "B543 Median" "B543 Mean" "B543 SD" "% > B543+1SD" "% > B543+2SD" "F543 % Sat." "F633 Median" "F633 Mean" "F633 SD" "B633 Median" "B633 Mean" "B633 SD" "% > B633+1SD" "% > B633+2SD" "F633 % Sat." "F3 Median" "F3 Mean" "F3 SD" "B3 Median" "B3 Mean" "B3 SD" "% > B3+1SD" "% > B3+2SD" "F3 % Sat." "F4 Median" "F4 Mean" "F4 SD" "B4 Median" "B4 Mean" "B4 SD" "% > B4+1SD" "% > B4+2SD" "F4 % Sat." "Ratio of Medians (633/543)" "Ratio of Means (633/543)" "Median of Ratios (633/543)" "Mean of Ratios (633/543)" "Ratios SD (633/543)" "Rgn Ratio (633/543)" "Rgn R² (633/543)" "Ratio of Medians (Ratio/2)" "Ratio of Means (Ratio/2)" "Median of Ratios (Ratio/2)" "Mean of Ratios (Ratio/2)" "Ratios SD (Ratio/2)" "Rgn Ratio (Ratio/2)" "Rgn R² (Ratio/2)" "Ratio of Medians (Ratio/3)" "Ratio of Means (Ratio/3)" "Median of Ratios (Ratio/3)" "Mean of Ratios (Ratio/3)" "Ratios SD (Ratio/3)" "Rgn Ratio (Ratio/3)" "Rgn R² (Ratio/3)" "F Pixels" "B Pixels" "Sum of Medians" "Sum of Means" "Log Ratio (633/543)" "Log Ratio (Ratio/2)" "Log Ratio (Ratio/3)" "F543 Median - B543" "F633 Median - B633" "F3 Median - B3" "F4 Median - B4" "F543 Mean - B543" "F633 Mean - B633" "F3 Mean - B3" "F4 Mean - B4" "Flags" "Normalize"
1 1 1 ERG_Operon 2078 2805 13125 230 5946 6035 1754 2490 2506 529 97 92 0 1604 1636 517 683 698 194 94 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.266 0.269 0.270 0.329 0.329 0.232 0.621 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 384 734 4377 4498 -1.908 0.000 0.000 3456 921 0 0 3545 953 0 0 100 1
1 2 1 ERG_Operon 2078 3250 13128 220 5368 5457 1634 2330 2378 537 96 91 0 1624 1651 531 651 671 188 95 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.320 0.320 0.318 0.567 0.567 0.254 0.608 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 351 858 4011 4127 -1.643 0.000 0.000 3038 973 0 0 3127 1000 0 0 100 1
1 3 1 ERG_Operon 2078 3698 13124 220 4368 4676 1646 2206 2240 490 90 81 0 1476 1562 592 646 673 182 90 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.384 0.371 0.377 0.498 0.498 0.281 0.610 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 348 858 2992 3386 -1.381 0.000 0.000 2162 830 0 0 2470 916 0 0 100 1

And for comparison here is the corresponding csv:

BEGIN HEADER
PerkinElmer Inc.
ScanArrayCSVFileFormat,2.00
ScanArray Express,2.00
Number_of_Columns,62
END HEADER

BEGIN GENERAL INFO
DateTime,2008/03/28 10:30
GalFile,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\BIALACZKI_2_25luty2008_popr.gal
Scanner,Model: Express Serial No.: 432617
User Name,Luiza
Computer Name,
Protocol,Easy Quant
Quantitation Method,Adaptive Circle
Quality Confidence Calculation,Footprint
User comments,
Image Origin,1500,11600
Temperature,0
Laser Powers,90,90
Laser On Time,0
PMT Voltages,65,60
END GENERAL INFO

BEGIN QUANTITATION PARAMETERS
Min Percentile,30
Max Percentile,300
END QUANTITATION PARAMETERS

BEGIN QUALITY MEASUREMENTS
Max Footprint,100
END QUALITY MEASUREMENTS

BEGIN ARRAY PATTERN INFO
Units,µm
Array Rows,10
Array Columns,4
Spot Rows,9
Spot Columns,9
Array Row Spacing,4500.000000
Array Column Spacing,4500.000000
Spot Row Spacing,450.000000
Spot Column Spacing,450.000000
Spot Diameter,200
Interstitial,0
Spots Per Array,81
Total Spots,2640
END ARRAY PATTERN INFO

BEGIN IMAGE INFO
ImageID,Channel,Image,Fluorophore,Barcode,Units,X Units Per Pixel,Y Units Per Pixel,X Offset,Y Offset,Status
-1,CH1,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\HL60_szk13_PMT65_roz10_Alexa555.tif,Alexa 555,,µm,10.000000,10.000000,0.000000,0.000000,Control Image
-1,CH2,D:\Luiza\Grant bialaczkowy_BADANIA\BIALACZKI_skany\DRUGI RZUT\12_03_2008\Skan Agi\26sz_szk13_PMT60_roz10_Alexa647.tif,Alexa 647,,µm,10.000000,10.000000,0.000000,0.000000,
END IMAGE INFO

BEGIN NORMALIZATION INFO
Normalization Method,LOWESS
END NORMALIZATION INFO

BEGIN DATA
Index,Array Row,Array Column,Spot Row,Spot Column,Name,ID,X,Y,Diameter,F Pixels,B Pixels,Footprint,Flags,Ch1 Median,Ch1 Mean,Ch1 SD,Ch1 B Median,Ch1 B Mean,Ch1 B SD,Ch1 % > B + 1 SD,Ch1 % > B + 2 SD,Ch1 F % Sat.,Ch1 Median - B,Ch1 Mean - B,Ch1 SignalNoiseRatio,Ch2 Median,Ch2 Mean,Ch2 SD,Ch2 B Median,Ch2 B Mean,Ch2 B SD,Ch2 % > B + 1 SD,Ch2 % > B + 2 SD,Ch2 F % Sat.,Ch2 Median - B,Ch2 Mean - B,Ch2 SignalNoiseRatio,Ch2 Ratio of Medians,Ch2 Ratio of Means,Ch2 Median of Ratios,Ch2 Mean of Ratios,Ch2 Ratios SD,Ch2 Rgn Ratio,Ch2 Rgn R²,Ch2 Log Ratio,Sum of Medians,Sum of Means,Ch1 N Median,Ch1 N Mean,Ch1 N (Median-B),Ch1 N (Mean-B),Ch2 N Median,Ch2 N Mean,Ch2 N (Median-B),Ch2 N (Mean-B),Ch2 N Ratio of Medians,Ch2 N Ratio of Means,Ch2 N Median of Ratios,Ch2 N Mean of Ratios,Ch2 N Rgn Ratio,Ch2 N Log Ratio
1,1,1,1,1,"ERG_Operon","2078",2805,13125,230,384,734,0,3,5946,6035,1754.26,2490,2506,529.19,97.4,92.2,0.0,3456,3545,11.24,1604,1636,517.27,683,698,194.19,94.3,84.1,0.0,921,953,8.26,0.27,0.27,0.27,0.33,0.39,0.23,0.62,-1.908,4377,4498,5946,6035,3456,3545,3027,2984,1446,2664,0.42,0.75,0.42,0.92,0.44,-1.257
2,1,1,1,2,"ERG_Operon","2078",3250,13128,220,351,858,0,3,5368,5457,1634.22,2330,2378,537.27,96.0,90.9,0.0,3038,3127,9.99,1624,1651,531.34,651,671,188.42,94.9,88.0,0.0,973,1000,8.62,0.32,0.32,0.32,0.57,2.14,0.25,0.61,-1.643,4011,4127,5368,5457,3038,3127,3100,3039,1536,2956,0.51,0.95,0.50,1.68,0.48,-0.984
3,1,1,1,3,"ERG_Operon","2078",3698,13124,220,348,858,0,3,4368,4676,1645.59,2206,2240,490.01,90.2,81.0,0.0,2162,2470,8.91,1476,1562,591.68,646,673,182.34,90.2,80.2,0.0,830,916,8.09,0.38,0.37,0.38,0.50,0.92,0.28,0.61,-1.381,2992,3386,4368,4676,2162,2470,2947,2941,283,797,0.13,0.32,0.13,0.43,0.56,-2.934

Kind Regards,
Piotr

On Mon, Jun 2, 2008 at 3:57 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> [...]
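The csv excerpt above keeps its spot table after a "BEGIN DATA" marker and reports the number of spots in a "Total Spots" line. As a rough base-R sketch (not limma's own import code), the spot table could be pulled out by skipping to that marker and reading exactly that many rows. The toy file below is hypothetical: it keeps only a handful of the 62 columns, copies the intensities of the first three spots from the excerpt, and assumes an "END DATA" trailer that the post does not show.

```r
# Hypothetical, heavily shortened stand-in for a ScanArray Express csv export.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "BEGIN HEADER",
  "ScanArrayCSVFileFormat,2.00",
  "END HEADER",
  "",
  "BEGIN ARRAY PATTERN INFO",
  "Total Spots,3",
  "END ARRAY PATTERN INFO",
  "",
  "BEGIN DATA",
  "Index,Name,ID,Ch1 Median,Ch1 B Median,Ch2 Median,Ch2 B Median",
  "1,\"ERG_Operon\",\"2078\",5946,2490,1604,683",
  "2,\"ERG_Operon\",\"2078\",5368,2330,1624,651",
  "3,\"ERG_Operon\",\"2078\",4368,2206,1476,646",
  "END DATA"                      # assumed trailer, not shown in the post
), tmp)

lines <- readLines(tmp)

# Line number of the marker; the column header is the line right after it.
data_marker <- grep("^BEGIN DATA", lines)[1]

# Number of spot rows announced in the file's own header.
total_spots <- as.integer(sub("^Total Spots,", "",
                              grep("^Total Spots,", lines, value = TRUE)[1]))

# Skip everything up to and including the marker, then read only the spot rows
# so that any trailer after the table is never parsed.
spots <- read.csv(tmp, skip = data_marker, nrows = total_spots,
                  check.names = FALSE, stringsAsFactors = FALSE)
print(spots[, c("Ch1 Median", "Ch1 B Median", "Ch2 Median", "Ch2 B Median")])
```

Reading the table this way makes it easy to check that the foreground and background columns hold what their names say before handing the file to an import function.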
Dear Piotr,

I can't diagnose your problem, because the shortened version of your data file that you emailed reads fine for me when I put the lines in a text file, as I show below. I used sep="" in my code because email doesn't preserve tab separators.

Presumably the problem appears further into the file, perhaps near the bottom. Or else your file has inconsistent separators. Can you try the arguments nrows=2 and nrows=2640?

I would also expect the csv file to read with the following:

read.maimages("file.csv", columns=list(G="F543 Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"), sep=",", nrows=2640)

Best wishes
Gordon

My code:

> read.maimages("temp.txt", source="genepix", columns=list(G="F543 Median", Gb="B543 Median", R="F633 Median", Rb="B633 Median"), sep="")
Read temp.txt
An object of class "RGList"
$G
     temp
[1,] 5946
[2,] 5368

$Gb
     temp
[1,] 2490
[2,] 2330

$R
     temp
[1,] 1604
[2,] 1624

$Rb
     temp
[1,]  683
[2,]  651

$targets
     FileName
temp temp.txt

$genes
  Block Row Column   ID       Name
1     1   1      1 2078 ERG_Operon
2     1   1      2 2078 ERG_Operon

$source
[1] "genepix"

$printer
$ngrid.r
[1] 1

$ngrid.c
[1] 1

$nspot.r
[1] 1

$nspot.c
[1] 2

attr(,"class")
[1] "PrintLayout"

On Mon, 2 Jun 2008, Piotr Stępniak wrote:
> [...]
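One way to test the two suspicions raised above (a problem further into the file, or inconsistent separators) without reading the whole table is to count the fields on every line and compare them with the header line. The sketch below uses only base R's `count.fields()`; the toy file, the stray trailing tab on its sixth line, and the rule "the column header is the first line starting with Block" are all assumptions made for illustration, not facts about Piotr's files.

```r
# Hypothetical miniature of a GPR-style file: 3 preamble lines, a quoted
# column header, then data lines, one of which ends with a stray tab.
tmp <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0",
             "1\t3",
             "\"Type=GenePix Results 2\"",
             "\"Block\"\t\"Column\"\t\"Row\"",
             "1\t1\t1",
             "1\t2\t1\t",
             "1\t3\t1"), tmp)

lines <- readLines(tmp)

# Locate the column header (assumed to be the first line starting with Block).
header_line <- grep("^\"?Block\"?\t", lines)[1]

# Count tab-separated fields from the header line onwards. comment.char = ""
# and blank.lines.skip = FALSE keep the counts aligned with file line numbers.
n_fields <- count.fields(tmp, sep = "\t", quote = "\"",
                         skip = header_line - 1,
                         comment.char = "", blank.lines.skip = FALSE)

# File line numbers whose field count differs from the header's.
n_header <- n_fields[1]
bad_lines <- which(n_fields[-1] != n_header) + header_line
print(table(n_fields))
print(bad_lines)
```

On a real file, an empty `bad_lines` would suggest the separators are consistent, while a `table(n_fields)` showing every data line one field longer than the header would point back to the extra-column explanation from the first reply.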