Question

Problems normalizing scanarray express data with limma

Matthew Ouellette ▴ 10

@matthew-ouellette-5041

Last seen 10.6 years ago

Hello,

I'm having trouble analyzing my custom arrays with limma. I've searched the archives and I seem to be running into a problem similar to one previously dealt with here (https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).

I'm also using output from a ScanArray Express, although I've modified my .csv files accordingly and removed the final line of useless data as indicated in the archives. Also, being an R newbie, I wasn't sure how to tell R that my data started after some 74 lines of headers (output info from the scanner), so I deleted those headers as well (and input the $printer info manually), leaving only a header for the columns of intensity data. For simplicity's sake I've pasted below a shortened session of what I'm trying to do (my apologies for the lengthy e-mail). I appreciate the help and comments.

    R version 2.14.0 (2011-10-31)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: i386-apple-darwin9.8.0/i386 (32-bit)
    [R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]

    > setwd("***")
    > library(limma)
    > targets<-readTargets()
    > RG <-read.maimages(targets, source="scanarrayexpress",
        annotation=c("Array Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"),
        other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
    Read 01-13_B.csv
    Read 01-13_M.csv
    Read 01-13_T.csv
    > RG$printer <-getLayout2("ChinookBOT.gal")
    > spottypes<-readSpotTypes()
    > RG$genes$Status<- controlStatus(spottypes, RG)
    Matching patterns for: Name
    Found 1116 oligo
    Found 21 blank
    Found 15 serial
    Setting attributes: values Color
    > show(RG)
    An object of class "RGList"
    $G
         01-13_B 01-13_M 01-13_T
    [1,]     102     119     239
    [2,]     100     122     339
    [3,]     102     135     251
    [4,]      90     112     242
    [5,]     110     141     239
    1147 more rows ...

    $Gb
         01-13_B 01-13_M 01-13_T
    [1,]      89      94     147
    [2,]      88      84     181
    [3,]      88      91     161
    [4,]      92      90     175
    [5,]      86      87     154
    1147 more rows ...

    $R
         01-13_B 01-13_M 01-13_T
    [1,]     120     678     202
    [2,]     154     610     312
    [3,]     146     614     306
    [4,]     108     654     310
    [5,]     122     710     291
    1147 more rows ...

    $Rb
         01-13_B 01-13_M 01-13_T
    [1,]     108     119     135
    [2,]     109     137     159
    [3,]     113     124     169
    [4,]     115     124     180
    [5,]     119     104     159
    1147 more rows ...

    $targets
         FileName Cy3 Cy5
    1 01-13_B.csv  B1  B2
    2 01-13_M.csv  M1  M2
    3 01-13_T.csv  T1  T2

    $genes
      Array Row Array Column Spot Row Spot Column     Name      ID Status
    1         1            1        1           1 HEATH049 Gene A4  oligo
    2         1            1        1           2 HEATH049 Gene A4  oligo
    3         1            1        1           3 HEATH049 Gene A4  oligo
    4         1            1        1           4 HEATH113 Gene A8  oligo
    5         1            1        1           5 HEATH113 Gene A8  oligo
    1147 more rows ...

    $source
    [1] "scanarrayexpress"

    $other
    $Ch1 SignalNoiseRatio
         01-13_B 01-13_M 01-13_T
    [1,]    3.06    2.55    3.02
    [2,]    2.72    3.06    2.35
    [3,]    2.68    3.60    3.34
    [4,]    2.51    3.12    0.95
    [5,]    3.33    3.82    2.66
    1147 more rows ...

    $Ch2 SignalNoiseRatio
         01-13_B 01-13_M 01-13_T
    [1,]    2.31   12.41    2.85
    [2,]    2.42   11.82    3.57
    [3,]    2.66   11.71    4.14
    [4,]    1.75   14.41    0.65
    [5,]    2.09   15.90    4.62
    1147 more rows ...

    $printer
    $ngrid.r
    [1] 4

    $ngrid.c
    [1] 4

    $nspot.r
    [1] 6

    $nspot.c
    [1] 14

    > MA<- normalizeWithinArrays(RG)
    Error in normalizeWithinArrays(RG) :
      printer layout information does not match M row dimension

--
Matthew Ouellette, M.Sc. Candidate
Great Lakes Institute for Environmental Research
University of Windsor

GUI limma a4 • 1.5k views

updated 13.3 years ago by Gordon Smyth 52k • written 13.3 years ago by Matthew Ouellette ▴ 10

score 0 · Answer 1 · 2012-01-12

Dear Matthew,

This question hasn't been asked for many years! It used to be quite a common question; see for example:

https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html

The problem is not that you have an extra row, but rather that you have too few rows. Your arrays have 16 blocks (4 x 4) with 6 rows and 14 columns of spots in each block. So limma assumes your arrays have 4 x 4 x 6 x 14 = 1344 spots, but your files actually contain only 1152 rows of data. The reason is almost certainly that a number of empty spots have been removed from the files.

One easy workaround is simply to do global loess instead of print-tip loess normalization:

    MA <- normalizeWithinArrays(RG, method="loess")

Another workaround is to make up a block count variable:

    block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]

and then to use the solution that I suggested back in July 2005.

With respect to the deleting of 74 lines of headers and so forth, have you tried simply using

    RG <- read.maimages(targets, source="scanarrayexpress", sep=",")

with your original unedited files? The whole reason for having a "scanarrayexpress" method for read.maimages() is that it takes care of all the editing and reading for you.

Best wishes
Gordon

> Date: Tue, 10 Jan 2012 14:34:53 -0500
> From: Matthew Ouellette <ouellet5 at uwindsor.ca>
> To: bioconductor at r-project.org
> Subject: [BioC] Problems normalizing scanarray express data with limma
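The spot-count mismatch described in this answer can be checked with a few lines of base R. This is only a sketch: the layout values are copied from the $printer output in the question, the row count of 1152 is taken from the answer, and the small `genes` data frame is a made-up stand-in for RG$genes (one row per block), used only to show what the block formula produces.

```r
# Layout as reported by RG$printer in the question
printer <- list(ngrid.r = 4, ngrid.c = 4, nspot.r = 6, nspot.c = 14)

# Spots that print-tip loess expects: number of blocks x spots per block
expected_spots <- with(printer, ngrid.r * ngrid.c * nspot.r * nspot.c)
actual_rows <- 1152  # rows of data actually present in each file

cat("expected:", expected_spots,
    " actual:", actual_rows,
    " missing:", expected_spots - actual_rows, "\n")

# Hypothetical stand-in for RG$genes: one row per block of a 4 x 4 grid
genes <- data.frame(
  "Array Row"    = rep(1:4, each = 4),
  "Array Column" = rep(1:4, times = 4),
  check.names = FALSE
)

# Block index from the answer: numbers the blocks row by row
block <- 4 * (genes[, "Array Row"] - 1) + genes[, "Array Column"]
print(block)
```

On the real data the same formula would be applied to RG$genes, giving one block number per spot rather than one per block.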

score 0 · Answer 2 · 2012-01-13

Dear Matthew,

On Thu, 12 Jan 2012, Matthew Ouellette wrote:

> Dear Gordon,
>
> I feel I should have come to that conclusion myself - you're correct
> that there are missing spots. However, this is not the result of
> software removing blank spots; it is in fact the way we printed the
> array. Each block consists of 6 rows x 14 columns, but the 6th row
> only has spots in columns 1 and 2 (i.e. there are only 2 spots on row
> 6 in each block). This layout is the result of spotting 384 probes in
> the smallest area possible in order to cut down the amount of RT
> reagents needed to produce significant results.
>
> As I mentioned earlier, I am very new to R. I am in the process of
> attempting to use the block count variable you suggested, however I'm
> having difficulties adjusting it to the code you suggested in 2005.
> How would I modify the following to fit my particular array?
>
> for (b in 1:48) {
>   i <- RG$genes$Block==b
>   MA2 <- normalizeWithinArrays(RG[i,],method="loess")
>   if(b==1)
>     MA <- MA2
>   else
>     MA <- rbind(MA,MA2)
> }

Replace "48" with "16" and "RG$genes$Block" with "block".
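Putting the two substitutions together gives roughly the following. It is a sketch rather than a tested recipe: the spot counts in the first part come from the layout Matthew describes (five full rows of 14 spots plus 2 spots in the sixth row of each block), and the limma part only runs if limma is installed and an RGList named RG, read with the "Array Row" and "Array Column" annotation columns as in the question, exists in the workspace.

```r
# Spots per block as printed: 5 full rows of 14, plus 2 spots in row 6
spots_per_block <- 5 * 14 + 2
n_blocks <- 4 * 4
total_spots <- n_blocks * spots_per_block  # agrees with the 1152 rows in the files
cat("spots per block:", spots_per_block, " total:", total_spots, "\n")

# The 2005 loop with "48" replaced by 16 and RG$genes$Block replaced by block.
# Skipped unless limma is available and an object called RG already exists.
if (requireNamespace("limma", quietly = TRUE) && exists("RG")) {
  library(limma)
  block <- 4 * (RG$genes[, "Array Row"] - 1) + RG$genes[, "Array Column"]
  for (b in 1:n_blocks) {
    i <- block == b                                           # spots in block b
    MA2 <- normalizeWithinArrays(RG[i, ], method = "loess")   # loess within this block
    if (b == 1) {
      MA <- MA2
    } else {
      MA <- rbind(MA, MA2)                                    # stack blocks in order
    }
  }
}
```

Because each block is normalized separately with method="loess", the loop never consults RG$printer, so the incomplete sixth row no longer triggers the layout error.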
> As for the input files, I have attempted to use:
>
> RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>
> on unedited files, but the following warnings come up (when analyzing
> 15 unedited array files this time):
>
>> RG<-read.maimages(targets, source="scanarrayexpress", sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
> Read 01-14_B.csv
> Read 01-14_M.csv
> Read 01-14_T.csv
> Read 01-15_B.csv
> Read 01-15_M.csv
> Read 01-15_T.csv
> Read 01-16_B.csv
> Read 01-16_M.csv
> Read 01-16_T.csv
> Read 01-17_B.csv
> Read 01-17_M.csv
> Read 01-17_T.csv
> There were 45 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In grep(a, txt) : input string 1 is invalid in this locale
> 2: In grep(a, txt) : input string 1 is invalid in this locale
> 3: In grep(a, txt) : input string 1 is invalid in this locale
> 4: In grep(a, txt) : input string 1 is invalid in this locale
> 5: In grep(a, txt) : input string 1 is invalid in this locale
> 6: In grep(a, txt) : input string 1 is invalid in this locale
> 7: In grep(a, txt) : input string 1 is invalid in this locale
> 8: In grep(a, txt) : input string 1 is invalid in this locale
> 9: In grep(a, txt) : input string 1 is invalid in this locale
> 10: In grep(a, txt) : input string 1 is invalid in this locale
> [... to 45]

This is most likely caused by the fact that your copy of R is set up for a different language (locale) from the one used by the software that wrote your data. E.g., it could be that your R is set to American English but the files were written using an extended French alphabet, so your files contain non-English letters. Typing sessionInfo() will reveal the language (locale) your R session is using.

> I have contacted a co-worker about this problem and he claims that he
> doesn't get this error when using R in Windows XP (I am currently
> using Mac OS X). At first, I thought these errors would skew my
> results so I opted to edit the files myself just to get the hang of
> limma.

Nothing to do with Windows or Mac. Probably doesn't affect your limma results.

Best wishes
Gordon

> I appreciate your help,
>
> Matthew
>
> On Wed, Jan 11, 2012 at 11:52 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
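The locale check mentioned in this answer, plus one way to see why a string can be "invalid in this locale", can be sketched in base R. The example string and the guess that the text is Latin-1 encoded are assumptions for illustration only; the thread does not say which encoding the scanner software used. Note also that validUTF8() needs R 3.3.0 or later, which is newer than the R 2.14.0 used in the question.

```r
# Show the character-type locale of the current R session
# (the same information appears in the output of sessionInfo())
print(Sys.getlocale("LC_CTYPE"))

# Hypothetical header text ending in a Latin-1 byte (0xE9, e-acute)
raw_text <- "caf\xe9"

# A lone 0xE9 byte is not a valid UTF-8 sequence
is_valid_before <- validUTF8(raw_text)

# Re-encode, ASSUMING the source text really is Latin-1
fixed_text <- iconv(raw_text, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(fixed_text)

cat("valid before:", is_valid_before, " valid after:", is_valid_after, "\n")
```

If the header lines of the scanner files contain bytes like this, pattern matching on them in a UTF-8 session can produce warnings of the kind shown above even though the numeric columns are read correctly.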