Problems normalizing scanarray express data with limma
@matthew-ouellette-5041
Hello,

I'm having trouble analyzing my custom arrays with limma. I've searched the archives, and I seem to be running into a problem similar to one that was previously dealt with here (https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).

I'm also using output from ScanArray Express, although I've modified my .csv files accordingly and removed the final line of useless data, as indicated in the archives. Also, being an R newbie, I wasn't sure how to tell R that my data start after some 74 lines of headers (output info from the scanner), so I deleted those headers as well (and entered the $printer info manually), leaving only a header for the columns of intensity data. For simplicity's sake I've pasted below a shortened session of what I'm trying to do (my apologies for the lengthy e-mail). I appreciate the help and comments.

R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
[R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]

> setwd("***")
> library(limma)
> targets<-readTargets()
> RG <-read.maimages(targets, source="scanarrayexpress",annotation=c("Array Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"), other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
Read 01-13_B.csv
Read 01-13_M.csv
Read 01-13_T.csv
> RG$printer <-getLayout2("ChinookBOT.gal")
> spottypes<-readSpotTypes()
> RG$genes$Status<- controlStatus(spottypes, RG)
Matching patterns for: Name
Found 1116 oligo
Found 21 blank
Found 15 serial
Setting attributes: values Color
> show(RG)
An object of class "RGList"
$G
     01-13_B 01-13_M 01-13_T
[1,]     102     119     239
[2,]     100     122     339
[3,]     102     135     251
[4,]      90     112     242
[5,]     110     141     239
1147 more rows ...

$Gb
     01-13_B 01-13_M 01-13_T
[1,]      89      94     147
[2,]      88      84     181
[3,]      88      91     161
[4,]      92      90     175
[5,]      86      87     154
1147 more rows ...

$R
     01-13_B 01-13_M 01-13_T
[1,]     120     678     202
[2,]     154     610     312
[3,]     146     614     306
[4,]     108     654     310
[5,]     122     710     291
1147 more rows ...

$Rb
     01-13_B 01-13_M 01-13_T
[1,]     108     119     135
[2,]     109     137     159
[3,]     113     124     169
[4,]     115     124     180
[5,]     119     104     159
1147 more rows ...

$targets
     FileName Cy3 Cy5
1 01-13_B.csv  B1  B2
2 01-13_M.csv  M1  M2
3 01-13_T.csv  T1  T2

$genes
  Array Row Array Column Spot Row Spot Column     Name      ID Status
1         1            1        1           1 HEATH049 Gene A4  oligo
2         1            1        1           2 HEATH049 Gene A4  oligo
3         1            1        1           3 HEATH049 Gene A4  oligo
4         1            1        1           4 HEATH113 Gene A8  oligo
5         1            1        1           5 HEATH113 Gene A8  oligo
1147 more rows ...

$source
[1] "scanarrayexpress"

$other
$Ch1 SignalNoiseRatio
     01-13_B 01-13_M 01-13_T
[1,]    3.06    2.55    3.02
[2,]    2.72    3.06    2.35
[3,]    2.68    3.60    3.34
[4,]    2.51    3.12    0.95
[5,]    3.33    3.82    2.66
1147 more rows ...

$Ch2 SignalNoiseRatio
     01-13_B 01-13_M 01-13_T
[1,]    2.31   12.41    2.85
[2,]    2.42   11.82    3.57
[3,]    2.66   11.71    4.14
[4,]    1.75   14.41    0.65
[5,]    2.09   15.90    4.62
1147 more rows ...

$printer
$ngrid.r
[1] 4

$ngrid.c
[1] 4

$nspot.r
[1] 6

$nspot.c
[1] 14

> MA<- normalizeWithinArrays(RG)
Error in normalizeWithinArrays(RG) :
  printer layout information does not match M row dimension

--
Matthew Ouellette, M.Sc. Candidate
Great Lakes Institute for Environmental Research
University of Windsor
@gordon-smyth
WEHI, Melbourne, Australia
Dear Matthew,

This question hasn't been asked for many years! It used to be quite a common question; see, for example:

https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html

The problem is not that you have an extra row, but rather that you have too few rows. Your arrays have 16 blocks (4 x 4) with 6 rows and 14 columns of spots in each block, so limma expects your arrays to have 4 x 4 x 6 x 14 = 1344 spots, but your files actually contain only 1152 rows of data. The reason is almost certainly that a number of empty spots have been removed from the files.

One easy workaround is simply to do global loess instead of print-tip loess normalization:

  MA <- normalizeWithinArrays(RG, method="loess")

Another workaround is to make up a block count variable:

  block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]

and then to use the solution that I suggested back in July 2005.

With respect to deleting the 74 lines of headers and so forth, have you tried simply using

  RG <-read.maimages(targets, source="scanarrayexpress", sep=",")

on your original unedited files? The whole reason for having a "scanarrayexpress" method for read.maimages() is that it takes care of all the editing and reading for you.

Best wishes
Gordon
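The spot-count arithmetic and the block index described in the answer can be checked with a few lines of base R. This is only a sketch: the layout values are copied from the `$printer` output in the question, the row count of 1152 is the 5 displayed rows plus "1147 more rows", and the small `genes` data frame is a hypothetical stand-in for `RG$genes` with one row per block.

```r
# Printer layout as reported in the $printer component of the question
ngrid_r <- 4; ngrid_c <- 4    # 4 x 4 blocks
nspot_r <- 6; nspot_c <- 14   # 6 rows x 14 columns of spots per block

expected_spots <- ngrid_r * ngrid_c * nspot_r * nspot_c  # what the layout implies
observed_rows  <- 1152                                   # rows actually read from each file

missing_total     <- expected_spots - observed_rows
missing_per_block <- missing_total / (ngrid_r * ngrid_c)

# Hypothetical stand-in for RG$genes: one representative row per block,
# using the same column names as the annotation columns in the question.
genes <- data.frame("Array Row"    = rep(1:4, each = 4),
                    "Array Column" = rep(1:4, times = 4),
                    check.names = FALSE)

# Block index as suggested in the answer: numbers the blocks row by row
block <- 4 * (genes[, "Array Row"] - 1) + genes[, "Array Column"]

print(c(expected = expected_spots, observed = observed_rows,
        missing_per_block = missing_per_block))
print(block)
```

With the real data, `table(block)` on the full `RG$genes` would show how many spots each block actually contains.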
@gordon-smyth
WEHI, Melbourne, Australia
Dear Matthew,

On Thu, 12 Jan 2012, Matthew Ouellette wrote:

> Dear Gordon,
>
> I feel I should have come to that conclusion myself - you're correct
> that there are missing spots. However, this is not the result of
> software removing blank spots; it is in fact the way we printed the
> array. Each block consists of 6 rows x 14 columns, but the 6th row
> only has spots in columns 1 and 2 (i.e. there are only 2 spots on row
> 6 in each block). This layout is the result of spotting 384 probes in
> the smallest area possible in order to cut down the amount of RT
> reagents needed to produce significant results.
>
> As I mentioned earlier, I am very new to R. I am in the process of
> attempting to use the block count variable you suggested, however I'm
> having difficulties adjusting it to the code you suggested in 2005.
> How would I modify the following to fit my particular array?
>
> for (b in 1:48) {
>   i <- RG$genes$Block==b
>   MA2 <- normalizeWithinArrays(RG[i,],method="loess")
>   if(b==1)
>     MA <- MA2
>   else
>     MA <- rbind(MA,MA2)
> }

Replace "48" with "16" and "RG$genes$Block" with "block".
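Written out with those two substitutions, the loop might look like the sketch below. It runs on simulated intensities rather than the poster's files and assumes limma is installed. The simulated `RGList`, the helper `sim()`, and the 72-spots-per-block layout (5 full rows of 14 plus 2 spots on row 6, as described in the quoted message) are illustrative stand-ins, not part of the original thread.

```r
library(limma)

set.seed(1)
n_per_block <- 5 * 14 + 2                       # 72 spots printed in each block
array_row   <- rep(1:4, each = 4 * n_per_block)
array_col   <- rep(rep(1:4, each = n_per_block), times = 4)
n_spots     <- length(array_row)                # 16 blocks x 72 spots
n_arrays    <- 3

# Hypothetical helper: a spots-by-arrays matrix of uniform random intensities
sim <- function(lo, hi) matrix(runif(n_spots * n_arrays, lo, hi), n_spots, n_arrays)

# Simulated stand-in for the RGList read by read.maimages()
RG <- new("RGList", list(
  R  = sim(200, 5000), G  = sim(200, 5000),     # foreground
  Rb = sim(50, 100),   Gb = sim(50, 100),       # background
  genes = data.frame("Array Row" = array_row, "Array Column" = array_col,
                     check.names = FALSE)
))

# Block index from the earlier answer
block <- 4 * (RG$genes[, "Array Row"] - 1) + RG$genes[, "Array Column"]

# Loess-normalize each block separately, then stack the results
for (b in 1:16) {
  i   <- block == b
  MA2 <- normalizeWithinArrays(RG[i, ], method = "loess")
  if (b == 1)
    MA <- MA2
  else
    MA <- rbind(MA, MA2)
}

print(dim(MA$M))
```

Note that `rbind` stacks the blocks in the order 1 to 16, so the rows of `MA` match the rows of `RG` only when the input is already sorted by block, as it is in this simulation.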
> As for the input files, I have attempted to use:
>
> RG <-read.maimages(targets, source="scanarrayexpress", sep=",")
>
> on unedited files, but the following warnings come up (when analyzing
> 15 unedited array files this time):
>
>> RG<-read.maimages(targets, source="scanarrayexpress", sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
> Read 01-14_B.csv
> Read 01-14_M.csv
> Read 01-14_T.csv
> Read 01-15_B.csv
> Read 01-15_M.csv
> Read 01-15_T.csv
> Read 01-16_B.csv
> Read 01-16_M.csv
> Read 01-16_T.csv
> Read 01-17_B.csv
> Read 01-17_M.csv
> Read 01-17_T.csv
> There were 45 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In grep(a, txt) : input string 1 is invalid in this locale
> 2: In grep(a, txt) : input string 1 is invalid in this locale
> 3: In grep(a, txt) : input string 1 is invalid in this locale
> 4: In grep(a, txt) : input string 1 is invalid in this locale
> 5: In grep(a, txt) : input string 1 is invalid in this locale
> 6: In grep(a, txt) : input string 1 is invalid in this locale
> 7: In grep(a, txt) : input string 1 is invalid in this locale
> 8: In grep(a, txt) : input string 1 is invalid in this locale
> 9: In grep(a, txt) : input string 1 is invalid in this locale
> 10: In grep(a, txt) : input string 1 is invalid in this locale
> [... to 45]

This is most likely caused by the fact that your copy of R is compiled for a different language than that used by the software that wrote your data. For example, it could be that your R is American English but the files were written using an extended French alphabet, so your files contain non-English letters. Typing sessionInfo() will reveal the language (locale) your version of R is compiled for.

> I have contacted a co-worker about this problem and he claims that he
> doesn't get this error when using R in Windows XP (I am currently
> using Mac OS X). At first, I thought these errors would skew my
> results so I opted to edit the files myself just to get the hang of
> limma.

Nothing to do with Windows or Mac. Probably doesn't affect your limma results.

Best wishes
Gordon

> I appreciate your help,
>
> Matthew
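The locale explanation above can be illustrated in base R. The sketch below builds a made-up header line containing one Latin-1 byte, shows that it is not valid UTF-8, and converts it. That the scanner files are Latin-1 encoded is purely an assumption for illustration; the thread only says the files probably contain non-English letters, and the example line is hypothetical.

```r
# Character-handling part of the current locale (also reported by sessionInfo())
print(Sys.getlocale("LC_CTYPE"))

# Hypothetical header line "Units: <micro sign>m" encoded as Latin-1:
# the byte 0xb5 is the micro sign in Latin-1 but is not valid UTF-8 on its own.
raw_line <- rawToChar(as.raw(c(0x55, 0x6e, 0x69, 0x74, 0x73, 0x3a, 0x20, 0xb5, 0x6d)))

# Strings like this are the kind that can make grep() warn in a UTF-8 locale
print(validUTF8(raw_line))

# Re-encode from the assumed source encoding to UTF-8
fixed <- iconv(raw_line, from = "latin1", to = "UTF-8")
print(validUTF8(fixed))

# Pattern matching on the converted string works without an encoding warning
print(grepl("Units", fixed, fixed = TRUE))
```

Running `validUTF8()` on `readLines()` output from the first lines of one of the unedited files would be one way to see whether the header block is the source of the warnings.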