LimmaGUI Spot Quality


Elizabeth Brooke-Powell

Hi there,

I have a question about limmaGUI. How are spot quality measures used? I am asking because I am using a new spot-finding package that generates a confidence value for each spot. Can these values be used when loading the data, and if so, how will they be used?

Any help is much appreciated,

Liz Brooke-Powell
Molteno Building
Department of Pathology
University of Cambridge
Tennis Court Road
Cambridge, CB2 1QP
United Kingdom
Website: http://www.path.cam.ac.uk/~toxo/
Tel 01223 33 33 31 (office) or 01223 33 33 29 (lab)


James Wettenhall

Hi Liz,

limmaGUI is not as flexible as limma when it comes to spot quality measures from "new spot finding packages". Please tell us the column name(s) in your raw image-analysis results files that you want to use for assessing quality. If you can also explain what the quality indicator in each column means (e.g. high = good, low = bad), that would be even better.

Try the limmaGUI spot-quality-weighting option for GenePix. (Even if you don't have any GenePix files, you can pretend that you do in order to see the spot-quality weighting dialog.) You can give different weights to different GenePix flags (for "bad" spots, "not found" spots, etc.). Is this the sort of thing you are looking for?

The extra quality column(s) are read in at the same time as the raw data, and they are then used to form weights in limma's normalization routines. Type ?normalizeWithinArrays or ?wtflags (which is not as flexible as the limmaGUI GenePix flags dialog) at the R prompt for more information.

Regards,
James

--
James Wettenhall
Division of Genetics and Bioinformatics
The Walter & Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3050, Australia
Tel: (+61 3) 9345 2629
Fax: (+61 3) 9347 0852
Mobile: (+61 / 0 ) 438 527 921
E-mail: wettenhall@wehi.edu.au
http://www.wehi.edu.au
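To make the flag-weighting idea concrete, here is a minimal base-R sketch. `flag_weights` is a hypothetical helper, not a limma or limmaGUI function, and the weight assigned to each flag is an illustrative assumption; it assumes the standard GenePix flag codes (100 good, 0 unflagged, -50 not found, -75 absent, -100 bad).

```r
# Hypothetical helper (not part of limma or limmaGUI): map GenePix spot
# flags to weights between 0 and 1. The default weights are illustrative.
flag_weights <- function(flags, bad = 0, not_found = 0.1, absent = 0.1) {
  # GenePix flag codes: 100 = good, 0 = unflagged,
  # -50 = not found, -75 = absent, -100 = bad
  w <- rep(1, length(flags))
  w[flags == -100] <- bad
  w[flags == -50]  <- not_found
  w[flags == -75]  <- absent
  w
}

flags <- c(0, 0, 100, -50, 0, -100, -75, 0)
w <- flag_weights(flags)
print(w)                # 1.0 1.0 1.0 0.1 1.0 0.0 0.1 1.0
print(sort(unique(w)))  # only three distinct weights: 0.0 0.1 1.0
```

Because each flag maps to one fixed weight, the weight vector can only take as many distinct values as there are flag categories.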


Hi James,

The confidence values are given as decimals, with 1 = 100% confident (e.g. confidence value = 0.78). The value is determined using Bayesian statistics and is a measure of how confident the package is that the spot it found is real. The package itself (BlueFuse, currently available only in the UK) uses a Bayesian model to find spots iteratively. I don't know much more, as the details are protected and I'm a biologist.

Basically, I am asking whether the model can take account of these numbers and adjust accordingly. I am not sure that pretending to have GenePix files will work in this case, as the numbers are not a simple 0 or 1 (good or bad). If I were to try this, would I need to format the text file of data to look like a GenePix file?

Thanks for your help,
Liz


Liz,

On Fri, 2 Jul 2004, Elizabeth Brooke-Powell wrote:
> I am not sure in this case that pretending to have GenePix will work as the numbers are not a simple 0 or 1 (good or bad).

No, sorry, I didn't mean to imply that you would be able to use the GenePix option in limmaGUI as is. I just thought it might be helpful for you to see how weights can be defined for GenePix data, based on GenePix spot flags. Notice that the weights we define for the GenePix flags are between 0 and 1, just as your "quality weights" already are. But after we process GenePix data, the number of distinct values in the weights column is small. For example, in this weights vector:

(1,1,1,1,0.1,1,1,1,1,1,1,0,1,1,1,1,1,0.1,0.1,1,1,1,1,1)

there are only three distinct weight values (0, 0.1 and 1), whereas for your data the column of weights could contain many different values between 0 and 1 for the different genes.

I don't think you have told us the column name of this quality weight yet. Maybe you should ask the statistician who designed this quality weighting how he/she intended it to be used in normalization. But it can probably be used directly in limma's normalization. All you would have to do is tell us the names of the columns that limma would need to read in for your data (Rf, Rb, Gf, Gb and the spot-quality weighting); we can then add an option to limma/limmaGUI to read in the appropriate columns for BlueFuse, including the quality weights.

There are no plans at the moment to add a custom dialog to limmaGUI for reading in an arbitrary column of weights from your raw image-analysis files. But if you want to start combining the command-line interface with the GUI, you could read the weights into RG$weights in limmaGUIenvironment. They would then be used automatically for normalization. There are two ways to do this:

(1) From the R console:

    RG <- get("RG", envir = limmaGUIenvironment)
    names(RG)
    RG$weights <- ...
    names(RG)
    assign("RG", RG, envir = limmaGUIenvironment)

(2) From the "Evaluate R Code" menu:

    RG$weights <- ...

In case (2), your R commands are automatically evaluated in limmaGUIenvironment, which contains all of the microarray data objects used by limmaGUI.

Regards,
James
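A minimal sketch of route (1) that can be run outside limmaGUI. Because limmaGUIenvironment only exists inside a limmaGUI session, the first statements build a stand-in environment and a toy RG object, and the `confidence` vector is made-up data. In a real session you would skip the stand-in lines (overwriting limmaGUIenvironment would discard your data) and supply weights as a matrix with the same dimensions as RG$R, one column per array.

```r
# --- Stand-in setup: do NOT run these statements in a real limmaGUI
# --- session, where limmaGUIenvironment and RG already exist.
limmaGUIenvironment <- new.env()
assign("RG",
       list(R = matrix(c(1200, 850, 40), ncol = 1),
            G = matrix(c(1100, 900, 35), ncol = 1)),
       envir = limmaGUIenvironment)

# Hypothetical per-spot confidence values, in the same spot order as RG$R.
confidence <- c(0.97, 0.78, 0.12)

RG <- get("RG", envir = limmaGUIenvironment)   # copy the object out
RG$weights <- matrix(confidence, nrow = nrow(RG$R), ncol = ncol(RG$R))
assign("RG", RG, envir = limmaGUIenvironment)  # write it back

print(names(get("RG", envir = limmaGUIenvironment)))  # "R" "G" "weights"
```

The get/modify/assign round trip is needed because `RG$weights <- ...` at the console would otherwise modify only a local copy, not the object limmaGUI reads.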


Sorry James,

Here are the column titles:

ROW, COL, SUBGRIDROW, SUBGRIDCOL, SPOTNUM, BLOCK, NAME, ID, CONFIDENCE, FLAG, MAN EXCL, AMPCH1, AMPCH2, RATIO CH1/CH2, LOG2RATIO CH1/CH2, LOG10RATIO CH1/CH2, RATIO CH2/CH1, LOG2RATIO CH2/CH1, LOG10RATIO CH2/CH1, SUM, PELROW, PELCOL

I have previously used the other function in limmaGUI, with AMPCH1 and AMPCH2 as the signal channels. There is no background data, as the background is accounted for in the model. The column labelled CONFIDENCE is the one in question.

Thanks for your help,
Liz
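Reading the three relevant columns from a BlueFuse export can be sketched in base R. `read_bluefuse_columns` is a hypothetical helper, not a limma function. It assumes a tab-delimited file whose first line is the column header listed above (a real export may have preamble lines, which would need `skip =`), and it assumes CH1 is the red channel and CH2 the green channel. Within limma itself, read.maimages has `columns` and `wt.fun` arguments that may serve the same purpose; check ?read.maimages for what your version supports.

```r
# Hypothetical reader (not a limma function): extract the two signal
# channels and the CONFIDENCE column from a tab-delimited BlueFuse export.
# Assumes CH1 = red (R) and CH2 = green (G); check against your design.
read_bluefuse_columns <- function(file) {
  x <- read.delim(file, check.names = FALSE, stringsAsFactors = FALSE)
  needed <- c("AMPCH1", "AMPCH2", "CONFIDENCE")
  missing_cols <- setdiff(needed, names(x))
  if (length(missing_cols) > 0)
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  list(
    R = matrix(x[["AMPCH1"]], ncol = 1),           # total signal, channel 1
    G = matrix(x[["AMPCH2"]], ncol = 1),           # total signal, channel 2
    weights = matrix(x[["CONFIDENCE"]], ncol = 1)  # confidence used as weights
  )
}

# Usage with a tiny made-up export (one line per element of txt).
txt <- c("ROW\tCOL\tCONFIDENCE\tAMPCH1\tAMPCH2",
         "1\t1\t0.97\t1200\t1100",
         "1\t2\t0.78\t850\t900",
         "1\t3\t0.12\t40\t35")
rg <- read_bluefuse_columns(textConnection(txt))
```

No background columns are read, consistent with BlueFuse reporting background-free signal estimates.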


At 11:13 PM 2/07/2004, Elizabeth Brooke-Powell wrote:
>Basically I am asking if the model can take account of these numbers and adjust the model appropriately.

The answer is yes, in principle, but not without knowing how BlueFuse's "confidence value" is defined and what it means. Is the confidence value a probability? If so, of what? Is it a weight or an inverse variance? If so, of what? How does the "confidence value" interact with the FLAG column included in the BlueFuse output? You might not be able to answer these questions yourself, but the BlueFuse developers can. I have not been able to find technical information on the BlueFuse website sufficient to answer them.

Without knowing anything further, I would be inclined to treat the "confidence values" directly as weights in limma normalization and differential expression analyses. This is simple to do in principle, but it is not clear how to read the data in. The BlueFuse format is different from that of other two-color image analysis programs. Is the RATIO column in the BlueFuse output the same as AMPCH1 divided by AMPCH2? If not, what are AMPCH1 and AMPCH2? We need to know this.

Gordon

--
Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3050, Australia
Tel: (03) 9345 2326, Fax (03) 9347 0852,
Email: smyth@wehi.edu.au, www: http://www.statsci.org
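Gordon's question about the RATIO column can be checked directly on an export. A minimal sketch, assuming the ratio columns are rounded in the file (hence a loose relative tolerance); `check_ratio` and `ma_values` are hypothetical helpers, and the numbers are made up. `ma_values` uses the usual two-color definitions M = log2(R/G) and A = (log2 R + log2 G)/2, taking AMPCH1 as R and AMPCH2 as G by assumption.

```r
# Hypothetical helper: does an exported ratio column agree with
# AMPCH1 / AMPCH2 up to a relative tolerance (ratios may be rounded)?
check_ratio <- function(ampch1, ampch2, ratio, tol = 1e-3) {
  isTRUE(all.equal(ampch1 / ampch2, ratio, tolerance = tol))
}

# Log-ratio (M) and average log-intensity (A) from the two signal columns,
# assuming AMPCH1 plays the role of R and AMPCH2 the role of G.
ma_values <- function(R, G) {
  list(M = log2(R) - log2(G),
       A = (log2(R) + log2(G)) / 2)
}

ampch1 <- c(1200, 850, 40)
ampch2 <- c(1100, 900, 35)
ratio_ch1_ch2 <- c(1.0909, 0.9444, 1.1429)   # made-up values, rounded to 4 d.p.
check_ratio(ampch1, ampch2, ratio_ch1_ch2)   # TRUE
ma <- ma_values(ampch1, ampch2)
```

If the check passes, the same M values could be compared against the LOG2RATIO CH1/CH2 column, and the ratio columns become redundant once AMPCH1 and AMPCH2 are read in.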


Gordon Smyth

WEHI, Melbourne, Australia

Graham,

Many thanks for this further information. I take from your remarks on AmpCh1 and AmpCh2 that we can read those columns into R and ignore the various ratio columns, as these can be recomputed from AmpCh1 and AmpCh2.

You describe the "confidence estimate" as an intuitive measure. I understand the need for something intuitive. Unfortunately, for use in numerical calculations we need a measure that is quantitatively related to something, e.g. quantitatively related in some way to the estimated variance of the log-ratio.

Gordon

At 12:45 AM 4/07/2004, Graham Snudden wrote:
>Gordon,
>
>To pick up the points raised in your earlier mail:
>
>1. The confidence estimate to which Liz refers is derived from the posterior distributions returned by the Bayesian framework that we are using to estimate the biological signal at each spot location. The underlying framework is relatively complex and provides a number of metrics relating to the signal in each channel. In order to simplify these metrics and make them intuitive to the end user (biologists), we generate a single confidence estimate. This estimate reflects the distribution of the ratio, i.e. how confident we are in the value calculated for the ratio. In most cases a tight signal distribution in each channel will lead to a high confidence; however, it is possible that a very broad distribution in one channel (a weak or saturated spot) combined with a very tight distribution in the other will lead to a low confidence. A positive control, with near-zero signal in one channel, will therefore return a low confidence, reflecting the high degree of ambiguity in the actual value returned for the ratio. The associated confidence flag is derived from the confidence estimate by a simple lookup table which is under user control. This is described on the website if you follow the 'colour coded confidence flags' link on the product page: http://www.cambridgebluegnome.com/products/index.htm.
>
>2. The AmpCh1 and AmpCh2 columns return our estimate of the total signal in each channel. As we are not thresholding out an area of signal, we have no concept of mean or median pixel intensity; neither do we need to perform background subtraction, as the amount of signal per spot is returned by the underlying models independent of any noise processes. The ratio is the ratio between the two channels.
>
>If you need additional technical information, I could put you in touch with our academic founders from the signal processing lab here in Cambridge. Clearly we are using a very different approach from more traditional threshold/template-based solutions; however, our experience is that the Bayesian approach offers significant advantages in terms of robustness, automation, detection, accuracy and, as described above, confidence estimation.
>
>Best regards
>
>Graham Snudden
>VP Engineering
>BlueGnome Ltd
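Gordon's requirement can be made concrete with the standard weighted-mean result: if spot log-ratios M_i are independent with variances sigma_i^2, the weighted mean sum(w_i M_i) / sum(w_i) has variance sum(w_i^2 sigma_i^2) / (sum(w_i))^2, which is smallest when w_i is proportional to 1/sigma_i^2. The sketch below is a simplified weighted global normalization (not limma's print-tip loess) plus that variance formula; the variances are made-up numbers, and treating BlueFuse confidences as weights of this kind is exactly the unverified assumption under discussion.

```r
# Simplified weighted global normalization (illustration only, not limma's
# algorithm): centre the log-ratios on their weighted mean.
weighted_global_normalize <- function(M, w) {
  stopifnot(length(M) == length(w), all(w >= 0), sum(w) > 0)
  M - sum(w * M) / sum(w)
}

# Variance of the weighted mean sum(w * M) / sum(w) for independent
# log-ratios with variances sigma2.
weighted_mean_variance <- function(w, sigma2) {
  sum(w^2 * sigma2) / sum(w)^2
}

sigma2 <- c(0.05, 0.05, 0.5, 2)   # hypothetical per-spot variances of M
v_equal   <- weighted_mean_variance(rep(1, 4), sigma2)   # 0.1625
v_inverse <- weighted_mean_variance(1 / sigma2, sigma2)  # 1 / 42.5, about 0.0235

# A zero-weight spot has no influence on the centring.
weighted_global_normalize(c(1, 3, 100), c(1, 1, 0))      # c(-1, 1, 98)
```

With inverse-variance weights the estimate is roughly seven times less variable here than with equal weights; a confidence score that is not tied to variance offers no such guarantee.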
