...another question about using weights on microarray analysis

Erika Melissari ▴ 250

Hello all,

I have found discordant opinions on the Bioconductor mailing list regarding the use of quality weights in microarray analysis, and I would like to understand clearly what to do before starting the statistical analysis of my latest experiment.

I use LIMMA to perform statistical analysis of microarray experiments. Usually, I assign a weight to every spot in my experiment by passing this wt.fun to read.maimages():

    function(x, threshold = 3) {
      # exclude spots with SNR < threshold on both channels
      snrok <- !(x[, "SNR 635"] < threshold & x[, "SNR 532"] < threshold)

      # keep only genes, not control spots (I use Agilent microarrays)
      spotok <- (x[, "ControlType"] == "false")

      # exclude spots flagged "bad" by GenePix Pro 6
      flagok <- (x[, "Flags"] >= 0)

      # exclude saturated spots
      satok <- !(x[, "F635 % Sat."] > 10 | x[, "F532 % Sat."] > 10)

      spot <- (snrok & spotok & flagok & satok)
      as.numeric(spot)
    }

In my opinion it is right to exclude saturated spots (because their intensity values are not reliable). Is that wrong?

I have doubts about excluding spots with low SNR, because in my latest experiment I would have to exclude about 60% of 45000 spots for low SNR, and I am worried about the robustness of a statistical analysis evaluated on only 40% of the genes. Should I exclude these spots? Before or after normalization? Should I normalize all the spots and then apply the SNR quality filter to the normalized values, excluding the low-SNR spots from the subsequent statistical analysis?

I would like to use arrayWeights() from LIMMA and combine spot quality weights with array quality weights. Is it right to multiply the spot weight matrix by the array quality vector?

Thank you very much for any help with this complicated question.

Erika
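To see what a weight function of this kind does in practice, here is a minimal sketch that applies the same logic to a made-up five-spot table. The name spot_weight and every value in the table are assumptions for illustration; only the column names and thresholds come from the post.

```r
# Spot-quality weight function with the logic from the post,
# given a name here (spot_weight is a hypothetical name) so it can be reused.
spot_weight <- function(x, threshold = 3) {
  snrok  <- !(x[, "SNR 635"] < threshold & x[, "SNR 532"] < threshold)
  spotok <- x[, "ControlType"] == "false"
  flagok <- x[, "Flags"] >= 0
  satok  <- !(x[, "F635 % Sat."] > 10 | x[, "F532 % Sat."] > 10)
  as.numeric(snrok & spotok & flagok & satok)
}

# Hypothetical five-spot table; all values are invented.
toy <- data.frame(
  "SNR 635"     = c(12, 1.5, 8, 2, 20),
  "SNR 532"     = c(10, 2.0, 1, 9, 25),
  "ControlType" = c("false", "false", "false", "true", "false"),
  "Flags"       = c(0, 0, -100, 0, 0),
  "F635 % Sat." = c(0, 0, 0, 0, 35),
  "F532 % Sat." = c(0, 0, 0, 0, 2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One 0/1 weight per spot: 1 = passes every criterion, 0 = excluded.
w <- spot_weight(toy)

# Fraction of spots that would be excluded from the model fit.
frac_excluded <- mean(w == 0)
```

Note that the SNR test only fails a spot when both channels are below the threshold, so a spot with one weak channel still passes it. Computing frac_excluded per array before fitting is one simple way to see how much data a given filter removes.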

Microarray limma ASSIGN • 2.3k views

ADD COMMENT • link 16.2 years ago • updated 16.1 years ago Erika Melissari ▴ 250

Jenny Drnevich ★ 2.0k

United States

Hi Erika,

Filtering spots on each array individually has been addressed several times on the list, and the general consensus is to do it only in very rare circumstances, such as when you have manually flagged spots that are scratches, dust spots, etc., where the reported value has ABSOLUTELY NO RELATIONSHIP to whatever the real value might have been. Spots with low SNR, spots auto-flagged by GenePix as "missing", and saturated spots all have values that approximate the real value, even if they are less precise because they fall outside the measurement range of the scanner. As I tell my students: zeros are REAL data points. Would you throw them out in other scientific measurements? NO. It is fine to throw out a spot that fails to meet your criteria on ALL arrays, like the control spots.

I'm not sure about the array quality weights... the example uses 10 replicates per group, which is probably a fine number for determining which arrays aren't as much alike as the others, but I'm not sure it's good to use when you only have 3 replicates. Anyone care to comment on this?

Cheers,
Jenny

At 05:49 AM 2/17/2009, Erika Melissari wrote:
>Hello all,
>[...]
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu
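A minimal sketch of the one kind of filtering described as acceptable in this reply: dropping a spot only when it fails the criteria on every array. The weight matrix and its row and column names are invented for illustration.

```r
# Hypothetical 0/1 spot-weight matrix: rows are spots, columns are arrays.
w <- matrix(c(1, 1, 1,
              0, 1, 1,
              0, 0, 0,
              1, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("spot", 1:4), paste0("array", 1:3)))

# A spot is a candidate for removal only when it fails on every array.
fails_everywhere <- rowSums(w > 0) == 0

# Keep all other spots, including those that fail on only some arrays.
w_kept <- w[!fails_everywhere, , drop = FALSE]
```

Here only spot3 is removed. spot2 and spot4 remain even though each fails on one array; whether their zero weights are then reset to 1 is a separate decision, and the reply argues for keeping those values in the fit.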

ADD COMMENT • link 16.2 years ago Jenny Drnevich ★ 2.0k

Dear Jenny and Erika,

Regarding the question on array weights: so long as you have enough arrays to fit the linear model to the means, you will be able to estimate array variance factors using arrayWeights(). The example given on the arrayWeights help page applies the methodology to a set of 6 arrays with 3 replicates per group.

And yes, spot and array weights can be combined in the analysis by multiplying them together. Make sure that what you are multiplying are matrices of the same dimension, though: the output of arrayWeights() is a vector, so you will want to run asMatrixWeights() on this vector before multiplying it with the spot weights.

Best wishes,
Matt
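The dimension issue can be sketched in base R as follows. This is an illustration under stated assumptions, not limma's own code: spot_w and array_w are invented inputs, and the expansion is written out by hand so the example runs without limma installed.

```r
# Hypothetical inputs: 0/1 spot weights for 4 spots on 3 arrays,
# plus one relative quality weight per array (values invented).
spot_w <- matrix(c(1, 1, 1,
                   0, 1, 1,
                   1, 1, 0,
                   1, 0, 1),
                 nrow = 4, byrow = TRUE)
array_w <- c(1.4, 0.9, 0.7)

# Expand the array-weight vector to the dimensions of spot_w:
# each array's weight is repeated down its column.
# With limma loaded, asMatrixWeights(array_w, dim(spot_w)) is the
# function mentioned in the reply for this step.
array_w_mat <- matrix(array_w,
                      nrow = nrow(spot_w), ncol = ncol(spot_w),
                      byrow = TRUE)

# Element-wise product: a spot excluded on an array stays at 0 there,
# every other spot inherits the quality weight of its array.
combined_w <- spot_w * array_w_mat
```

The combined matrix could then be supplied as the weights when fitting the linear model. Multiplying the matrix by the raw vector directly (spot_w * array_w) would recycle the vector down the rows instead of across the columns, which is why the expansion step matters.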

ADD REPLY • link 16.1 years ago Matt Ritchie ▴ 50

Hi Matt,

While it's good to know that array weights *can* be fit as long as you can fit the linear model, I was wondering whether they *should* be fit if you only have 3 replicates. Back when I did genetics and we could get 10-20 replicates per group, I routinely used robust fitting of the linear model in order to minimize the effects of outliers. However, when there are only a few replicates, how can you tell what is an outlier and what is normal variation?

I may be wrong, but it seems that your method weights arrays based on how well they fit your specified model, so of course it will improve the model fit. But is this a good idea with only a few replicates? I tried it once, and one of the three replicates was weighted very high, one was lowish, and one was weighted close to zero. The estimate of the coefficient for that group was therefore almost completely due to one replicate, which is why I decided not to use array weights. Am I off base here?

Thanks,
Jenny

At 10:12 PM 2/18/2009, Matt Ritchie wrote:
>Dear Jenny and Erika,
>[...]
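The concern can be made concrete with a small numerical sketch. The weights and log-ratios below are invented to match the pattern described (one high, one lowish, one near zero); they are not taken from any real analysis.

```r
# Hypothetical array weights for the three replicates of one group.
w <- c(rep1 = 2.5, rep2 = 0.45, rep3 = 0.05)

# Hypothetical log-ratios for one gene on those three arrays.
y <- c(rep1 = 1.8, rep2 = 0.6, rep3 = -0.4)

# In a weighted fit with one coefficient per group, the group estimate
# is the weighted mean, so each replicate contributes w / sum(w).
share <- w / sum(w)

# Weighted versus unweighted estimate of the group mean.
weighted_mean   <- sum(w * y) / sum(w)
unweighted_mean <- mean(y)

# Kish's effective sample size: the number of equally weighted
# replicates that the weighted estimate is worth.
n_eff <- sum(w)^2 / sum(w^2)
```

With these made-up numbers the first replicate carries more than 80% of the estimate, and the effective sample size is below 1.5 replicates out of 3, which is the situation the post is worried about.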

ADD REPLY • link 16.1 years ago Jenny Drnevich ★ 2.0k

Erika Melissari ▴ 250

Dear Matt,

thank you for your help.

My experimental design is a reference design with three classes and five arrays per class (15 arrays in total). We do not have technical replicates; each array carries one biological replicate compared with a pool of the five wild-type samples (we use dual-color microarrays). The first class consists of 5 replicates of the HeLa G1 cell line transformed with the wild-type form of BRCA1. The second consists of 5 replicates of the HeLa G1 cell line transformed with a pathological mutation of BRCA1, and the last consists of 5 replicates of the HeLa G1 cell line transformed with another mutation of BRCA1. We are interested in comparing the two groups of cells transformed with the two types of mutation against the wild type. We chose this design and this number of replicates because the variance observed in a cell line is lower than that observed in animals (e.g. mice). Are, in your opinion, 5 arrays per class enough to use arrayWeights()?

About weighting arrays spot by spot, what is your opinion? I am not sure I should treat saturated spots as good, and therefore useful for fitting the linear model. In any measurement process, values of this kind are discarded because they fall outside the range of reliability of the instrument used to perform the measurement.

Another observation: if GenePix flags a spot as NotFound because the level of hybridization is too low to be considered reliable, why should I use this signal to fit my model?

And what about weighting spots before or after they have been normalized? The results change completely!

I am not very persuaded about using unreliable spots, and I am quite confused about the workflow to follow.

Could you, or anyone else, help me?

Thank you so much,
Erika

----- Original Message -----
From: "Matt Ritchie" <mer36@cam.ac.uk>
To: "Jenny Drnevich" <drnevich at illinois.edu>; "Erika Melissari" <erika.melissari at bioclinica.unipi.it>
Cc: <bioconductor at stat.math.ethz.ch>
Sent: Thursday, February 19, 2009 05:12 AM
Subject: Re: [BioC] ...another question about using weights on microarray analysis

> Dear Jenny and Erika,
> [...]
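For a design like the one described (three classes, five arrays each, common reference), the group layout and the two comparisons of interest might be set up as in this sketch. The labels WT, MutA and MutB are hypothetical names chosen for illustration, and the array order is assumed.

```r
# Hypothetical group labels for the 15 arrays, 5 per class.
# WT = wild-type BRCA1; MutA, MutB = the two mutant constructs.
group <- factor(rep(c("WT", "MutA", "MutB"), each = 5),
                levels = c("WT", "MutA", "MutB"))

# Group-means design: one coefficient per class, each estimating the
# mean log-ratio of that class against the common reference pool.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrasts of interest: each mutant class versus wild type.
contrasts_mat <- cbind(
  MutA_vs_WT = c(WT = -1, MutA = 1, MutB = 0),
  MutB_vs_WT = c(WT = -1, MutA = 0, MutB = 1)
)

# Residual degrees of freedom available for estimating variances.
resid_df <- nrow(design) - ncol(design)
```

In limma, a design matrix of this shape would typically be passed to arrayWeights() and lmFit(), with the contrasts applied through contrasts.fit(). With 15 arrays and 3 coefficients there are 12 residual degrees of freedom, compared with 4 in a 6-array, 3-per-group layout.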

ADD COMMENT • link 16.1 years ago Erika Melissari ▴ 250

Hi Erika,

Let me try to make my arguments more persuasive, as I do feel strongly about this:

>About weighting array spot-to-spot, what is your opinion?
>I am not sure I have to consider as good, and then useful to fit
>linear model, saturated spots. In all measurement process this kind
>of values are discarded because are outside the range of reliability
>of the instrument used to perform the measurement.

While these values are outside the range of reliability, you do know that they were _high_. If you discard them, it's as if you had no information at all on the expression values of those samples, which is not true. You shouldn't have too many saturated spots anyway; if you do, the scanner settings were not set properly. I'll give you an example of where discarding spots could cause a big problem with the next scenario...

>Another observation...
>If GenePix flags as NotFound a spot because the level of
>hybridization is not enough to consider reliable the level of
>hybridization, why should I use this signal to fit my model?

Because if there had been enough transcript, you would have measured something! The numbers, while not completely accurate, are going to be relatively low compared with other samples that did have measurable hybridization. Here's a real scenario: say one of your transfected lines turns on a gene that was off in the reference and in the other lines. There shouldn't be any measurable hybridization in either channel for the other lines hybridized with the reference, but if you throw away all these data points, then you cannot tell that the expression was higher in the first transfected line than in the other transfected lines! While this scenario is likely rare, these are the genes in which you would probably be most interested.

Jenny
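The scenario can be illustrated with a toy example for a single gene. All intensities are invented, and the cutoff used to mimic a NotFound flag (log2 intensity below 6 in both channels) is an assumption made only for this sketch.

```r
# Hypothetical log2 intensities for one gene on four arrays.
# red = transfected sample, green = common reference (gene is off there).
arrays <- data.frame(
  line  = c("line1", "line1", "line2", "line2"),
  red   = c(10.0, 10.4, 5.1, 4.9),
  green = c(5.0, 5.2, 5.0, 5.1),
  stringsAsFactors = FALSE
)
arrays$M <- arrays$red - arrays$green  # log-ratio, sample vs reference

# Stand-in for a NotFound flag: no measurable signal in either channel.
# The cutoff of 6 is an assumption for illustration.
not_found <- arrays$red < 6 & arrays$green < 6

# Option 1: keep every spot and average per line.
mean_keep <- tapply(arrays$M, arrays$line, mean)

# Option 2: discard flagged spots (treated as missing), then average.
M_filtered <- ifelse(not_found, NA, arrays$M)
mean_drop <- tapply(M_filtered, arrays$line, mean, na.rm = TRUE)
```

With all spots kept, line1 sits about 5 log2 units above line2. After discarding the flagged spots, line2 has no values left for this gene, so the comparison between the lines can no longer be made.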

ADD REPLY • link 16.1 years ago Jenny Drnevich ★ 2.0k

I vote with Jenny and I feel strongly too. Too much highly informative data are being discarded. The only time to discard saturated or NotFound spots is when they are in that condition on every array in the study.

--Naomi

At 09:56 AM 2/19/2009, Jenny Drnevich wrote:
>Hi Erika,
>[...]

Naomi S. Altman                814-865-3791 (voice)
Associate Professor
Dept. of Statistics            814-863-7114 (fax)
Penn State University          814-865-1348 (Statistics)
University Park, PA 16802-2111

ADD REPLY • link 16.1 years ago Naomi Altman ★ 6.0k
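
As a rough illustration of two points made in the replies above (dropping only the spots that fail on ALL arrays, and multiplying spot weights by array weights once both are matrices of the same dimension), here is a minimal base-R sketch. The data, the object names (spot_w, array_w, combined_w) and the weight values are invented for illustration; in a real analysis the array weights would come from arrayWeights(), and limma's asMatrixWeights() is the function suggested above for the expansion step that is done by hand here.

```r
# Hypothetical toy data: 5 spots (rows) x 3 arrays (columns).
# 1 = spot passed the quality checks on that array, 0 = it failed,
# i.e. the kind of 0/1 matrix a wt.fun produces.
spot_w <- matrix(c(1, 1, 1,
                   1, 0, 1,
                   0, 0, 0,
                   1, 1, 0,
                   1, 1, 1),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("spot", 1:5), paste0("array", 1:3)))

# Hypothetical array quality weights, one value per array
# (invented numbers standing in for the output of arrayWeights()).
array_w <- c(array1 = 1.2, array2 = 0.5, array3 = 1.0)

# Drop only the spots that fail on ALL arrays (here: spot3).
fails_everywhere <- rowSums(spot_w == 0) == ncol(spot_w)
spot_w_kept <- spot_w[!fails_everywhere, , drop = FALSE]

# Expand the array weight vector to a spots x arrays matrix so that both
# factors have the same dimension before multiplying.
array_w_mat <- matrix(array_w,
                      nrow = nrow(spot_w_kept), ncol = ncol(spot_w_kept),
                      byrow = TRUE, dimnames = dimnames(spot_w_kept))

# Element-wise product: one combined weight per spot per array.
combined_w <- spot_w_kept * array_w_mat
print(combined_w)
```

The combined matrix could then be passed as the weights argument of the model fit. Whether per-array zero weights should be used at all is exactly the point under discussion in this thread, so treat the 0/1 matrix here only as a placeholder for whatever spot weights you settle on.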


Hi folks,

Is there a way to get the exact genome position of a probeset on an Affymetrix exon microarray chip? At the moment I do all analysis with the package "exonmap". I know that it is possible to get the information on whether a probeset is intronic, intergenic or exonic, but that is not enough for me :)

Thanks in advance,
Paul

> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] exonmap_2.1.03       RMySQL_0.7-2         RColorBrewer_1.0-2
 [4] genefilter_1.22.0    survival_2.34-1      affy_1.18.1
 [7] preprocessCore_1.4.0 affyio_1.8.0         topGO_1.10.1
[10] SparseM_0.79         GO.db_2.2.0          AnnotationDbi_1.2.1
[13] RSQLite_0.7-0        DBI_0.2-4            Biobase_2.0.1
[16] graph_1.18.1

loaded via a namespace (and not attached):
[1] annotate_1.18.0 cluster_1.11.10

ADD REPLY • link 16.1 years ago Paul Hammer ▴ 220


Hi,

probeset.to.probe() will give you what you're after.

Crispin

ADD REPLY • link 16.1 years ago Crispin Miller ★ 1.1k


Hi Crispin,

Thanks for the quick answer. I tried the function and it worked, e.g.

> probeset.to.probe("2390563")
[1] 11794458 11794456 11794459 11794457

The only point is that no chromosome information is provided. Maybe for the next release :)

Thanks,
Paul

ADD REPLY • link 16.1 years ago Paul Hammer ▴ 220


Hi,

You'll get a table of data back if you specify 'as.vector=FALSE' in the call, e.g.

> probeset.to.probe("2390563",as.vector=FALSE)

Crispin

ADD REPLY • link 16.1 years ago Crispin Miller ★ 1.1k


Great, Crispin :)

Here is the example:

> probeset.to.probe("2390563",as.vector=FALSE)
  probeset_name idx   hit_id seq_region_id chr seq_region_start seq_region_end
1       2390563   1 11794458        226034   1             5761           5785
2       2390563   2 11794456        226034   1             5755           5779
3       2390563   3 11794459        226034   1             5765           5789
4       2390563   4 11794457        226034   1             5757           5781
  seq_region_strand
1                -1
2                -1
3                -1
4                -1

Thanks!!!
Paul

ADD REPLY • link 16.1 years ago Paul Hammer ▴ 220
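
The table above gives one row per probe. If what you want is a single genomic interval for the whole probeset, one possible way to collapse it is sketched below in base R. The data frame is the posted output retyped by hand, and the names probes and probeset_span are made up for this illustration; the sketch assumes all probes of the probeset hit the same chromosome and strand, which holds for this example but is not guaranteed in general.

```r
# Output of probeset.to.probe("2390563", as.vector = FALSE) as posted above,
# retyped by hand into a data frame (no database connection needed here).
probes <- data.frame(
  probeset_name     = rep("2390563", 4),
  idx               = 1:4,
  hit_id            = c(11794458, 11794456, 11794459, 11794457),
  seq_region_id     = rep(226034, 4),
  chr               = rep("1", 4),
  seq_region_start  = c(5761, 5755, 5765, 5757),
  seq_region_end    = c(5785, 5779, 5789, 5781),
  seq_region_strand = rep(-1, 4),
  stringsAsFactors  = FALSE
)

# Guard the assumption: a single chromosome and a single strand.
stopifnot(length(unique(probes$chr)) == 1,
          length(unique(probes$seq_region_strand)) == 1)

# Collapse the probe-level hits to one span for the probeset:
# smallest start and largest end over all probes.
probeset_span <- data.frame(
  probeset_name    = probes$probeset_name[1],
  chr              = unique(probes$chr),
  start            = min(probes$seq_region_start),
  end              = max(probes$seq_region_end),
  strand           = unique(probes$seq_region_strand),
  stringsAsFactors = FALSE
)
print(probeset_span)
```

For several probesets at once, the same min/max logic could be applied per probeset_name, for example with aggregate() or tapply().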
