help: limma and changing gene results!

Koen Marien ▴ 50

@koen-marien-3971

Dear all,

Is this also the reason why there is a difference in the (differentially expressed) gene lists of a-(b+c+d) and venny(a-b,a-c,a-d)?

a-(b+c+d): putting the b, c and d values in one group (b+c+d) and using limma
venny(a-b,a-c,a-d): using limma on the separate groups and creating a list by looking at the intersection in the Venn diagram of the three 'sublists' a-b, a-c, a-d

Thanks a lot

Koen Marien
student bioscience engineering: cell and gene biotechnology
University of Ghent, Belgium

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of James W. MacDonald
Sent: Thursday 29 April 2010 18:46
To: Joseph Skaf
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] help: limma and changing gene results!

Hi Joseph,

Joseph Skaf wrote:
> To whom it may concern,
>
> I've been having some problems with consistency in my limma results for
> genes that are found to have significant differential transcript abundance.
>
> In a given example, I may have 4 different groups (a, b, c, and d) in an
> array set of 12.
>
> From here, I make a contrast matrix that has contrasts for a-b, a-c, and
> a-d. Eventually, I output an eBayes-corrected contrast fit and I use
> decideTests from there to find out which genes are differentially expressed.
> My misunderstanding is that when I take away an entire group (such as
> removing all d's) and redo all steps in the limma analysis, I find that I
> end up with a different set of genes after using decideTests. I am confused
> here, because I would not think that removing group 'd' from the analysis
> would have an effect on contrasts a-b and a-c.
>
> If anyone could even hint to me a reason as to why this is happening, it
> would be greatly appreciated.

It's because of how the denominator for your contrast is computed. The denominator is computed using the intra-group variance for all the groups in your study, not just the two groups being compared in the contrast.

So if you remove one of the groups, you lose both degrees of freedom and the contribution from the intra-group variance of that group. Losing the degrees of freedom will reduce your power to detect differences. The effect of losing the intra-group variance contribution will depend on how variable the group d data are compared to groups a-c.

Best,

Jim

> Thanks and regards,
> Joseph Skaf

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
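
The point about the denominator can be checked for a single gene with ordinary least squares in base R. This is only a sketch under stated assumptions: the expression values are simulated, the group means and the seed are arbitrary, and limma's empirical Bayes moderation of the variance is left out entirely.

```r
# Simulated log-expression values for ONE gene: 4 groups x 3 replicates.
# All numbers are made up for illustration.
set.seed(1)
dat <- data.frame(
  group = factor(rep(c("a", "b", "c", "d"), each = 3)),
  y     = rnorm(12, mean = rep(c(5, 2, 5, 8), each = 3), sd = 1)
)

# Fit with all four groups: residual variance is pooled over a, b, c and d.
fit4 <- lm(y ~ 0 + group, data = dat)

# Refit after dropping group d: pooling now uses a, b and c only.
dat3 <- droplevels(subset(dat, group != "d"))
fit3 <- lm(y ~ 0 + group, data = dat3)

df4 <- df.residual(fit4)     # 12 arrays - 4 group means = 8
df3 <- df.residual(fit3)     # 9 arrays - 3 group means = 6
s4  <- summary(fit4)$sigma   # pooled residual SD, four groups
s3  <- summary(fit3)$sigma   # pooled residual SD, three groups

# The a-b difference itself is unchanged; only its standard error and
# degrees of freedom (the "denominator") differ between the two fits.
ab4 <- unname(coef(fit4)["groupa"] - coef(fit4)["groupb"])
ab3 <- unname(coef(fit3)["groupa"] - coef(fit3)["groupb"])
```

Comparing `s4` with `s3` and `df4` with `df3` shows why the test statistic for a-b can change even though the estimated difference does not.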

limma limma • 1.1k views

ADD COMMENT • link updated 14.6 years ago by James W. MacDonald 67k • written 14.6 years ago by Koen Marien ▴ 50

James W. MacDonald 67k

@james-w-macdonald-5106

Hi Koen,

Koen Marien wrote:
> Dear
>
> Is this also the reason why there is a difference in the (differentially
> expressed) gene lists of a-(b+c+d) and venny(a-b,a-c,a-d)?

I am not familiar with the venny() function, so it's hard to say. But if you mean a contrast of a-(b+c+d) versus individual contrasts of a-b, a-c, a-d, then no.

In the first place, a-(b+c+d) isn't a contrast, and in most cases doesn't make sense. You might mean a-(b+c+d)/3, which is a contrast, and tests the difference between the a group and the mean of the other three. The denominator will be the same in each case, being based on (in simple terms) the average variability of the four groups.

However, if what I am assuming is correct, then the two contrasts are quite different, and shouldn't be expected to result in the same gene lists. As an example, say the means of the groups for one gene are:

a = 5
b = 2
c = 5
d = 8

Since the denominator will be the same, we can ignore it here. So do you think there will be a difference in what is called significant when we compare

5 - (2+5+8)/3 = 0

and

5 - 2 = 3
5 - 5 = 0
5 - 8 = -3

?

Best,

Jim

ADD COMMENT • link 14.6 years ago James W. MacDonald 67k
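
The arithmetic in the example above can be written as a small contrast matrix in base R. This is an illustrative sketch: the object names `means`, `cm` and `estimates` are assumptions, and only the four group means come from the post.

```r
# Group means for one gene, taken from the example in the post.
means <- c(a = 5, b = 2, c = 5, d = 8)

# Each column holds the coefficients of one contrast (rows: a, b, c, d).
cm <- cbind(
  a_vs_mean_bcd = c(1, -1/3, -1/3, -1/3),
  a_vs_b        = c(1, -1,  0,  0),
  a_vs_c        = c(1,  0, -1,  0),
  a_vs_d        = c(1,  0,  0, -1)
)
rownames(cm) <- names(means)

# Estimated contrast = sum over groups of coefficient * group mean.
estimates <- drop(crossprod(cm, means))
print(round(estimates, 10))
```

The averaged contrast is zero for this gene while two of the three pairwise contrasts are not, so the two approaches would disagree about it.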

Thanks for the clear and fast reply, Jim. Indeed a-(b+c+d) isn't a contrast, but I think I'm having a different problem. Here is the experiment, briefly explained:

I have four populations of cells with three biological replicates for each population -> a1,a2,a3,b1,b2,b3,c1,c2,c3,d1,d2,d3. I normalized them and looked at the differentially expressed genes between the 'a' population and each of the other populations individually: a-b, a-c, a-d. The Venn approach is done with the online web application Venny and only looks at the probe set IDs common to the three lists (let's call it the 'one-on-one strategy').
I also looked at the differentially expressed genes when the b, c and d values were put together: a-e with e=b+c+d (let's call it the 'group strategy'). So it's not really the contrasts that are changed.

Now, when looking at the one-on-one strategy list there are only five genes common to the three groups with a B-value > 2, while in the group strategy there are 181 probe sets with a B-value > 2.

Relevant code used:

read all the .cel files (a,b,c,d)
pd<-data.frame(population=c(rep(1,3),rep(2,8)),replicate=c(seq(1,3),seq(1,8))) => group strategy
or
only read the .cel files of two populations (a,b or a,c or a,d)
pd<-data.frame(population=c(rep(1,3),rep(2,3)),replicate=c(seq(1,3),seq(1,3))) => one-on-one strategy (repeated three times, once for each comparison)

group<-factor(eset$population)
design = model.matrix(~0+group)
design
cont.matrix = makeContrasts(eset = (group2 - group1), levels = design)
cont.matrix

Regards

Koen

ADD REPLY • link 14.6 years ago Koen Marien ▴ 50
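
To see what relabelling b, c and d as a single population does to a model fit, the sketch below compares a collapsed two-group fit with a four-group fit for one simulated gene. It uses base R `lm()` rather than limma, assumes three replicates per population (so nine arrays in the collapsed group, not the eight in the posted code), and all values, seeds and object names are made up.

```r
# One simulated gene, 4 populations x 3 replicates (values are invented).
set.seed(7)
population <- factor(rep(c("a", "b", "c", "d"), each = 3))
y <- rnorm(12, mean = rep(c(5, 2, 5, 8), each = 3), sd = 0.5)

# 'Group strategy': b, c and d arrays all relabelled as one group.
collapsed <- factor(ifelse(population == "a", "group1", "group2"))
fit_collapsed <- lm(y ~ 0 + collapsed)

# Four-group model keeping the populations separate.
fit_full <- lm(y ~ 0 + population)

# With equal replicate numbers, the collapsed group mean equals the
# average of the three separate population means.
coef_group2   <- unname(coef(fit_collapsed)["collapsedgroup2"])
mean_of_means <- mean(coef(fit_full)[c("populationb", "populationc", "populationd")])

# Differences between b, c and d end up in the residual variance of the
# collapsed fit, so its residual SD is larger here.
sigma_collapsed <- summary(fit_collapsed)$sigma
sigma_full      <- summary(fit_full)$sigma
```

So the collapsed coding estimates the same quantity as a - (b+c+d)/3 when replicates are balanced, but with a different error term.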

Koen Marien wrote:
> Thanks for the clear and fast reply, Jim. Indeed a-(b+c+d) isn't a contrast,
> but I think I'm having a different problem. Here is the experiment shortly
> explained:

Yes, but... a-(b+c+d) doesn't make any sense. Why would you do such a thing? Let's say the mean of all four samples for a given gene is identical (I dunno, say 5). Any of a-b, a-c, a-d will be zero, whereas a-(b+c+d) is -10. So what does that tell us, in a biological sense?

> I have four populations of cells with three biological replicates for each
> population -> a1,a2,a3,b1,b2,b3,c1,c2,c3,d1,d2,d3. I normalized them and
> looked at the differentially expressed genes between the 'a' population and
> each of those other populations individually: a-b, a-c, a-d. The venn
> approach is done with the online web application Venny and only looks at the
> common probe set ID's in the three lists (let's call it the 'one-on-one
> strategy').
> I also looked at the differentially expressed genes when b, c and d values
> where put together: a-e with e=b+c+d (let's call it the 'group strategy').
> So it's not really the contrasts that are changed.

How are the contrasts not changed? You are comparing a contrast with a not-a-contrast that doesn't even make sense. That there will be differences is a foregone conclusion.

Best,

Jim

ADD REPLY • link 14.6 years ago James W. MacDonald 67k
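
The "-10" example can be reproduced in a few lines of base R. The sketch relies on the usual definition that the coefficients of a contrast sum to zero; the helper name `is_contrast` and the other object names are hypothetical.

```r
# A linear combination of group means is a contrast when its
# coefficients sum to zero (helper name is hypothetical).
is_contrast <- function(w, tol = 1e-12) abs(sum(w)) < tol

w_sum  <- c(a = 1, b = -1,   c = -1,   d = -1)    # a - (b + c + d)
w_mean <- c(a = 1, b = -1/3, c = -1/3, d = -1/3)  # a - (b + c + d)/3

# Every group has the same mean, so nothing is differentially expressed.
flat <- c(a = 5, b = 5, c = 5, d = 5)

est_sum  <- sum(w_sum  * flat)   # 5 - 15 = -10
est_mean <- sum(w_mean * flat)   # 5 - 5 = 0 (up to rounding)
```

Only the second set of coefficients returns zero when all groups are equal, which is what a test of "no difference" needs.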

Dear Jim and others who can help,

James W. MacDonald wrote:
> Yes, but... a-(b+c+d) doesn't make any sense. Why would you do such a
> thing? Let's say the mean of all four samples for a given gene is
> identical (I dunno, say 5). Any of a-b, a-c, a-d will be zero, whereas
> a-(b+c+d) is -10. So what does that tell us, in a biological sense?

I compare a progenitor population with three offspring populations to identify surface markers. So I need genes that are upregulated in the 'a' population compared to the 'b', 'c' and 'd' populations.

> How are the contrasts not changed? You are comparing a contrast with a
> not-a-contrast that doesn't even make sense. That there will be
> differences is a forgone conclusion.

I don't really change the contrast (look at the code, it's always 'group2-group1'). I'll try to explain again:

one-on-one strategy: I compared a to b, a to c, and a to d, and compared the differentially expressed genes with the online Venny tool (http://bioinfogp.cnb.csic.es/tools/venny/index.html). So e.g. group1 = 'a' population (always) and group2 = 'b' or 'c' or 'd' (I ran the code three times).

group strategy: I compared a to (b&c&d) (look at the code: I annotated the 'a' files by assigning them to population '1' and the 'b','c','d' files by assigning them to population '2'), so group1 = 'a' population and group2 = 'b'+'c'+'d'.

My questions are: Why do I get different lists with these two approaches? Which approach gives me the best results when I look for genes specifically upregulated in the 'a' population?

I'm still learning, and I learn a lot from you especially, so thanks for your patience,

Koen

ADD REPLY • link 14.5 years ago Koen Marien ▴ 50
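
The Venn step of the one-on-one strategy amounts to a set intersection, which can also be done in base R instead of the Venny web tool. The probe set IDs below are hypothetical placeholders, not results from this experiment.

```r
# Hypothetical probe set IDs called up in 'a' for each pairwise comparison.
up_a_vs_b <- c("p01", "p02", "p03", "p07")
up_a_vs_c <- c("p02", "p03", "p05", "p07")
up_a_vs_d <- c("p03", "p04", "p07", "p09")

# Centre of the Venn diagram: IDs present in all three lists.
common <- Reduce(intersect, list(up_a_vs_b, up_a_vs_c, up_a_vs_d))

# Union for comparison: IDs present in at least one list.
any_list <- Reduce(union, list(up_a_vs_b, up_a_vs_c, up_a_vs_d))
```

Because a probe set must pass the cutoff in every comparison to survive, the intersection can never be longer than the shortest of the three lists.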

Hi Koen, Koen Marien wrote: >>>>> Dear Jim and others who can help If possible, it is preferable for your response to _not_ be preceded by >>>>>, as that looks to most like what was written five responses ago, rather than being the current portion of the email. > > Koen Marien wrote: >> Thanks for the clear and fast reply, Jim. Indeed a-(b+c+d) isn't a > contrast, >> but I think I'm having a different problem. Here is the experiment shortly >> explained: > > Yes, but... a-(b+c+d) doesn't make any sense. Why would you do such a > thing? Let's say the mean of all four samples for a given gene is > identical (I dunno, say 5). > Any of a-b, a-c, a-d will be zero, whereas a-(b+c+d) is -10. So what > does that tell us, in a biological sense? > >>>>> I compare a progenitor population with three offspring populations to > identify surface markers. So I need upregulated genes in the 'a' population >>>>> compared to 'b', 'c' and 'd' populations OK, fine. But you seem to be missing the fact that we are just doing simple math here. If you want to compare the 'a' population to b-d, then the only reasonable way to do that is to use the mean of the b-d populations. That is why I say you aren't doing a contrast. For a comparison to be a contrast, the coefficients have to sum to zero, so what you want is a - (b + c + d)/3. > > >> >> I have four populations of cells with three biological replicates for each >> population -> a1,a2,a3,b1,b2,b3,c1,c2,c3,d1,d2,d3. I normalized them and >> looked at the differentially expressed genes between the 'a' population > and >> each of those other populations individually: a-b, a-c, a-d. The venn >> approach is done with the online web application Venny and only looks at > the >> common probe set ID's in the three lists (let's call it the 'one- on-one >> strategy'). >> I also looked at the differentially expressed genes when b, c and d values >> where put together: a-e with e=b+c+d (let's call it the 'group strategy'). 
>> So it's not really the contrasts that are changed. > > How are the contrasts not changed? You are comparing a contrast with a > not-a-contrast that doesn't even make sense. That there will be > differences is a forgone conclusion. > >>>>> I don't really change the contrast (look at the code, it's always > 'group2-group1') >>>>> I'll try to explain again: >>>>> one-on-one strategy: compared a to b, a to c, a to d and compared the > differentially expressed genes with the online Venny-tool > (http://bioinfogp.cnb.csic.es/tools/venny/index.html). So e.g. group1 = 'a' > population (always) and group 2 = 'b' or 'c' or 'd' (I ran the code three >>>>> times) > >>>>> group strategy: compared a to (b&c&d) (look at the code: I annotated > the 'a' files by appointing them to population '1' and the 'b','c','d' files > by >>>>> appointing >> them to population '2') so group1 = 'a' population and > group2 = 'b'+'c'+'d' Right. And that is what doesn't make sense. You can set group 2 to be (b+c+d)/3, and then compare that to a. This is similar to individually comparing b, c, and d to a, except you are 'smoothing' the values for the offspring samples by taking the mean, so you will likely still get differences, depending on the underlying data. > >>>>> My questions are: Why do I get different lists in these two approaches? > Which approach gives me the best results when I look for specifically >>>>> upregulated genes in the 'a' population? Which approach is 'best' depends on the hypothesis you are trying to test. And it may still be impossible to say which is best, which is a fairly imprecise term. In my opinion it is more defensible to describe the analysis in terms of the hypothesis being tested, and why the particular model or contrast you used answers the underlying question. 
Best, Jim > >>>>> I'm still learning and especially learn a lot from you, so thanks for > your patience, Koen > > > Best, > > Jim > > >> Now, when looking at the one-on-one strategy list there are only five > genes >> common in the three groups with a B-value > 2, while in the group strategy >> there are 181 probe sets with a B-value > 2. >> >> Relevent code used: >> read all the cell files (a,b,c,d) >> > pd<-data.frame(population=c(rep(1,3),rep(2,8)),replicate=c(seq(1,3), seq(1,8) >> )) => group strategy >> or >> only read the .cel files of two populations (a,b or a,c or a,d) >> > pd<-data.frame(population=c(rep(1,3),rep(2,3)),replicate=c(seq(1,3), seq(1,3) >> )) => one-on-one strategy (repeated three times for each comparison) >> >> group<-factor(eset$population) >> design = model.matrix(~0+group) >> design >> cont.matrix = makeContrasts(eset = (group2 - group1), levels = design) >> cont.matrix >> >> >> Regards >> >> Koen >> >> -----Original Message----- >> From: James MacDonald [mailto:jmacdon at med.umich.edu] >> Sent: woensdag 12 mei 2010 4:40 >> To: Koen Marien >> Cc: 'Joseph Skaf'; bioconductor at stat.math.ethz.ch >> Subject: Re: [BioC] help: limma and changing gene results! >> >> Hi Koen, >> >> Koen Marien wrote: >>> Dear >>> >>> >>> Is this also the reason why there is a difference in the (differentially >>> expressed) gene lists of a-(b+c+d) and venny(a-b,a-c,a-d)? >> I am not familiar with the venny() function, so it's hard to say. But if >> you mean a contrast of a-(b+c+d) versus individual contrasts of a-b, >> a-c, a-d, then no. >> >> In the first place, a-(b+c+d) isn't a contrast, and in most cases >> doesn't make sense. You might mean a-(b+c+d)/3, which is a contrast, and >> tests the difference between the a group and the mean of the other >> three. The denominator will be the same in each case, being based on (in >> simple terms) the average variability of the four groups. 
>>
>> However, if what I am assuming is correct, then the two contrasts are
>> quite different, and shouldn't be expected to result in the same gene
>> lists. As an example, say the mean of the groups for one gene are:
>>
>> a = 5
>> b = 2
>> c = 5
>> d = 8
>>
>> since the denominator will be the same we can ignore that here. So do
>> you think there will be a difference in what is called significant when
>> we compare
>>
>> 5 - (2+5+8)/3 = 0
>>
>> and
>>
>> 5 - 2 = 3
>> 5 - 5 = 0
>> 5 - 8 = -3
>>
>> ?
>>
>> Best,
>>
>> Jim
>>
>>> a-(b+c+d): putting the b, c and d values in one
>>> group (b+c+d) and using limma
>>> venny(a-b,a-c,a-d): using limma on the separate groups and
>>> create a list by looking at the intersection of the venn diagram of the
>>> three 'sublists' a-b, a-c, a-d
>>>
>>> Thanks a lot
>>>
>>> Koen Marien
>>> student bioscience engineering: cell and gene biotechnology
>>> University of Ghent, Belgium
>>>
>>> -----Original Message-----
>>> From: bioconductor-bounces at stat.math.ethz.ch
>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of James W.
>>> MacDonald
>>> Sent: Thursday 29 April 2010 18:46
>>> To: Joseph Skaf
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] help: limma and changing gene results!
>>>
>>> Hi Joseph,
>>>
>>> Joseph Skaf wrote:
>>>> To whom it may concern,
>>>>
>>>> I've been having some problems with consistency in my limma results for
>>>> genes that are found to have significant differential transcript
>>>> abundance.
>>>>
>>>> In a given example, I may have 4 different groups (a, b, c, and d) in an
>>>> array set of 12.
>>>>
>>>> From here, I make a contrast matrix that has contrasts for a-b, a-c, and
>>>> a-d. Eventually, I output an eBayes-corrected contrast fit and I use
>>>> decideTests from there to find out what genes are differentially
>>>> expressed.
>>>> My misunderstanding is that when I take away an entire group (such as
>>>> removing all d's) and redo all steps in the limma analysis, I find that I
>>>> end up with a different set of genes after using decideTests. I am
>>>> confused here, because I would not think that removing group 'd' from the
>>>> analysis would have an effect on contrasts a-b and a-c.
>>>>
>>>> If anyone could even hint to me a reason as to why this is happening, it
>>>> would be greatly appreciated.
>>>
>>> It's because of how the denominator for your contrast is computed. The
>>> denominator is computed using the intra-group variance for all the
>>> groups in your study, not just the two groups being compared in the
>>> contrast.
>>>
>>> So if you remove one of the groups, you lose both degrees of freedom as
>>> well as the contribution from the intra-group variance of that group.
>>> Losing the degrees of freedom will reduce your power to detect
>>> differences. Losing the contribution of the intra-group variance will
>>> depend on how variable the group d data are compared to groups a-c.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>> Thanks and regards,
>>>> Joseph Skaf

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be
used for urgent or sensitive issues
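The point in the quoted reply about the denominator can be illustrated with a small base R sketch. It is a simplification under stated assumptions: the values for a single gene are simulated, group d is made deliberately noisier, the helper `pooled()` is hypothetical, and an ordinary (not moderated) t-statistic is used, so limma's additional variance shrinkage in eBayes is left out.

```r
set.seed(42)
# Hypothetical values for one gene: three replicates per group,
# with group d deliberately noisier than the others
y <- c(rnorm(3, 5, 0.2), rnorm(3, 4, 0.2), rnorm(3, 5, 0.2), rnorm(3, 8, 1.5))
group <- factor(rep(c("a", "b", "c", "d"), each = 3))

# Pooled within-group (residual) variance and its degrees of freedom
pooled <- function(y, group) {
  fit <- lm(y ~ 0 + group)
  c(s2 = summary(fit)$sigma^2, df = fit$df.residual)
}

all_groups <- pooled(y, group)
keep <- group != "d"
without_d <- pooled(y[keep], droplevels(group[keep]))

# The a - b difference in means is the same in both analyses ...
diff_ab <- mean(y[group == "a"]) - mean(y[group == "b"])

# ... but its standard error depends on the pooled variance
se <- function(s2) sqrt(s2 * (1 / 3 + 1 / 3))
t_all <- diff_ab / se(all_groups["s2"])
t_without_d <- diff_ab / se(without_d["s2"])

print(rbind(all_groups, without_d))
print(c(t_all = unname(t_all), t_without_d = unname(t_without_d)))
```

Dropping group d lowers the residual degrees of freedom from 8 to 6 and changes the pooled variance, so the a-b statistic moves even though the a and b data are untouched.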

ADD REPLY • link 14.5 years ago James W. MacDonald 67k

