Question about interpretation of CHARM results

zeynep özkeserli

Dear All, Dear Dr. Aryee and Dr. Carvalho,

I have a question on interpreting the results of the dmrFinder function.

We have performed a CHARM analysis on data from NimbleGen Promoter MeDIP arrays. The data were obtained from each patient before and after treatment, and after performing the CHARM analysis we got some differentially methylated regions (DMRs). Because the samples are before- and after-treatment samples from the same patients, they are treated as paired samples.

My question is about interpreting the results. I ran:

dmr1_2 <- dmrFinder(rawData, p = p, groups = grp, compare = c("to", "ts"), cutoff = 0.995, paired = TRUE, pairs = pairs)

where to = before treatment and ts = after treatment.

For example, I found a DMR like this (summarized for my question):

chr 8, diff = -0.30427 and maxdiff = 0.47935

The diff value is calculated as the average l (logit(percentage) methylation if l=NULL) difference within the DMR when paired=TRUE.

Is it correct to say: "The region has 0.30427 times the risk of being methylated in samples after treatment compared to samples before treatment"? I know the word "risk" does not really fit here, but I cannot find a better way to phrase it. Is it possible to express it as a "0.30427-fold difference in methylation"? And am I interpreting the "-" sign correctly?

Thank you in advance for your help.

Best regards,
Zeynep
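A difference of logits is a log odds ratio, not a fold change in percent methylation. The R sketch below illustrates this with hypothetical before/after proportions (p_before and p_after are made up for illustration). It assumes diff is the mean logit of one group minus the mean logit of the other; the order of subtraction that dmrFinder uses should be confirmed in the charm documentation.

```r
# Hypothetical average methylation proportions for one region
p_before <- 0.60
p_after  <- 0.52

# qlogis() is base R's logit, log(p / (1 - p))
logit_diff <- qlogis(p_after) - qlogis(p_before)

# The same quantity written explicitly as the log of an odds ratio
odds   <- function(p) p / (1 - p)
log_or <- log(odds(p_after) / odds(p_before))
all.equal(logit_diff, log_or)  # TRUE

# Exponentiating a reported logit difference therefore gives an odds ratio
dmr_diff <- -0.30427
exp(dmr_diff)  # about 0.738
```

If diff were computed as after minus before, exp(-0.30427), about 0.74, would mean the odds of methylation (methylated versus unmethylated) in the region are roughly 0.74 times as high after treatment; if the subtraction runs the other way, the direction reverses.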

written 12.7 years ago by zeynep özkeserli • updated 12.7 years ago by Andrew Jaffe

Tim Triche

Perhaps "on average this region has an approximately 29.3% relative decrease in cytosine methylation after treatment"?

R> 1 - exp(-0.347)
[1] 0.2931947

--
A model is a lie that helps you see the truth.
Howard Skipper (http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf)

written 12.7 years ago by Tim Triche

Dear Tim,

Thank you for your answer. But as I understand it, if the aim is to undo the logit function (I thought that was what you were doing), we should use the inverse logit function, exp(x)/(1+exp(x)). In my case it gives:

> exp(-0.30427)/(1+exp(-0.30427))
[1] 0.424514

OK, this seems reasonable, and your way of putting it into words makes sense. But if we could use this as a methylation measure, why would the developers make things more complicated by converting the value to a logit? So, again, I need to learn how to interpret the diff value.

Thank you again,

Best :)
Zeynep
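Applying the inverse logit to a difference of logits returns a number between 0 and 1, but it is not the methylation level of either group, and the same logit shift corresponds to different changes in proportion depending on the starting level. A minimal R sketch with hypothetical baseline levels (the values in baseline are invented for illustration):

```r
# plogis() is base R's inverse logit, exp(x) / (1 + exp(x))
dmr_diff <- -0.30427
plogis(dmr_diff)  # about 0.4245, the value computed above

# Apply the same logit-scale shift to several hypothetical "before" levels
baseline <- c(0.05, 0.50, 0.95)
shifted  <- plogis(qlogis(baseline) + dmr_diff)

# Change in proportion methylated for each baseline
change <- shifted - baseline
round(change, 3)
```

The shift moves a baseline of 0.50 by several percentage points but a baseline of 0.05 or 0.95 by only one or two, so a logit difference cannot be translated into a single percentage-point change without knowing the baseline.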

written 12.7 years ago by zeynep özkeserli

Good grief, I really need to avoid responding to emails before I have my morning coffee.

--t

written 12.7 years ago by Tim Triche

Zeynep,

You also mentioned that you are using MeDIP arrays. If that also means you used the MeDIP protocol to fractionate your DNA, keep in mind that the CHARM package is written on the assumption that the restriction enzyme McrBC is used to fractionate the DNA into total and unmethylated fractions, whereas MeDIP produces total and methylated fractions. This issue is addressed in the thread at https://stat.ethz.ch/pipermail/bioconductor/2012-June/046364.html. Just be cautious using MeDIP data with the CHARM package, as it might result in incorrect methylation estimates.

Brian

--
Brian Herb
Graduate Program in Biochemistry, Cellular and Molecular Biology
Johns Hopkins School of Medicine
Dr. Andrew Feinberg Laboratory
Rangos 580, 855 N. Wolfe St., Baltimore, MD 21205
Phone: 410-614-3478
Fax: 410-614-9819

written 12.7 years ago by Brian Herb

No Tim, it was important. Thank you again.

written 12.7 years ago by zeynep özkeserli

The reason to switch from a proportion (%, beta value, whichever; anything measuring M / (M+U), where M and U are surrogates for methylated and unmethylated cytosines) to a fold change (logit(proportion methylated) or log2(M/U)) is that the latter is far more amenable to linear models, and roughly parallels the expected behavior of expression changes on a log2 or log-fold-change scale.

Furthermore, the range of logit(M/(M+U)) is -Infinity to +Infinity, which is appropriate when you are modeling something as having Gaussian error. Something with a range of 0 to 1 is neither homoskedastic (that is, such a 0-1 measurement will have a variance that depends on the mean) nor unbounded (this turns out to be an issue when computing maximum likelihood estimates, for example, as values close to the boundary cause problems).

In any event, logit(proportion methylated) is equivalent to log(M/U), which is where I veered off course this morning. My brain seems to have been a bit slow.
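The equivalence Tim describes, logit(M/(M+U)) = log(M/U), can be checked numerically. A sketch with hypothetical intensities (the M and U values are invented for illustration):

```r
# Hypothetical methylated (M) and unmethylated (U) signal intensities
M <- c(50, 500, 4000)
U <- c(4000, 500, 50)

beta_value <- M / (M + U)         # proportion methylated, bounded in (0, 1)
logit_beta <- qlogis(beta_value)  # unbounded: log(beta / (1 - beta))

# beta / (1 - beta) simplifies to M / U, so the two agree
all.equal(logit_beta, log(M / U))  # TRUE

# log2(M / U) differs from the natural-log version only by the factor log(2)
all.equal(log2(M / U), logit_beta / log(2))  # TRUE
```

Because the two scales differ only by a constant factor, a difference reported on one can be converted to the other by multiplying or dividing by log(2).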

written 12.7 years ago by Tim Triche

I understand all the statistical reasons for converting from methylation "beta values" to something logistic, and am frequently tempted to do this myself.

But I think in the context of methylation this advice should come with a warning: changes in levels near 0 and 1 may have a lot of leverage on the final results. For example, we have done analyses on some of the TCGA data where we find "statistically significant differences in methylation between normal and tumor" where the mean beta values are 0.03 and 0.08. I find it hard to believe that this level of change in methylation has any kind of biological meaning. In fact, I'm not even convinced that we can accurately measure this amount of change using the technology that TCGA is using (although I might well believe that such a change could result from batch effects, whether in the assay or in the data processing).

I don't have any magic solution to fix this issue; it is intrinsic to the shape of the logistic curve. One might want to explore shrinking the beta values toward 0.5 (i.e., away from 0 and 1), but I can't offer any concrete advice on how well this might work in practice.

Best,
Kevin
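Kevin's example can be made concrete: a change from 0.03 to 0.08 and a change from 0.50 to 0.55 are both 0.05 on the beta scale but differ about fivefold on the logit scale. The shrinkage step below is only one illustrative way to pull values toward 0.5 (a linear blend with an arbitrary, hypothetical weight w); it is not a method proposed in this thread.

```r
# Hypothetical group means, each a 0.05 change on the beta scale
near_zero <- c(before = 0.03, after = 0.08)
mid_range <- c(before = 0.50, after = 0.55)

# Difference between the two groups on the logit scale
logit_shift <- function(b) unname(qlogis(b["after"]) - qlogis(b["before"]))

logit_shift(near_zero)  # about 1.03
logit_shift(mid_range)  # about 0.20

# Illustrative shrinkage toward 0.5: blend each value with 0.5 using weight w
shrink_toward_half <- function(b, w = 0.2) (1 - w) * b + w * 0.5

logit_shift(shrink_toward_half(near_zero))  # about 0.33
```

Shrinking also reduces the mid-range shift (to about 0.16 with w = 0.2), so the extra leverage near the boundary is reduced rather than removed.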

written 12.7 years ago by Kevin Coombes

(A bit late, but better late than never.) Dr. Coombes is right, of course, and I ought to have mentioned this earlier. It is a thorny issue, especially with epidemiological studies. Cancer, less so.

Differences that are real are usually significant on both the beta-value scale *and* the logistic scale. Changes that are only significant on the logit scale are often artifacts. The logistic scale is much more sensitive to technical artifacts, even if you damp the extremes by using a huge offset. There are transformations (probit, arcsin(sqrt(x)), etc.) that are intermediate between the proportion and logistic scales, but they haven't found much traction.

There are a couple of things that do help and can be used to filter in an unbiased fashion. I'm hoping to get them submitted in the very near future. I do not know if they will be as useful for CHARM data as they seem to be for Illumina arrays, eRRBS, and WGSBS sequence data, but I guess we shall find out.

Regarding TCGA data: TCGA methylation data are background corrected and dye-bias equalized (for the 450k samples at least, and, as batches are updated, for 27k as well), but no batch correction is done for the level 3 data. For tumor types spanning multiple batches it is a good idea to run ComBat or (if you must) SVA to compensate. Level 2 and 3 data are produced by running normalizeMethyLumiSet(methylumi.bgcorr(the.data)), and the IDAT files are provided as level 1, so anyone who wants to reproduce things from scratch with a different preprocessing strategy is welcome to do so. Methylumi and minfi happen to share the same IDAT parsing code these days, and I would not be surprised if they eventually merged. I wouldn't bother using anything else, especially for large numbers of samples.

Switching from 0.1% methylated to 99.9% methylated is probably a real effect. Switching from 1% to 3% across the board is probably a technical artifact. You will see this all the time on 450k data that hasn't been (ahem) properly background corrected, and to a significant degree in 27k data as well. It's not limited to TCGA; I've seen plenty of data from other centers that benefited significantly from being reprocessed sensibly. In any event, I pushed for, and got, policy changes so that Illumina methylation data for TCGA are provided as raw IDAT files, and all of the code used for processing is available either from the Bioconductor package repository or from GitHub (in the case of the packaging pipeline itself). It's not perfect, but at least it is 100% transparent.

One last thing: the Illumina annotations have been updated to a FeatureDb (one for hg19, with another for hg18 available soon), which I would like to propose as the standard for these arrays, as they cover both 27k and 450k features (along with some information about each that very few people seem to know about). I think it's as fast as what minfi uses and as comprehensive as the .db0 packages, while less confusing than either. So anyone who wants to, please try them.
> > My question is about interpretation of the results: > > After running this: > > dmr1_2 <- dmrFinder(rawData, p = p, groups = grp,compare = c("to", "ts"), > cutoff=0.995,paired=TRUE,pairs=pairs) > > to: before treatment > ts: after treatment > > - For example I have found a DMR like this (I summerized the result for > my > question): > > chr 8, diff= -0.30427 and maxdiff=0.47935 > > As the diff value is calculated like this: average l (logit(percentage) > methylation if l=NULL) difference within the DMR if paired=TRUE > > Is it true to say that: "The region has 0.30427 times the risk of being > methylated in samples of after treatment compared to samples of before > treatment." > > I know that it does not look meaningful to use the word "risk" when > talking > about something like that but I can not find a better way to say it > truely. Is it possible to express it like a "0.30427 fold difference in > methylation"? And also am I interpreting the "-" sign truely? > > Thank you for your help in advance, > > Best Regards, > > Zeynep > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing listBioconductor@r-project.orghttps://stat.ethz .ch/mailman/listinfo/bioconductor > Search the archives:http://news.gmane.org/gmane.science.biology.info rmatics.conductor > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> <http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> > > > > _______________________________________________ > Bioconductor mailing listBioconductor@r-project.orghttps://stat.ethz .ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version 
deleted]]
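To make the leverage point concrete, here is a small base-R sketch that computes the same change in methylation on the beta, logit, probit and arcsine-square-root scales. The beta pairs are illustrative. They include 1% vs 3% and 0.1% vs 99.9% from the paragraph above, 0.03 vs 0.08 from Dr. Coombes's TCGA example, and a mid-range 45% vs 55% pair added for contrast. The helper names `logit`, `asin_sqrt`, `examples` and `scale_diffs` are made up for this example.

```r
# How the same change in methylation looks on different scales.
# Beta pairs are examples discussed in this thread, plus a mid-range pair.
logit     <- function(p) log(p / (1 - p))   # identical to qlogis(p)
asin_sqrt <- function(p) asin(sqrt(p))      # arcsine-square-root transform

examples <- data.frame(
  from = c(0.01, 0.03, 0.45, 0.001),
  to   = c(0.03, 0.08, 0.55, 0.999)
)

scale_diffs <- with(examples, data.frame(
  from    = from,
  to      = to,
  beta    = to - from,
  logit   = logit(to) - logit(from),
  probit  = qnorm(to) - qnorm(from),
  arcsine = asin_sqrt(to) - asin_sqrt(from)
))
print(round(scale_diffs, 3))

# Size of the 1% -> 3% change relative to the 45% -> 55% change, per scale.
ratio <- unlist(scale_diffs[1, 3:6]) / unlist(scale_diffs[3, 3:6])
print(round(ratio, 2))
```

On the logit scale, the 1%-to-3% shift (about 1.12) is nearly three times larger than the 45%-to-55% shift (about 0.40), although it is only a fifth as large in beta units. The arcsine and probit scales fall in between, which is the sense in which they are intermediate.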

Reply by Tim Triche ★ 4.2k • 12.7 years ago


On Sun, Aug 19, 2012 at 2:21 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:

> TCGA methylation data is background corrected and dye bias equalized (for the 450k samples, at least, and as batches are updated, 27k as well) but no batch correction is done for the level 3 data. In the case of multi-batch tumors it is a good idea to run ComBat or (if you must) SVA to compensate.

Sorry to hijack the thread, but what is the reason to prefer ComBat over SVA?

> Switching from 0.1% methylated to 99.9% methylated is probably a real effect. Switching from 1% to 3% across the board is probably a technical artifact.

I'm guessing this is true only for tumor/normal comparisons or "pure" samples. What about peripheral blood, where one may be measuring a signal from a variety of cell types or tissues?

Reply by Brent Pedersen ▴ 110 • 12.7 years ago


On Mon, Aug 20, 2012 at 10:42 AM, Brent Pedersen <bpederse@gmail.com> wrote:

> Sorry to hijack the thread, but what is the reason to prefer ComBat over SVA?

Because in practice, with calibration samples, it seems to work better.

> > Switching from 0.1% methylated to 99.9% methylated is probably a real effect. Switching from 1% to 3% across the board is probably a technical artifact.
>
> I'm guessing this is true only for tumor/normal comparisons or "pure" samples.

Yes. The former typically have distinct cancer-related (vs. tissue-related) changes, if any, and the latter are a bit like unicorn poop (never seen in the wild).

http://www.nature.com/nbt/journal/v30/n5/full/nbt.2203.html?WT.ec_id=NBT-201205#/methods

> What about peripheral blood, where one may be measuring a signal from a variety of cell types or tissues?

Funny you should mention this particular task :-)

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0041361

Given the difficulty of isolating gold-standard reference populations by flow sorting, it's tough to benchmark the various transformations, but what you gain in linearity you may lose in leverage. Since there isn't one particular transformation that simultaneously linearizes and stabilizes a proportion,

http://www.jstor.org/discover/10.2307/1269291?uid=2129&uid=2&uid=70&uid=4&sid=21101141681241

you have to pick your battles. In the case of compositional analysis, 30+ years after Aitchison and Shen's seminal papers, the problem appears to remain unresolved. The ability to isolate a small number of highly purified cells and perform targeted BS-seq on picogram quantities of DNA may put this to rest.

http://leg.est.ufpr.br/lib/exe/fetch.php/pessoais:abtmartins:thestatisticalanalysisofcompositionaldata.pdf

However... joint analysis via DNA methylation and expression (array or RNA-seq) is another matter, and there I have a candidate (in need of validation).

I can't say that I'm entirely unhappy about you 'hijacking' this thread...

--t
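The trade-off between linearity and leverage is easiest to see with a mixed sample such as peripheral blood. The sketch below assumes a simple mixing model that is not stated in the thread: the bulk beta value of a cell mixture is the proportion-weighted average of the cell-type betas. All cell-type betas and proportions are hypothetical.

```r
# Hypothetical methylation (beta) of one CpG in three sorted cell types.
cell_beta <- c(granulocyte = 0.90, lymphocyte = 0.10, monocyte = 0.50)

# Assumed mixing model: bulk beta = proportion-weighted average of cell betas.
bulk_beta <- function(w) sum(w * cell_beta)

w_before <- c(granulocyte = 0.60, lymphocyte = 0.30, monocyte = 0.10)
w_after  <- c(granulocyte = 0.65, lymphocyte = 0.25, monocyte = 0.10)

b_before <- bulk_beta(w_before)
b_after  <- bulk_beta(w_after)

# On the beta scale the shift is linear in the composition change:
# 5% of the sample moves from lymphocytes (0.10) to granulocytes (0.90).
beta_shift <- b_after - b_before   # equals 0.05 * (0.90 - 0.10)

# On the logit scale the bulk value is NOT the weighted mean of cell logits.
logit_bulk     <- qlogis(b_before)
logit_weighted <- sum(w_before * qlogis(cell_beta))
print(c(beta_shift = beta_shift, logit_bulk = logit_bulk,
        logit_weighted = logit_weighted))
```

Under this model, a change in cell composition moves the bulk beta value additively. On the logit scale, the same change gets bent by the transform, so logit-scale linear models mix together composition effects and genuine methylation effects in a non-additive way.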

Reply by Tim Triche ★ 4.2k • 12.7 years ago


On Mon, Aug 20, 2012 at 12:28 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:

> Funny you should mention this particular task :-)
>
> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0041361

Thanks for this reference, I hadn't seen it. It's interesting to read that after this study:

http://ajrccm.atsjournals.org/content/185/4/373.long

with tiny fold-changes that replicate across populations:

http://ajrccm.atsjournals.org/content/185/4/373/F3.large.jpg

> However... joint analysis via DNA methylation and expression (array or RNA-seq) is another matter, and there I have a candidate (in need of validation).

I'll keep an eye out for that.

-b

Reply by Brent Pedersen ▴ 110 • 12.7 years ago


I have the utmost respect for Andrea Baccarelli, Vince Carey, and their colleagues, all of whom are extremely careful and methodical investigators. That said, I cannot help but wonder (based on my own data) whether characteristic differences in leukocyte populations, age, or chronic inflammation might be responsible for some of the observed population-level effect (given the large size of the cohort) in mixed blood cell populations. Then again, another possibility is that genuinely variable methylation regions are being pushed in one direction or another specifically as a result of COPD and its progression.

Changes in blood cell populations themselves appear to be relatively small but replicable over the course of differentiation, with some notable exceptions. Consequently, I prefer to work with sorted cells for assessing DNA methylation markers whenever possible. When that is impossible, joint estimation of sample composition can be an attractive alternative. Nonetheless, flow sorting really ought not to break the bank in large studies like these (IMHO).

Best,

--t
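The reply mentions joint estimation of sample composition only in passing. The sketch below is a rough illustration and does not reproduce any particular package's method. It is an unconstrained least-squares stand-in for reference-based deconvolution: given hypothetical reference betas for sorted cell types at a few CpGs, it estimates mixing proportions for a simulated bulk sample under the same weighted-average mixing assumption. Published approaches typically also force the proportions to be non-negative and to sum to at most one. This sketch does not, so treat it only as a picture of the idea.

```r
# Hypothetical reference profiles: rows are CpGs, columns are sorted cell types.
ref <- cbind(
  granulocyte = c(0.90, 0.20, 0.70, 0.10, 0.85),
  lymphocyte  = c(0.10, 0.80, 0.30, 0.20, 0.15),
  monocyte    = c(0.50, 0.40, 0.90, 0.60, 0.30)
)

# Hypothetical true composition of one blood sample.
true_w <- c(granulocyte = 0.60, lymphocyte = 0.30, monocyte = 0.10)

# Simulated bulk sample: proportion-weighted mix plus a little measurement noise.
set.seed(1)
bulk <- drop(ref %*% true_w) + rnorm(nrow(ref), sd = 0.01)

# Unconstrained least squares via the normal equations: (R'R) w = R'y.
est_w <- drop(solve(crossprod(ref), crossprod(ref, bulk)))
print(round(est_w, 3))
```

With real data you would need many informative CpGs and properly sorted reference populations. As noted above, obtaining gold-standard reference populations is the hard part.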

Reply by Tim Triche ★ 4.2k • 12.7 years ago


Andrew Jaffe ▴ 120


I'm jumping in here kind of late, but hopefully I can help you out. The first thing, as Brian suggested, is to make sure the inputs are reversed, because CHARM data has the unmethylated fraction as the "enriched" sample and you're using MeDIP data. However, since you are actually getting log-ratio differences, I'm going to proceed as if it's correct.

The interpretation ("The region has 0.30427 times the risk of being methylated in samples of after treatment compared to samples of before treatment.") is not quite correct, because methylation here is quantitative, not binary. It's more that, on average, samples after treatment differ from samples before treatment by about 0.3 on the logit methylation scale. However, since differences on the logit scale are not very interpretable, the easiest way to get % methylation differences is to take the anti-logit of your logit methylation values and then calculate the means on this scale, post hoc:

groups = ifelse(outcome==[whatever], 1, 0)
ilogit = function(x) 1/(1+exp(-x))
p = ilogit(methylation_matrix)
dm = rowMeans(p[, groups==1, drop=FALSE]) - rowMeans(p[, groups==0, drop=FALSE])

You can use the indexStart and indexEnd columns of the dmr list to find each region's difference in means on the %M scale. As Kevin suggested, you might want to prioritize DMRs by this value, because a very significant logit difference can be a tiny methylation difference: e.g., -4 vs. -3.5 is a 0.5 difference on the logit scale but only about a 1% difference in methylation.

Hope that helps,

Andrew
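Below, Andrew's recipe is expanded into a self-contained base-R sketch for the paired before/after design in the original question. Everything in it is hypothetical. The `before` and `after` matrices stand in for probe-by-patient logit methylation values, and `dmr_table` mimics only the `indexStart`/`indexEnd` columns mentioned above. Check the real layout of your dmrFinder output before adapting it.

The first part uses the point made earlier in the thread that logit(%M) equals log(M/U). A mean logit difference `d` therefore corresponds to multiplying the M/U odds by exp(d), and averaged over probes this acts as a geometric-mean odds ratio. The same `d` moves %M by different amounts depending on the starting level.

```r
# Inverse logit, as defined in the post above (base R's plogis() is equivalent).
ilogit <- function(x) 1 / (1 + exp(-x))

## 1. Reading a mean logit difference --------------------------------------
d <- -0.30427                            # 'diff' reported for the chr 8 DMR
odds_ratio <- exp(d)                     # geometric-mean ratio of M/U odds
baseline   <- c(0.10, 0.50, 0.90)        # hypothetical "before" methylation
after_pct  <- ilogit(qlogis(baseline) + d)
print(round(cbind(baseline, after_pct, change = after_pct - baseline), 4))
print(ilogit(-3.5) - ilogit(-4))         # the "-4 vs -3.5" example, about 0.011

## 2. Per-probe and per-DMR differences on the %M scale ---------------------
set.seed(42)
n_probes <- 12
n_pairs  <- 3
# Hypothetical logit methylation: rows = probes, columns = patients.
before <- matrix(rnorm(n_probes * n_pairs, mean = -1), nrow = n_probes)
shift  <- ifelse(seq_len(n_probes) <= 5, -0.3, 0)   # probes 1-5 lose methylation
after  <- before + shift +
  matrix(rnorm(n_probes * n_pairs, sd = 0.05), nrow = n_probes)

logit_mat <- cbind(before, after)                   # before columns, then after
groups    <- rep(c(0, 1), each = n_pairs)           # 0 = before, 1 = after
p  <- ilogit(logit_mat)
dm <- rowMeans(p[, groups == 1, drop = FALSE]) -
      rowMeans(p[, groups == 0, drop = FALSE])

# Hypothetical stand-in for the indexStart/indexEnd columns of a DMR table.
dmr_table <- data.frame(indexStart = c(1, 6), indexEnd = c(5, 12))
dmr_table$pctDiff <- mapply(function(s, e) mean(dm[s:e]),
                            dmr_table$indexStart, dmr_table$indexEnd)

# Rank regions by the size of the %M difference, not only the logit one.
print(dmr_table[order(-abs(dmr_table$pctDiff)), ])
```

With equal numbers of before and after samples, the difference of group means `dm` equals the mean of the within-patient %M differences, so the recipe respects the pairing. The interpretation table also shows why the logit value alone is hard to read. The same diff of -0.30427 is about a 7.5-point drop at 50% methylation, but only about a 2- to 3-point drop near 10% or 90%.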

Answer by Andrew Jaffe ▴ 120 • 12.7 years ago
