Simone
Hi!
My question is a rather general one, but I hope someone can help.
I am currently trying to analyze "differential variability" of gene
expression and, above all, methylation data (Illumina microarray data:
27K and 450K BeadChip) in the context of aging, i.e. I would like to see
whether the variability of methylation increases (or decreases) in
healthy individuals as they age. I would like to do this gene-wise, to
see whether, and which, genes show increased or decreased variability.
Several studies already published in this context employ different
methods for this kind of analysis:
First of all there is the standard F-test. But since it requires data
that do not depart from normality, I think it is not applicable in my
case. For one of my datasets (~500 samples after outlier removal) I
performed Shapiro-Wilk tests for the ~27,000 CpGs and found that more
than 26,200 CpGs do not have normally distributed values (FDR 0.05). I
think this is a usual observation when working with methylation data.
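
To make the screening step concrete, here is a minimal sketch of a per-CpG Shapiro-Wilk test followed by a Benjamini-Hochberg correction. The matrix `beta` and the helper `bh_adjust` are hypothetical names, and the data are simulated stand-ins (skewed beta-distributed values), not my real dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 samples x 50 CpGs of beta values in (0, 1).
# Beta-distributed values are bounded and skewed, like many methylation probes.
beta = rng.beta(a=0.5, b=5.0, size=(200, 50))


def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# One Shapiro-Wilk test per CpG (column); index 1 of the result is the p-value.
pvals = np.array([stats.shapiro(beta[:, j])[1] for j in range(beta.shape[1])])
qvals = bh_adjust(pvals)

n_non_normal = int((qvals < 0.05).sum())
print(f"{n_non_normal} of {beta.shape[1]} CpGs reject normality at FDR 0.05")
```

With ~500 samples per CpG, a test like this has enough power to flag even mild departures from normality, so a high rejection count by itself says little about how severe the departures are.
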
In other analyses investigating similar questions, Bartlett's test was
employed, but it assumes normal distributions as well. I also read
something about this either here or on the R mailing list, where the
Ansari-Bradley test was proposed for this kind of analysis. So maybe
the Ansari-Bradley test would be a good choice, although so far I have
not seen any publication that uses it for variability analyses.
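
For the two-group comparison, a sketch along the following lines shows how rank-based or median-centred tests of spread could be run for a single CpG. The arrays `young` and `old` are hypothetical, simulated values. Besides the Ansari-Bradley test, it includes the Brown-Forsythe variant of Levene's test and the Fligner-Killeen test, which are alternatives I am adding for comparison and which were not mentioned in the studies above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data for one CpG: same centre, wider spread in the old group.
young = rng.normal(loc=0.5, scale=0.03, size=100)
old = rng.normal(loc=0.5, scale=0.09, size=100)

# Ansari-Bradley: rank-based test for a difference in scale.
# It assumes both groups share the same location (median).
ab_stat, ab_p = stats.ansari(young, old)

# Brown-Forsythe: Levene's test on absolute deviations from the group median.
bf_stat, bf_p = stats.levene(young, old, center="median")

# Fligner-Killeen: rank-based test using median-centred deviations.
fk_stat, fk_p = stats.fligner(young, old)

print(f"Ansari-Bradley  p = {ab_p:.3g}")
print(f"Brown-Forsythe  p = {bf_p:.3g}")
print(f"Fligner-Killeen p = {fk_p:.3g}")
```

Because the Ansari-Bradley test assumes equal locations, a CpG whose mean methylation also shifts with age would presumably need to be centred per group first; the other two tests centre on the group median internally.
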
Another approach that was recommended to me was not to build age groups
and compare them to each other (I used two "extreme" age groups, i.e.
very young vs. very old samples), but to fit some kind of fixed-effects
model for analyzing variability with age. Maybe something like this
would be the best option, since we have all the age information
available (in years or even months) and this way we do not lose any
information we actually have. But I am not quite sure how to model
variability. How would one do this?
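
One possible way to use age as a continuous variable, sketched under my own assumptions, is a two-stage regression: first fit methylation on age to remove the mean trend, then regress the absolute residuals on age (a Glejser-type check for heteroscedasticity). A positive slope in the second fit would suggest that spread grows with age. The variable names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data for one CpG: 300 individuals aged 20 to 90.
age = rng.uniform(20, 90, size=300)
sd = 0.02 + 0.001 * (age - 20)                    # spread grows with age
meth = 0.4 + 0.001 * age + rng.normal(0.0, sd)    # mean also drifts with age

# Stage 1: model the mean trend and take the residuals.
mean_fit = stats.linregress(age, meth)
resid = meth - (mean_fit.intercept + mean_fit.slope * age)

# Stage 2: regress the absolute residuals on age.
# The slope estimates how the typical deviation changes per year.
spread_fit = stats.linregress(age, np.abs(resid))

print(f"mean slope   = {mean_fit.slope:.5f} per year")
print(f"spread slope = {spread_fit.slope:.5f} per year, p = {spread_fit.pvalue:.3g}")
```

Covariates such as sex or batch could be added to both stages by switching from `linregress` to a multiple regression; the sketch leaves them out to stay short.
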
Recently a study was also published in which the authors say that they
used linear models and calculated "methylation deviance" as the squared
distance of the residuals of every marker from the population mean, but
again I am not sure about it, and the description in the methods
section is quite short.
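
Since the methods description is short, the following is only my reading of the idea, not the study's confirmed procedure: take the squared residuals from a per-CpG linear model as the "deviance" and test whether they trend with age. The sketch does this for many CpGs at once and uses the n * R^2 statistic of a Breusch-Pagan-type (Koenker) test, which is my own choice of test. All names and data are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_samples, n_cpgs = 300, 20
age = rng.uniform(20, 90, size=n_samples)

# Hypothetical data: first 10 CpGs have constant spread,
# last 10 CpGs have spread that grows with age.
sd = np.full((n_samples, n_cpgs), 0.05)
sd[:, 10:] = (0.02 + 0.001 * (age - 20))[:, None]
meth = 0.5 + rng.normal(0.0, 1.0, size=(n_samples, n_cpgs)) * sd

# Design matrix: intercept plus age.
X = np.column_stack([np.ones(n_samples), age])


def residuals(X, Y):
    """Least-squares residuals of each column of Y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef


# "Deviance" here: squared residual of each sample from the fitted mean.
deviance = residuals(X, meth) ** 2

# Regress the deviance on age and compute R^2 per CpG.
dev_resid = residuals(X, deviance)
ss_res = (dev_resid ** 2).sum(axis=0)
ss_tot = ((deviance - deviance.mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot

# n * R^2 is compared against chi-square with 1 degree of freedom (one regressor).
lm_stat = n_samples * r2
pvals = stats.chi2.sf(lm_stat, df=1)

print(np.round(pvals, 4))
```

The resulting p-values would still need multiple-testing correction across CpGs before calling any marker differentially variable.
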
Any suggestions about the "best" way to analyze changes in variability
of methylation (and gene expression) values? Which strategy would you
recommend?
Best,
Simone