Steve Shen
Dear All,

I would really appreciate it if someone could help me with this basic
statistical problem or offer some suggestions. I have a set of bisulphite
methylation data (methylC-seq) at single-base resolution. The mapping
information for each base includes the coverage (2-20X) and, if the base
is a C, the frequency of methylation. The summary for a sliding window of,
say, 500 bp is the percentage of methylated C observed over base coverage
at each position.
For example,

sample A:
Index, start, end, strand, methylC_observed, positions, coverage, C_type
window1, 12297500, 12298000, +, 1/3/0/5/1/2, 12297573/12297779/12297631/12287774/12297854/12297958, 6/5/4/10/7/15, C/CG/CG/CHG/C/C
window2,
.
.

sample B:
Index, start, end, strand, methylC_observed, positions, coverage, C_type
window1, 12297500, 12298000, +, 3/0/3/0/1/0, 12297573/12297779/12297631/12287774/12297854/12297958, 12/9/11/10/3/5, C/CG/CG/CHG/C/C
window2,
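
To make the per-window summary described above concrete, here is a minimal Python sketch. It assumes the record layout shown in the example rows (comma-separated fields, with "/"-separated per-position values). The function names parse_window and summarize_window are hypothetical, and the pooled fraction (total methylated reads over total coverage) is only one possible way to collapse a window, not a recommended answer to the questions below.

```python
def parse_window(record):
    """Split one window record into typed fields (layout assumed from the example rows)."""
    fields = [f.strip() for f in record.split(",")]
    index, start, end, strand, meth, positions, coverage, c_type = fields
    return {
        "index": index,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "methylated": [int(x) for x in meth.split("/")],
        "positions": [int(x) for x in positions.split("/")],
        "coverage": [int(x) for x in coverage.split("/")],
        "c_type": c_type.split("/"),
    }


def summarize_window(window):
    """Return per-position methylation fractions and a pooled window fraction."""
    meth = window["methylated"]
    cov = window["coverage"]
    # Per-position fraction: methylated reads / reads covering that position.
    # Positions with zero coverage carry no information, so they become None.
    per_site = [m / c if c > 0 else None for m, c in zip(meth, cov)]
    total_meth = sum(meth)
    total_cov = sum(cov)
    # Pooled fraction weights each position by its coverage (an assumption,
    # chosen here only for illustration).
    pooled = total_meth / total_cov if total_cov > 0 else None
    return {
        "per_site": per_site,
        "total_methylated": total_meth,
        "total_coverage": total_cov,
        "pooled_fraction": pooled,
    }


# The two window1 rows from the example above.
record_a = ("window1, 12297500, 12298000, +, 1/3/0/5/1/2, "
            "12297573/12297779/12297631/12287774/12297854/12297958, "
            "6/5/4/10/7/15, C/CG/CG/CHG/C/C")
record_b = ("window1, 12297500, 12298000, +, 3/0/3/0/1/0, "
            "12297573/12297779/12297631/12287774/12297854/12297958, "
            "12/9/11/10/3/5, C/CG/CG/CHG/C/C")

summary_a = summarize_window(parse_window(record_a))
summary_b = summarize_window(parse_window(record_b))
print(summary_a["pooled_fraction"], summary_b["pooled_fraction"])
```

Keeping the raw counts (total_methylated, total_coverage) alongside the fractions preserves the information about how deeply each window was sequenced, which a percentage alone discards.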

I understand that each base should be treated differently depending on
its type (C, CG, CHG, and so on). Setting the C type aside for now,
however, my real problems are: 1) how to summarize the methylated C;
2) how to normalize; and 3) more importantly, how to compare samples A
and B on a window-by-window basis. What statistics can be applied? Any
help and suggestions would be much appreciated.

Thanks in advance,
Steve