Xiaohui Wu
Hi all,
I have NGS data from rice: 30 libraries of 20-nt tags (about 60
million tags in total) collected under different conditions. I have
filtered a set of genes and intergenic regions (both called "peaks"
here; about 20,000 peaks in total).
So far I have come up with the following ideas:
1) The correlation of peak expression (here, expression is the
normalized tag count) between each pair of libraries.
2) Clustering of peaks or libraries based on peak expression, e.g.
with the heatmap function in R.
3) The fluctuation (or deviation) of each peak across the 30
libraries, to find which peaks have consistent expression and which
have fluctuating expression.
** Is there an effective way to calculate something like this? Is the
standard deviation (sd) or the coefficient of variation (sd/mean)
enough?
4) DE peaks between each pair of libraries or between each pair of
library clusters. Then use GO to compare the functions of the
different sets of DE peaks.
** Here, I tend to use clusters of libraries to reduce the number of
comparisons, but do you think I can treat the libraries in the same
cluster as replicates and then use a DE package such as edgeR or
DESeq to find DE peaks?
5) Relative peak usage among these libraries.
** But I have no idea how to calculate this. I don't think simply
using (expression of the peak in one library) / (total expression of
that peak in all libraries) is suitable here, because some peaks may
be expressed at much lower levels than others, and the formula does
not reflect this.
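For item 3, a sketch that computes the mean, sample standard deviation and coefficient of variation (sd/mean) of each peak across libraries. One caveat with a plain CV on count data: under pure sampling (Poisson) noise the CV is about 1/sqrt(mean), so weakly expressed peaks look more variable even when their true expression is constant. The sketch therefore skips peaks below a minimum mean; the cutoff of 10 and the toy counts are hypothetical.

```python
from statistics import mean, stdev

def peak_variability(counts_by_peak, min_mean=10.0):
    """Return {peak: (mean, sd, cv)} for peaks whose mean normalized count
    across libraries is at least min_mean (a hypothetical cutoff)."""
    out = {}
    for peak, counts in counts_by_peak.items():
        m = mean(counts)
        if m < min_mean:
            continue  # too weak for a stable CV estimate
        s = stdev(counts)  # sample standard deviation across libraries
        out[peak] = (m, s, s / m)
    return out

# Hypothetical toy data: normalized counts of 3 peaks in 4 libraries
peaks = {
    "peak1": [50, 52, 48, 51],   # steady
    "peak2": [10, 90, 5, 60],    # fluctuating
    "peak3": [1, 0, 2, 1],       # below the cutoff, skipped
}
stats = peak_variability(peaks)
# print peaks from most consistent to most fluctuating
for peak, (m, s, cv) in sorted(stats.items(), key=lambda kv: kv[1][2]):
    print(f"{peak}\tmean={m:.2f}\tsd={s:.2f}\tcv={cv:.2f}")
```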
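For item 5, a sketch contrasting the formula from the question (a peak's count in one library divided by its total over all libraries) with one possible alternative, which is an assumption here rather than something taken from the question: the fraction of each library's total tags that falls in the peak. The toy data illustrates the concern raised above: a strong peak and a weak peak with the same pattern get identical across-library shares, while the within-library fractions keep their difference in level. The across-library share assumes counts already normalized for library size; peak names and values are hypothetical.

```python
def across_library_share(counts):
    """Formula from item 5: fraction of a peak's total expression in each library."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def within_library_fraction(matrix):
    """matrix maps peak -> counts per library (same library order for every peak).
    Returns the fraction of each library's total tags assigned to each peak."""
    n_libs = len(next(iter(matrix.values())))
    lib_totals = [sum(row[j] for row in matrix.values()) for j in range(n_libs)]
    return {peak: [row[j] / lib_totals[j] for j in range(n_libs)]
            for peak, row in matrix.items()}

# Hypothetical toy data: 3 peaks in 2 libraries
matrix = {"strong": [900, 300], "weak": [9, 3], "other": [91, 697]}
for peak, row in matrix.items():
    # "strong" and "weak" get the same shares: their level difference is lost
    print(peak, across_library_share(row))
within = within_library_fraction(matrix)
# within-library fractions keep "weak" 100x smaller than "strong"
print(within)
```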
Any ideas are appreciated.
Thank you!
Regards,
Xiaohui