Question

Advice on analyzing NGS data for many genes or intergenic peaks across many conditions


Xiaohui Wu ▴ 280

@xiaohui-wu-4141

Last seen 10.6 years ago

Hi all,

I have NGS data (20-nt tags from 30 libraries, about 60 million in total) from different conditions, and I have filtered a set of genes and intergenic regions in rice (both called "peaks" here, about 20,000 in total). So far I have come up with the following ideas:

1) Correlation of peak expression between each pair of libraries (expression here means the normalized tag count).

2) Clustering of peaks or libraries based on peak expression, e.g. with the heatmap function in R.

3) The fluctuation (or deviation) of each peak across the 30 libraries, to find which peaks are consistently expressed and which peaks fluctuate.
** Is there an effective way to calculate something like this? Is the standard deviation (sd) or the coefficient of variation (sd/avg) enough?

4) DE peaks between each pair of libraries, or between each pair of clusters of libraries, followed by GO analysis to compare the functions of the different sets of DE peaks.
** Here I tend to use clusters of libraries to reduce the number of comparisons. Do you think I can treat the libraries in the same cluster as replicates and then use a DE package such as edgeR or DESeq to find DE peaks?

5) Relative peak usage among the libraries.
** I have no idea how to calculate this. I think simply using (expression of the peak in one library) / (total expression of that peak in all libraries) is not suitable here, because some peaks may be expressed at a much lower level than others, and this would not be reflected in the formula.

Any ideas are appreciated. Thank you!

Regards,
Xiaohui
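To make ideas 1, 3 and 5 concrete, here is a minimal Python sketch of the quantities mentioned above: pairwise correlation between libraries, sd/avg per peak, and the simple per-library share of a peak's total expression. The peak names and counts are made up for illustration, all function names are hypothetical, and the population standard deviation is an arbitrary choice. It is not a recommended analysis, only a way to see the numbers.

```python
import math
from statistics import mean, pstdev

# Hypothetical normalized tag counts: peak id -> one value per library.
counts = {
    "peak_A": [100.0, 110.0, 90.0, 100.0],
    "peak_B": [5.0, 50.0, 0.0, 25.0],
    "peak_C": [10.0, 10.0, 10.0, 10.0],
}

def coefficient_of_variation(values):
    """sd / mean across libraries; NaN when the mean is zero."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else math.nan

def relative_usage(values):
    """Share of a peak's total expression contributed by each library."""
    total = sum(values)
    return [v / total if total > 0 else 0.0 for v in values]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else math.nan

def library_column(table, j):
    """Expression of every peak (sorted by id) in library j."""
    return [table[p][j] for p in sorted(table)]

cv = {p: coefficient_of_variation(v) for p, v in counts.items()}
usage = {p: relative_usage(v) for p, v in counts.items()}
r01 = pearson(library_column(counts, 0), library_column(counts, 1))

for p in sorted(counts):
    print(p, round(cv[p], 3), [round(u, 3) for u in usage[p]])
print("correlation of library 0 vs library 1:", round(r01, 3))
```

Note that `relative_usage([1000.0] * 4)` gives the same shares as `peak_C`, which is exactly the limitation raised in point 5: the ratio hides how strongly a peak is expressed overall.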

GO edgeR DESeq • 994 views

14.6 years ago Xiaohui Wu ▴ 280