Hi all,
I am using cn.mops to call CNVs in a plant population of 271 samples. My question is about sequencing depth: it varies from 0.00496X to 40.11X, with an average of about 7X, and most samples are between 3X and 10X. Do I need to cluster the samples by depth into several groups and call CNVs group by group, or should I pool all files together for calling? Thank you.
Thank you Günter. Should I set the window length myself or let the software set it automatically? Which is better? Do you have any references for choosing the window length?
Regards,
Xiao
Hello Xiao,
The program determines the window length automatically based on the sample with the lowest number of reads (lowest coverage). However, I advise doing some calculations and setting this parameter by hand so that on average about 50-100 reads map to each window (segment).
The average number of reads per window/segment is:
averageReadCount = coverage * windowLength / readLength.
Assuming you want to have on average 50 reads in a segment/window, you have windowLength = readLength * 50 / coverage.
For your low-coverage samples with a coverage of 0.005X, you should use a window length of 50*100/0.005 = 1e6 bp (assuming a read length of 100 bp). The smallest CNV you will be able to detect is three times this length (determined by cn.mops's parameter "minWidth=3"), meaning 3e6 bp. You will be able to detect only very large CNVs.
For a medium coverage of 5X, this formula suggests a window length of 1000 bp, and the smallest detectable CNVs will be 3000 bp (with "minWidth=3").
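If it helps, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula, not part of cn.mops: the function names `window_length` and `smallest_detectable_cnv` are made up for this example, and the read length of 100 bp and target of 50 reads per window are the same assumptions used in the calculation above.

```python
def window_length(coverage, read_length=100, target_reads=50):
    """Window length (bp) giving `target_reads` reads per window on average.

    Implements windowLength = readLength * targetReads / coverage.
    read_length=100 and target_reads=50 are assumed defaults, not cn.mops settings.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return read_length * target_reads / coverage


def smallest_detectable_cnv(window_len, min_width=3):
    """Smallest CNV (bp) that spans `min_width` consecutive windows."""
    return min_width * window_len


# Coverage values taken from the discussion: very low, medium, and the population average.
for cov in (0.005, 5.0, 7.0):
    wl = window_length(cov)
    print(f"coverage {cov}X: window ~{wl:.0f} bp, "
          f"smallest CNV ~{smallest_detectable_cnv(wl):.0f} bp")
```

Running it for a few depths makes the trade-off visible: the lowest-coverage sample dictates a very large window, so a single shared window length for all samples limits resolution for the well-covered ones.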
Regards,
Günter
Thank you Günter, this helps me a lot!
My other question is about the data frame returned by the "segmentation" function. It contains the columns "seqname", "start", "end", "width", "strand", "sample", "median", "mean" and "CN". Do "median" and "mean" both refer to the I/NI calls? How should I filter this data frame to get more confident CNVs? I have read other Q&A threads where you said "The farer the value is away from 0, the more likely there is a CNV". Do you have a standard threshold for this?
Regards,
Xiao