I'd like to take advantange of the segmentation algorightms in
cn.mops, to estimate copy number from low-coverage BAM files. The
average coverage in our tumor and normal DNA, is about 0.8 and 0.7
respectively, quite uniformly across the whole genome. The
consistency of the reads, and their ratios, allow us to plausibly
identify amplifications and deletions, either visually, or by simple
calculations. Now it's time for more rigor, for which I turned to
cn.mops.
However, it appears that such low-coverage sequencing is apparently
not well-suited to cn.mops.
The man page says:
minReadCount: If all samples are below this value the algorithm
will, return the prior knowledge.
This prevents that the algorithm, from being applied to segments
with very low coverage. Default=1.
Even if I set minReadCount to 0.2, no useful results are returned:
Individual CNVs:
GRanges with 2 ranges and 4 metadata columns:
seqnames ranges strand | sampleName
median mean CN
<rle> <iranges> <rle> | <factor>
<numeric> <numeric> <logical>
[1] chr10 [ 42441001, 42553000] * | F7272LUNG.sorted.bam
0.5849625 0.5767351 <na>
[2] chr10 [135478001, 135534746] * | F7272SCLC.sorted.bam
0.2196978 0.5403949 <na>
Before I give up on using this impressive package, may I ask if there
is indeed a way to use it with low-coverage but otherwise high-quality
bam files?
Thanks!
- Paul
Hello Paul,
Thanks for your interest in cn.mops. It should be possible to identify
CNVs quite reliably at this coverage, if you use the right parameters.
The parameter "WL" in the function "getReadCountsFromBAM" is crucial
for
the analysis, because it determines the length of the segments, in
which
the reads are counted. If the coverage is low, the segments should be
longer. Which "WL" (window length) did you use?
The second important parameter that controls sensitivity and
specificity
is "priorImpact". I suppose you expect more CNVs in your case,
therefore
you should lower the parameter. The parameter "minReadCount" is not
crucial for the algorithm, and should be left to default.
Did you use the function "cn.mops" or "referencecn.mops"?? Do you have
two samples (tumor and normal) or do you have more tumor samples and
more normal samples. In the first case you should use
"referencecn.mops"
and in the second case "cn.mops".
Please do not hesitate to ask, if you encounter any problems.
Regards,
G?nter
On 04/30/2013 06:26 PM, Paul Shannon wrote:
> I'd like to take advantange of the segmentation algorightms in
cn.mops, to estimate copy number from low-coverage BAM files. The
average coverage in our tumor and normal DNA, is about 0.8 and 0.7
respectively, quite uniformly across the whole genome. The
consistency of the reads, and their ratios, allow us to plausibly
identify amplifications and deletions, either visually, or by simple
calculations. Now it's time for more rigor, for which I turned to
cn.mops.
>
> However, it appears that such low-coverage sequencing is apparently
not well-suited to cn.mops.
>
> The man page says:
>
> minReadCount: If all samples are below this value the algorithm
will, return the prior knowledge.
> This prevents that the algorithm, from being applied to segments
with very low coverage. Default=1.
>
> Even if I set minReadCount to 0.2, no useful results are returned:
>
> Individual CNVs:
> GRanges with 2 ranges and 4 metadata columns:
> seqnames ranges strand | sampleName
median mean CN
> <rle> <iranges> <rle> | <factor>
<numeric> <numeric> <logical>
> [1] chr10 [ 42441001, 42553000] * | F7272LUNG.sorted.bam
0.5849625 0.5767351 <na>
> [2] chr10 [135478001, 135534746] * | F7272SCLC.sorted.bam
0.2196978 0.5403949 <na>
>
> Before I give up on using this impressive package, may I ask if
there is indeed a way to use it with low-coverage but otherwise high-
quality bam files?
>
> Thanks!
>
> - Paul
>
Hi Gunter,
Thanks for this helpful reply. I am glad to learn that cn.mops will
work with low coverage bam files -- very nice.
It looks like I
1) changed a parameter I should have left at default
("minReadCount")
2) failed to change parameters which should be changed ("WL" and
"priorImpact")
BAMFiles <- list.files(pattern=".bam$")
bamDataRanges <- getReadCountsFromBAM(BAMFiles, mode="unpaired")
res <- cn.mops(bamDataRanges, minReadCount=0.2)
Off-list you kindly offered to examine my bamDataRanges so that you
could suggest the correct parameters. I will send you that Granges
object, serialized (also off-list). After we figure this out, I will
gladly summarize all this back to the list.
Many thanks,
- Paul
On Apr 30, 2013, at 1:20 PM, G?nter Klambauer wrote:
> Hello Paul,
>
>
> Thanks for your interest in cn.mops. It should be possible to
identify CNVs quite reliably at this coverage, if you use the right
parameters.
> The parameter "WL" in the function "getReadCountsFromBAM" is crucial
for the analysis, because it determines the length of the segments, in
which the reads are counted. If the coverage is low, the segments
should be longer. Which "WL" (window length) did you use?
> The second important parameter that controls sensitivity and
specificity is "priorImpact". I suppose you expect more CNVs in your
case, therefore you should lower the parameter. The parameter
"minReadCount" is not crucial for the algorithm, and should be left to
default.
>
> Did you use the function "cn.mops" or "referencecn.mops"?? Do you
have two samples (tumor and normal) or do you have more tumor samples
and more normal samples. In the first case you should use
"referencecn.mops" and in the second case "cn.mops".
>
> Please do not hesitate to ask, if you encounter any problems.
>
> Regards,
> G?nter
>
>
>
>
> On 04/30/2013 06:26 PM, Paul Shannon wrote:
>> I'd like to take advantange of the segmentation algorightms in
cn.mops, to estimate copy number from low-coverage BAM files. The
average coverage in our tumor and normal DNA, is about 0.8 and 0.7
respectively, quite uniformly across the whole genome. The
consistency of the reads, and their ratios, allow us to plausibly
identify amplifications and deletions, either visually, or by simple
calculations. Now it's time for more rigor, for which I turned to
cn.mops.
>>
>> However, it appears that such low-coverage sequencing is apparently
not well-suited to cn.mops.
>>
>> The man page says:
>>
>> minReadCount: If all samples are below this value the algorithm
will, return the prior knowledge.
>> This prevents that the algorithm, from being applied to segments
with very low coverage. Default=1.
>>
>> Even if I set minReadCount to 0.2, no useful results are returned:
>>
>> Individual CNVs:
>> GRanges with 2 ranges and 4 metadata columns:
>> seqnames ranges strand | sampleName
median mean CN
>> <rle> <iranges> <rle> | <factor>
<numeric> <numeric> <logical>
>> [1] chr10 [ 42441001, 42553000] * | F7272LUNG.sorted.bam
0.5849625 0.5767351 <na>
>> [2] chr10 [135478001, 135534746] * | F7272SCLC.sorted.bam
0.2196978 0.5403949 <na>
>>
>> Before I give up on using this impressive package, may I ask if
there is indeed a way to use it with low-coverage but otherwise high-
quality bam files?
>>
>> Thanks!
>>
>> - Paul
>>
>
> <klambauer.vcf>