We decided to run some of our multi-regional sequencing (MSEQ) samples through a SNP array and then the Affymetrix Axiom Analysis Suite and ASCAT software to obtain CNV data that we could compare to PureCN CNV data, in order to assess the accuracy of PureCN. Unfortunately, we are seeing poor concordance between the two. I suspect the reason may be incorrect purity and ploidy values, since deciding whether a copy ratio represents a gain or a loss depends crucially on having accurate purity and ploidy estimates. If they are inaccurate, a fairly small difference in copy ratio between ASCAT and PureCN might lead to one copy ratio looking like a loss and the other like a gain.
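To illustrate why the call depends so strongly on purity and ploidy, here is a minimal sketch using the standard mixture model for a linear-scale copy ratio, r = (p*C + 2*(1-p)) / (p*ploidy + 2*(1-p)), solved for the tumor copy number C. The function names and the rule of calling gain/loss relative to the rounded ploidy are assumptions made for this example; they are not taken from ASCAT or PureCN, which may define gains and losses differently.

```python
def absolute_copy_number(copy_ratio, purity, ploidy):
    """Tumor copy number implied by a linear-scale (not log2) copy ratio.

    Assumes a diploid normal contaminant and the mixture model
    r = (p*C + 2*(1-p)) / (p*ploidy + 2*(1-p)).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    normal_part = 2.0 * (1.0 - purity)            # copies contributed by normal cells
    average_copies = purity * ploidy + normal_part  # sample-wide average copy number
    tumor_copies = (copy_ratio * average_copies - normal_part) / purity
    return max(0.0, tumor_copies)                  # copy number cannot be negative


def call_state(copy_ratio, purity, ploidy):
    """Hypothetical calling rule: compare rounded copy number to rounded ploidy."""
    copies = round(absolute_copy_number(copy_ratio, purity, ploidy))
    baseline = round(ploidy)
    if copies > baseline:
        return "gain"
    if copies < baseline:
        return "loss"
    return "neutral"


# The same copy ratio under two different purity estimates (ploidy fixed at 2).
for purity in (1.0, 0.3):
    print(purity, call_state(1.1, purity, 2.0))
```

With a copy ratio of 1.1 and ploidy 2, this rule gives "neutral" at purity 1.0 but "gain" at purity 0.3, so two tools that agree on the ratio can still disagree on the call.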
I'm fairly certain that ASCAT's estimated purities are wrong because several of them are 1.0. To test this, I adjusted ASCAT's raw copy ratios using PureCN's purity/ploidy estimates, then did the comparison. Concordance was still poor, though better. Then, knowing that several of our samples had PureCN purity estimates below 0.2, I removed samples with purity under 0.4 and did the comparison. This resulted in slightly POORER concordance.
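For reference, a minimal sketch of the kind of concordance measure I mean. It assumes the two segmentations have already been intersected into shared intervals, each with a length and one call per tool; the tuple layout and the function name are hypothetical, not output from either package. Weighting by length is one choice among several; it keeps many tiny segments from dominating the score.

```python
def weighted_concordance(segments):
    """Fraction of covered bases on which the two tools make the same call.

    segments: iterable of (length_bp, call_a, call_b) tuples, where each call
    is a label such as "gain", "loss" or "neutral".
    """
    segments = list(segments)
    total = sum(length for length, _, _ in segments)
    if total == 0:
        raise ValueError("no bases to compare")
    agreeing = sum(length for length, a, b in segments if a == b)
    return agreeing / total


# Made-up intervals for illustration only.
example = [
    (100, "gain", "gain"),
    (300, "neutral", "loss"),
]
print(weighted_concordance(example))
```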
Our MSEQ project, like those of others, found that about 80% to 95% of all somatic mutations are subclonal. My understanding is that this confounds the ability to estimate purity and ploidy. Now, here is my question. With our multiple samples per tumor, we know which mutations are clonal (found throughout the tumor). Would it be possible (theoretically) for PureCN to make more accurate purity/ploidy estimates if non-clonal mutations were excluded, or if they were at least identified as such to PureCN?
Hard to diagnose from a distance. Feel free to send screenshots (by email if you prefer) of the B-allele frequency plots (like Fig 4 in the main vignette) for some cases with a dramatic difference between ASCAT and PureCN.
Sub-clonal somatic mutations should not matter much. A large number of sub-clonal copy number alterations, however, can make ploidy inference tricky.
We found that copy number changes were just as heterogeneous as SNV changes. Typically, 5% to 10% of copy number changes are found throughout the tumor; the rest are subclonal.