You previously indicated that the C values in the PureCN output have been adjusted for purity. Does this also hold for the "mean" values, i.e. the mean log ratios? It appears to me that they are adjusted for purity, but that they are still mean log ratios: not linear mean ratios, and not twice the ratio to give a copy number value.
Thanks, that was a helpful reference. However, I see that it has a mistake in its equation for R'(x). I believe the correct equation should be:
T = tau, a = alpha, q(x) = integer CN in cancer cells, R(x) = observed CN ratio, R'(x) = CN ratio in tumor cells
His derivation:
where:
However, in the last step where he substituted q(x) in q(x)/T, he did the algebra wrong. The correct algebra is:
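A sketch of that algebra in LaTeX, assuming the usual two-component mixture (a fraction a of tumor cells with q(x) copies and a fraction 1 - a of normal cells with 2 copies) and assuming R(x) is measured relative to the average ploidy of the mixed sample. D is shorthand introduced here for that average, not a symbol from the paper:

```latex
\begin{align*}
D     &= aT + 2(1-a) \\
R(x)  &= \frac{a\,q(x) + 2(1-a)}{D} \\
q(x)  &= \frac{R(x)\,D - 2(1-a)}{a} \\
R'(x) &= \frac{q(x)}{T} = \frac{aT\,R(x) + 2(1-a)\,R(x) - 2(1-a)}{aT}
\end{align*}
```

Under these assumptions, the form R(x)/a - 2(1-a)/(aT) comes out only if D is replaced by T, which is exact when T = 2 or a = 1.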
As a test, say that purity = a = 0.5, tumor ploidy = T = 2, and the raw coverage ratio is 1.5. Then we expect the adjusted coverage ratio to be 2: the tumor segment is amplified 2X (4 copies), and this becomes a raw ratio of 1.5 when purity is 1/2, since [0.5*4 + 0.5*2] / 2 = 1.5.
His: R'(x) = 1.5/0.5 - 2(0.5)/(0.5*2) = 3 - 1 = 2 (correct)
Mine: R'(x) = [0.5*2*1.5 + 2(0.5)(1.5) - 2(0.5)] / (0.5*2) = 1.5 + 1.5 - 1 = 2 (correct)
But now suppose that tumor ploidy = T = 4, and we still have purity = a = 0.5. Say the raw coverage ratio is 1.0, which means there is no tumor amplification: the number of copies at the locus is the same as the mean number of copies, in both the 2X normal and the 4X tumor tissue. Then we expect the adjusted coverage ratio to also be 1.
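Both test cases can be checked numerically. A minimal sketch in R: the function names adjust_paper and adjust_corrected are made up for this post, and adjust_corrected assumes the raw ratio is relative to the average ploidy of the mixed sample.

```r
# Paper's form: R'(x) = R(x)/a - 2(1-a)/(a*T)
adjust_paper <- function(ratio, purity, ploidy) {
  ratio / purity - 2 * (1 - purity) / (purity * ploidy)
}

# Corrected form: R'(x) = (R(x)*D - 2(1-a)) / (a*T), with D = a*T + 2(1-a)
adjust_corrected <- function(ratio, purity, ploidy) {
  d <- purity * ploidy + 2 * (1 - purity)  # average ploidy of the mixed sample
  (ratio * d - 2 * (1 - purity)) / (purity * ploidy)
}

# Case 1: purity 0.5, tumor ploidy 2, raw ratio 1.5 (expected adjusted ratio: 2)
case1_paper     <- adjust_paper(1.5, 0.5, 2)      # 2
case1_corrected <- adjust_corrected(1.5, 0.5, 2)  # 2

# Case 2: purity 0.5, tumor ploidy 4, raw ratio 1.0 (expected adjusted ratio: 1)
case2_paper     <- adjust_paper(1.0, 0.5, 4)      # 1.5
case2_corrected <- adjust_corrected(1.0, 0.5, 4)  # 1

print(c(case1_paper, case1_corrected, case2_paper, case2_corrected))
```

The two forms agree when tumor ploidy is 2 and diverge when it is 4, where only the corrected form returns the expected value of 1.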
Not sure, I looked into this more than 2 years ago. I used the following and believe it's correct:
rds <- readRDS("Sampleid.rds")
r <- rds$results[[1]]
r$seg$seg.mean.adjusted <- r$seg$seg.mean/r$purity - 2*(1-r$purity)/(r$purity*r$ploidy)
I haven't used it much, though, because I found little benefit in GISTIC, and for everything else you usually want the absolute copy numbers.
Your equation above matches the one in the paper you cited, which is incorrect. Your seg.mean is his R(x), your purity is his a, your ploidy is his T.
I found that PureCN:::.calcExpectedRatio() is doing it correctly (it is doing the inverse operation, computing R(x) from R'(x)).
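A round-trip check makes the inverse relationship concrete. This is a sketch only: expected_ratio below is a stand-in written for this post under the mixture assumption (2 copies in normal cells), not the body of PureCN:::.calcExpectedRatio(), and the purity and ploidy values are arbitrary.

```r
# Forward direction: tumor-cell ratio R'(x) -> observed ratio R(x)
expected_ratio <- function(r_tumor, purity, ploidy) {
  q <- r_tumor * ploidy  # tumor copy number
  (purity * q + 2 * (1 - purity)) / (purity * ploidy + 2 * (1 - purity))
}

# Inverse direction, corrected form
adjust_corrected <- function(ratio, purity, ploidy) {
  d <- purity * ploidy + 2 * (1 - purity)
  (ratio * d - 2 * (1 - purity)) / (purity * ploidy)
}

# Inverse direction, form from the paper
adjust_paper <- function(ratio, purity, ploidy) {
  ratio / purity - 2 * (1 - purity) / (purity * ploidy)
}

r_tumor <- 1.5   # arbitrary example values
purity  <- 0.6
ploidy  <- 3.2

r_obs          <- expected_ratio(r_tumor, purity, ploidy)
back_corrected <- adjust_corrected(r_obs, purity, ploidy)
back_paper     <- adjust_paper(r_obs, purity, ploidy)

print(c(r_tumor = r_tumor, corrected = back_corrected, paper = back_paper))
```

With these values the corrected form recovers the starting ratio of 1.5, while the paper's form does not.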
However, in runAbsoluteCN(), I find this line:
opt.C <- (2^(seg$seg.mean + log.ratio.offset) * total.ploidy)/p - ((2 * (1 - p))/p)
and since C = ratio * ploidy, the above equation is the paper's (incorrect) R'(x) * ploidy. It seems to be wrong. Please check it. Maybe I'm missing something, but to me it looks like a definite algebra mistake.
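How that line behaves depends on what total.ploidy holds at that point, which I have not verified in the PureCN source. A small sketch with log.ratio.offset dropped, p = 0.5, tumor ploidy 4 and a log ratio of 0, where the expected C is 4; opt_c is a local copy of the expression, not a PureCN function:

```r
# Local copy of the expression from runAbsoluteCN(), offset assumed to be 0
opt_c <- function(log_ratio, total_ploidy, p) {
  (2^log_ratio * total_ploidy) / p - (2 * (1 - p)) / p
}

p             <- 0.5
tumor_ploidy  <- 4
sample_ploidy <- p * tumor_ploidy + 2 * (1 - p)  # average ploidy of the mixture: 3

# Interpretation A: total_ploidy is the tumor ploidy
c_if_tumor_ploidy  <- opt_c(0, tumor_ploidy, p)   # 6

# Interpretation B: total_ploidy is the average ploidy of the mixed sample
c_if_sample_ploidy <- opt_c(0, sample_ploidy, p)  # 4

print(c(c_if_tumor_ploidy, c_if_sample_ploidy))
```

If total.ploidy is the tumor ploidy, the expression reproduces the paper's form and overshoots here; if it is the average ploidy of the mixed sample, the same expression gives the expected 4.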
I think I'll go ahead and open an issue on the PureCN GitHub project for this.