Most of my samples show a private.germline.all count of 0. However, there are tons of germline mutations that are not filtered out, as the PureCN log files show. Why is the private germline count 0? I checked one sample's germline mutations manually to see if in fact all of them occurred in another sample, and they do not, although I did not do all the filtering that PureCN did. However, the PureCN filtering removed a fairly small number of mutations compared to the overall total.
I think I may see the problem, but I don't understand it. In function callMutationBurden(), there is the following statement:
This has the effect of eliminating most germline mutations from consideration, because they all have very low prior.somatic, looks like 9.9e-05. I would think this is an appropriate value for prior.somatic of mutations not marked SOMATIC?
What effect would this have, if any, on the analysis? Since this happens in the final stage, when mutation burden is computed, I assume it has no effect on the CNV calculations?
Private germline means germline, but not in public germline databases.
The point of the mutation burden function is to remove known AND private SNPs from variant calls in tumor-only analyses. By this point we have hopefully removed all artifacts, so after germline filtering we should end up with only somatic calls. Mutation burden is the somatic mutation rate.
The line you quoted removes known germline variants. There can be somatic variants at known germline sites, but this should be rare. The next lines in this function remove predicted germline variants from the novel mutations (those not in dbSNP).
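To make the two-step logic concrete, here is a rough Python sketch. It is not PureCN's actual code (PureCN is an R package); the `Variant` class, the `predicted_somatic` field, and the `min_prior_somatic` cutoff of 0.1 are all assumptions made up for illustration. It only shows why variants with a very low prior.somatic (such as the 9.9e-05 mentioned above) never reach the private germline count.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    prior_somatic: float      # low for known germline sites (hypothetical field)
    predicted_somatic: bool   # model's call for this variant (hypothetical field)


def split_calls(variants, min_prior_somatic=0.1):
    """Return (somatic, private_germline) lists.

    min_prior_somatic is an assumed cutoff, not a PureCN default.
    """
    # Step 1: drop known germline sites, identified by a low somatic prior.
    novel = [v for v in variants if v.prior_somatic >= min_prior_somatic]
    # Step 2: among the novel variants, separate predicted somatic
    # from predicted (private) germline.
    somatic = [v for v in novel if v.predicted_somatic]
    private_germline = [v for v in novel if not v.predicted_somatic]
    return somatic, private_germline


calls = [
    Variant("known_snp", 9.9e-05, False),     # removed in step 1
    Variant("novel_germline", 0.5, False),    # counted as private germline
    Variant("novel_somatic", 0.5, True),      # counted as somatic
]
somatic, private_germline = split_calls(calls)
```

In this toy input, the known SNP is discarded before the private germline count is made, so only one variant ends up in `private_germline`.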
It's completely downstream, see the vignette section about callMutationBurden.
If you have matched normals, then calculating mutation burden is trivial: you simply count the somatic calls and normalize by the callable region. PureCN will do that for you, but this function is written for tumor-only analyses, where this isn't as easy because you don't know in advance whether a call is germline or somatic.
In matched tumor/normal, a non-zero number in private germline means a variant was annotated as SOMATIC but fits a germline model much better. This should be rare and is probably an artifact (or the coverage in the normal was poor).
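As a small illustration of the matched-normal case, the arithmetic is just a count divided by the callable region size. The function name and the per-megabase unit below are assumptions for the sake of the example, not PureCN's API.

```python
def mutation_burden_per_mb(num_somatic_calls: int, callable_bases: int) -> float:
    """Somatic calls per megabase of callable region (hypothetical helper)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    # Convert the callable region from bases to megabases, then normalize.
    return num_somatic_calls / (callable_bases / 1e6)


# Example: 150 somatic calls over a 30 Mb callable region.
burden = mutation_burden_per_mb(150, 30_000_000)
```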
Yes, this is downstream of everything.
>> In matched tumor/normal, if you get a non-zero number in private germline, it means annotated as SOMATIC, but fits germline much better. This should be rare and is probably an artifact (or the coverage in normal was poor).
Ahh, okay, that's the crucial thing I wanted to know. You might want to add that as a comment for the private.germline.all column description.
Are you saying that what you are doing is checking to see if any of the variants which PureCN judges to be SOMATIC are not marked as such in the VCF? In which case, you would HOPE to see 0's, and non-0's indicate a possible problem? If that's the situation, I think the term "private germline" threw me off. In this MSEQ project of mine, somatic mutations are either clonal (all tumors have it), subclonal (more than one tumor has it, but not all), or private (only one tumor has it). So I naturally thought that a private GERMLINE mutation is one that is present in only one normal sample out of all the normals in the project.