Hi,
I'm trying to estimate CNVs from whole-exome sequencing data, using BAM files from TCGA, and I would like to do this without a paired normal. Is it possible to use cn.MOPS for this, or does predicting CNVs in cancer samples with cn.MOPS always require a reference?
I would also be grateful for any suggestions on other tools I can use to make CNV estimations without using a paired normal.
Thanking You,
Vakul
Hello Günter,
Thank you for the reply. As you suggested, I have obtained control normals and am trying to use referencecn.mops.
However, when I try to normalize the count data I built using getSegmentReadCountsFromBAM, I get the following error:
Error in normalizeChromosomes(X, chr = chr, normType = normType, qu = normQu) :
Some normalization factors are zero! Remove samples or chromosomes for which the average read count is zero, e.g. chromosome Y.
I have restricted the ranges the counting is carried out on to exclude the Y chromosome. The count matrix also shows that all samples have reads in the regions specified in the GRanges object, so I don't quite understand why the normalization factor is zero. I would be grateful for advice on how to tackle this issue.
Thanking You,
Vakul
Hello Vakul,
Thanks for bringing this up. For this particular normalization type, the median read count per sample is calculated, and this value seems to be zero for at least one sample. This means that at least 50% of the segments have zero read counts in that sample. You could check which segments have extremely low coverage in all samples and remove these segments. Alternatively, you can use normType="mean", although I would advise checking the data quality first.
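The effect described above can be reproduced on a toy example. What follows is only a minimal sketch in plain Python, not cn.mops code: the count matrix, the sample names, and the helper functions per_sample_medians and drop_low_coverage_segments are all hypothetical and made up for illustration. It shows that a sample can have reads overall and still have a median of zero, and that removing segments with no coverage in any sample brings the median back above zero.

```python
from statistics import mean, median

# Hypothetical toy count matrix (NOT real data):
# rows are segments (exons), columns are samples S1, S2, S3.
counts = [
    [5, 12, 9],
    [0, 0, 0],
    [0, 0, 0],
    [8, 20, 14],
    [0, 0, 0],
    [6, 18, 10],
    [0, 0, 0],
]


def per_sample_medians(matrix):
    """Return the median read count of each sample (column)."""
    return [median(column) for column in zip(*matrix)]


def drop_low_coverage_segments(matrix, min_count=1):
    """Keep only segments where at least one sample reaches min_count reads.

    The min_count threshold is an assumption chosen for this illustration.
    """
    return [row for row in matrix if max(row) >= min_count]


# More than half of the segments are empty, so every median is zero,
# even though each sample has reads in the remaining segments.
medians_before = per_sample_medians(counts)

# The per-sample mean is still positive for the same data.
means_before = [mean(column) for column in zip(*counts)]

# After removing segments with no coverage in any sample,
# the medians are no longer zero.
filtered = drop_low_coverage_segments(counts)
medians_after = per_sample_medians(filtered)

print("medians before filtering:", medians_before)
print("means before filtering:  ", means_before)
print("segments kept:           ", len(filtered), "of", len(counts))
print("medians after filtering: ", medians_after)
```

If only a single sample shows a zero median after such filtering, that sample itself is worth inspecting before changing the normalization type.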
Regards,
Günter
Hello Günter,
I'll run a few QC measures on the sequencing data, as you suggested, before changing the normalization parameters. However, I have an observation that I would like your opinion on. I ran into the normalization issue when I used exon definitions I acquired from ENSEMBL BioMart, but the same normalization worked when I created the count matrix using exon definitions from the TCGA (UNC GTF file). Is there a reason this would happen, and should I prefer one over the other?
Thanking You,
Vakul