How to use cn.mops to detect CNVs from Targeted DNA Sequencing data?
1
0
Entering edit mode
Bouzid.a ▴ 20
@bouzida-10697
Last seen 5.0 years ago

Hi,

I'm using cn.mops to detect CNVs from Targeted DNA Sequencing data by "TruSight One Sequencing Panel from Illumina" investigated for one patient and one control samples, on an Illumina MiSeq sequencer with 150 bp paired-end run.
By running the algorithm with the referencecn.mops function, it results #GRanges object with 67 ranges.
Then, by verifying, the resulting CNVs with the coverage of the enriched regions in the raw data,
I found that several detected CNVs aren't included in the targeted regions enriched by the panel!!
Please, How can you explain that?
Find below my code, what am I doing wrong?

Thanks,
Amal

#######
BAMFiles <- list.files(pattern=".bam$")

refSeqName=c(paste0("chr",1:22),"chrX","chrY")
bamDataRanges<-getReadCountsFromBAM(BAMFiles,sampleNames=paste("Sample",2:3), refSeqName= refSeqName,mode="paired",WL=5000)

case.ctrl2<-referencecn.mops(cases=bamDataRanges[,3],controls=bamDataRanges[,2])  case.ctrl2.ICN <- calcIntegerCopyNumbers(case.ctrl2)
(cnvs(case.ctrl2.ICN ))

#####

 

 

cn.mops cnv targeted sequencing • 2.6k views
ADD COMMENT
1
Entering edit mode
@gunter-klambauer-5426
Last seen 3.8 years ago
Austria

Hello Amal,

The reason for this behaviour of cn.mops is that you used the "whole genome sequencing" script. The function "getReadCountsFromBAM" partitions the genome into equally sized windows and counts how many reads are in these windows. However, you have targeted sequencing data and you should therefore restrict the analysis to the targeted segments. For this case, you can use the function "getSegmentReadCountsFromBAM" and "exomecn.mops". In the following code, I assume that you have a list of targeted regions provided as BED file (tab-separated):

 

library(cn.mops)

BAMFiles <- list.files(pattern=".bam$")

segments <- read.table("targetRegions.bed",sep="\t",as.is=TRUE)

gr <- GRanges(segments[,1],IRanges(segments[,2],segments[,3]))

X <- getSegmentReadCountsFromBAM(BAMFiles,GR=gr,mode="unpaired")

resCNMOPS <- exomecn.mops(X)

resCNMOPS <- calcIntegerCopyNumbers(resCNMOPS)

 

I hope this helps!

Regards,

Günter

ADD COMMENT
0
Entering edit mode

Thank you Günter for your reply, but I do not have a list of targeted regions provided as BED file, how can I retrieve it??

Thanks,

Best Regards

 

ADD REPLY
1
Entering edit mode

Hello Amal,

There is a workaround: Go back to the initial "getReadCountsFromBAM" and use a small segment size, e.g. "WL=50". Then you get a read count matrix with segments/windows along the whole genome. Remove rows with zero read counts across all samples (these are the DNA segments that were not targeted). Then run "exomecn.mops" on the the read count matrix with the remaining rows.

 

Regards,

Günter

ADD REPLY
0
Entering edit mode

Hi Günter, 

Thanks for your suggestion!

I tried but I think that I make a mistake somewhere?!

BAMFiles <- list.files(pattern=".bam$")

refSeqName=c(paste0("chr",1:22),"chrX","chrY")
DataRanges.smallWL <- getReadCountsFromBAM(BAMFiles, sampleNames=paste("Sample",1:3), refSeqName= refSeqName,mode="paired",WL=50)

#results: GRanges object with 61913560 ranges and 3 metadata columns:

matrix <- DataRanges.smallWL[DataRanges.smallWL[,1:3] != 0, ]

#results: GRanges object with 0 ranges and 3 metadata columns!

 

 

 
ADD REPLY
1
Entering edit mode

I think your code for filtering the DNA segments does not work. I give you code how to do this on the data that comes with cn.MOPS:

library(cn.mops)

data(cn.mops)

idx <- apply(values(XRanges),1, function(x) !all(x < 70))

XRangesSmall <- XRanges[idx, ]

The code above removes DNA segments where all samples have less than 70 reads. You should use a smaller number, e.g. 5 or 10, instead of 70.

Regards,

Günter

ADD REPLY
0
Entering edit mode

Dear  Günter

Thanks a lot for all your suggestions that helped me cope! Just to remember, I'm working with targeted sequencing on two samples, one case and one control. I tried with your two previous proposals: 1/The first one:To generate the read counts from BAM Files comparing to whole genome use a small segment size

>BAM2Files <- list.files(pattern=".bam$")
>refSeqName=c(paste0("chr",1:22),"chrX","chrY")
>DataRanges.smallWL.case_ctrl<-getReadCountsFromBAM(BAM2Files, sampleNames=paste("Sample",1:2), refSeqName= refSeqName,mode="paired",WL=50)
>idx.case_ctrl <- apply(values(DataRanges.smallWL.case_ctrl),1, function(x) !all(x < 5))
> table(idx.case_ctrl)
#idx.case_ctrl
#FALSE     TRUE
#61436686   476874
>DataRanges.smallWL.case_ctrl.idx <- DataRanges.smallWL.case_ctrl[idx.case_ctrl, ]
>exom.DataRanges.smallWL.case_ctrl.idx<-exomecn.mops(DataRanges.smallWL.case_ctrl.idx)
>exomecn.CN.smallWL.case_ctrl.idx<calcIntegerCopyNumbers(exom.DataRanges.smallWL.case_ctrl.idx)
>cnvs_exomecn.CN.smallWL.case_ctrl.idx<-(cnvs(exomecn.CN.smallWL.case_ctrl.idx))
>cnvs_exomecn.CN.smallWL.case_ctrl.idx
#GRanges object with 176 ranges and 4 metadata columns

## I tried also with referencecn.mops:

>ref.DataRanges.smallWL.case_ctrl.idx<referencecn.mops(cases=DataRanges.smallWL.case_ctrl.idx[,2],controls= DataRanges.smallWL.case_ctrl.idx[,1])
>CN.ref.DataRanges.smallWL.case_ctrl.idx<calcIntegerCopyNumbers(ref.DataRanges.smallWL.case_ctrl.idx)
>cnvs_CN.ref.DataRanges.smallWL.case_ctrl.idx<(cnvs(CN.ref.DataRanges.smallWL.case_ctrl.idx))
>cnvs_CN.ref.DataRanges.smallWL.case_ctrl.idx
#GRanges object with 45 ranges and 4 metadata columns:


# =>Of a first view, overlapping the two resulted GRanges objects only 5 CNVrs have the CNVs (the same starts)!

2/The second one: To generate the read counts from BAM Files basing on the list of targeted regions in the panel provided as a BED file.

>BAM2Files <- list.files(pattern=".bam$")
>targeted.segments <- read.table("TruSight_One_v1.1.bed",sep="\t",as.is=TRUE)
> dim(targeted.segments) # [1] 62309     4
>gr<-GRanges(targeted.segments[,1],IRanges(targeted.segments[,2],targeted.segments[,3]))
>getSegmentReadCounts<-getSegmentReadCountsFromBAM(BAM2Files,GR=gr,mode="paired")
> exomecn.getSegmentReadCounts <- exomecn.mops(getSegmentReadCounts)
> exomecn.getSegmentReadCounts
#CNV regions:
# GRanges object with 5 ranges and 2 metadata columns:

#I tried also with referencecn.mops insted of exomecn.mops

>ref.getSegmentReadCounts<-referencecn.mops(cases=getSegmentReadCounts[,2],controls= getSegmentReadCounts[,1])
> ref.getSegmentReadCounts
#CNV regions:
#GRanges object with 23 ranges and 1 metadata column:

#=> When overlapping the two resulted GRanges objects (with SegmentReadCounts) only 1 CNV was deduced with the same region! And it’s also the only CNVr between the four previous resulted CNV regions GRanges objects.

I expected more overlapping CNVs?! Really, I lost between the different methods, on which one I have to trust?  And how to choose the best approach for my analysis?

Looking for youy reply,                                                                                                 Many thanks, most grateful!

Best, Amal

ADD REPLY
0
Entering edit mode

Please,  there is someone who can answer me for my previous comment??!

ADD REPLY
1
Entering edit mode

Hello Amal,

Any results that do not come from "referencecn.mops" do not make sense, since you only have two samples. If there is a CNV in one of the two samples, it is randomly assigned to one of the two... The assumption of cn.mops (including exomecn.mops) is that the majority of samples has copy number 2, but here -- if there is a CNV -- the "majority" can only be 50%. I would not recommend to use cn.mops below 8 samples.

To sum it up: you should trust only the results of "referencecn.mops".

 

Regards,

Günter

ADD REPLY
0
Entering edit mode

Dear Günter, 

Many thanks for your reply. 

I will take that into account!

Best,

Amal

ADD REPLY

Login before adding your answer.

Traffic: 644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6