There is probably a super-sophisticated way to do this, and if so either Herve or Michael will be along soon to tell you. For now, here is my dumb approach. Do note that you probably want GRanges for this, rather than whatever you have now.
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> ex <- exons(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = c("EXONID", "TXNAME"))
## You would convert your copy number data to a GRanges, rather than making a fake one like this
> segmentGR <- GRanges(rep("chr1", 5), IRanges(c(11532,12222,12228, 12561, 12705), c(12345, 12934,12593, 13721, 14457)), Segment_mean = runif(5))
> segmentGR
GRanges object with 5 ranges and 1 metadata column:
seqnames ranges strand | Segment_mean
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 11532-12345 * | 0.293932071421295
[2] chr1 12222-12934 * | 0.826928496127948
[3] chr1 12228-12593 * | 0.351696432800964
[4] chr1 12561-13721 * | 0.251908763078973
[5] chr1 12705-14457 * | 0.0250664639752358
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> fo <- findOverlaps(segmentGR, ex)
> fo
Hits object with 16 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 2 1
[3] 2 2
[4] 2 3
[5] 2 4
... ... ...
[12] 5 3
[13] 5 5
[14] 5 6
[15] 5 13964
[16] 5 13965
-------
queryLength: 5 / subjectLength: 289969
> np <- vector("numeric", length(segmentGR))
> np
[1] 0 0 0 0 0
> tab <- table(queryHits(fo))
> np[as.numeric(names(tab))] <- tab
> np
[1] 1 4 0 5 6
> segmentGR$Num_prob <- np
> segmentGR
GRanges object with 5 ranges and 2 metadata columns:
seqnames ranges strand | Segment_mean Num_prob
<Rle> <IRanges> <Rle> | <numeric> <numeric>
[1] chr1 11532-12345 * | 0.293932071421295 1
[2] chr1 12222-12934 * | 0.826928496127948 4
[3] chr1 12228-12593 * | 0.351696432800964 0
[4] chr1 12561-13721 * | 0.251908763078973 5
[5] chr1 12705-14457 * | 0.0250664639752358 6
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
Not sure how well
table
will work for a really big vector, so a more performant option might be to use anRle
instead.