I only want to find genes that have start positions located within a defined range window. If there are no overlaps I want to return NAs for the column fields that describe the overlap. However if there is an overlap with a gene start, then I want to return the gene start and end locations as well as its chromosome and strand. Any help would be much appreciated.
The code below should be pretty straightforward to understand. Genes
is a GRanges object with gene attributes (e.g. start, end, strand, chromosome) and x.window
is another GRanges object with window ranges. I believe the code below works but counts a gene as overlapping if it overlaps in any way.
# compute signal window overlap with genes overlaps <- subsetByOverlaps(genes, x.window) ## maybe limit to only TSS overlaps? # subsetByOverlaps(genes, x.window, type = 'start') signal_window=names(x.window) signal_score=x.window$score signal_start=start(x[window]) # original start signal_end=end(x[window]) # original end signal_center=start(x[window]) + floor((width(x[window]) - 1)/2) window_start=start(x.window) window_end=end(x.window) not_found=rep(NA, length(x.window)) values <- data.frame(signal_window=signal_window, signal_score=signal_score, signal_start=signal_start, signal_center=signal_center, signal_end=signal_end, window_start=window_start, window_end=window_end) if(length(overlaps) > 0){ hits <- findOverlaps(x.window, genes, ignore.strand=FALSE) ## want to limit to only TSS overlaps within window! signal_window=names(x.window)[hits@from] signal_score=x.window$score[hits@from] signal_start=start(x[window])[hits@from] # original start signal_end=end(x[window])[hits@from] # original end signal_center=start(x[window])[hits@from] + floor((width(x[window])[hits@from] - 1)/2) window_start=start(x.window)[hits@from] window_end=end(x.window)[hits@from] # window_center=start(x.window)[hits@from] + floor((width(x.window)[hits@from] - 1)/2) # same as signal center gene_id=genes$gene_id[hits@to] gene_symbol=genes$symbol[hits@to] gene_chr=chrom(genes)[hits@to] gene_start=ifelse(strand(genes)[hits@to] == '+', start(genes)[hits@to], end(genes)[hits@to]) gene_end=ifelse(strand(genes)[hits@to] == '+', end(genes)[hits@to], start(genes)[hits@to]) gene_strand=strand(genes)[hits@to] overlaps <- data.frame(signal_window=signal_window, signal_score=signal_score, signal_start=signal_start, signal_center=signal_center, signal_end=signal_end, window_start=window_start, window_end=window_end, gene_id=gene_id, gene_symbol=gene_symbol, gene_chr=gene_chr, gene_start=gene_start, gene_end=gene_end, gene_strand=gene_strand) values <- merge(values, overlaps, by=names(values), all=T) }
It appears this does the same as the above code? How would I restrict the overlap such that the overlap counts only if the TSS is included within the window?