Hi,
I am trying to use nullranges to generate 200 random genomic ranges, each 200 kbp long, that can fall anywhere in the human genome except blacklist regions (for downstream analyses). Following the bootRanges tutorial, I generated some GRanges (the boots variable below) and then expanded them to 200 kbp using bedtools. The tutorial uses DNase hypersensitive site data, but I am not sure that is necessary or helpful for my analysis.
# Load the packages used below
library(nullranges)
library(nullrangesData)
library(plyranges)
# Load DNase hypersensitive site (DHS) example data. This is genome-wide.
dhs <- DHSA549Hg38()
dhs <- dhs %>% plyranges::filter(signalValue > 100) %>% # keep peaks with DNase signal > 100
  plyranges::mutate(id = seq_along(.)) %>%
  plyranges::select(id, signalValue)
length(dhs)
# Retrieve the exclusion regions and segmentation from ExperimentHub
suppressPackageStartupMessages(library(ExperimentHub))
eh <- ExperimentHub()
exclude <- eh[["EH7306"]] # regions of the genome to exclude, i.e. ENCODE blacklist regions
seg_cbs <- eh[["EH7307"]] # segments based on DNase sites and gene density
# plotSegment(seg_cbs, exclude, type = "ranges") # uncomment to visualize the segments
# Perform bootstrapping on the 'dhs' data
set.seed(5)
R <- 50 # 50 iterations
blockLength <- 2e5 # block length for the bootstrap (I want every region to be 200 kbp, but most of the returned ranges are shorter)
boots <- bootRanges(dhs, blockLength, R = R, seg = seg_cbs, exclude = exclude, type = "permute") # execute the bootstrapping
# Sample 200 ranges from 'boots'
sampled_granges <- sample(boots, 200, replace = TRUE)
# I then used bedtools to expand each range to 200 kbp.
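For reference, the expansion step could probably be done in R as well rather than in bedtools. This is only a rough sketch: the toy object stands in for sampled_granges and its coordinates are made up, and centering each range on its midpoint is an assumption about how the expansion should behave, not necessarily what my bedtools call did.

```r
library(GenomicRanges)

# Toy stand-in for 'sampled_granges' (hypothetical coordinates on hg38 chr1)
toy <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(50000, 1000000, 248900000),
                   width = c(300, 450, 500)),
  seqlengths = c(chr1 = 248956422)
)

# Expand each range to 200 kbp around its midpoint.
# Ranges near a chromosome end may extend out of bounds here (R warns about this).
expanded <- suppressWarnings(resize(toy, width = 2e5, fix = "center"))

# Clip anything that runs past the start or end of the chromosome
expanded <- trim(expanded)

width(expanded)
```

Ranges close to a chromosome end come out shorter than 200 kbp after trim(), so they would need to be shifted or resampled if every region has to be exactly 200 kbp.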
I am using the ranges to simulate random permutations throughout the genome. Is this the best way to use this package for my analysis?
Thanks in advance.
For my preliminary analysis, I am mainly focused on placing the ranges uniformly. Thanks for the info and help. I will give your suggestions a go!