Hi, I'm very new to R. I have been trying to plot the relationship between AT content and protein binding along a particular stretch of DNA, as a heat map or like the figures in this paper: https://www.ncbi.nlm.nih.gov/pubmed/29267285. However, I am confused about how exactly to go about this.

I understand how to tile the genome into windows with tileGenome, which returns a GRangesList rather than a single GRanges object. I also understand how to calculate the GC content of specified ranges, such as "gcContent(Hsapiens[["chr1"]])". What I don't understand is how to combine these two approaches to calculate the AT content (1 - GC content) of each interval in the GRangesList. If I had a single GRanges object, I could use something like:
windowViews <- Views(BSgenome.Mmusculus.UCSC.mm10, windowRanges)
gcFrequency <- letterFrequency(windowViews, letters="GC", as.prob=TRUE)
or
letterFrequency2 <- function(x, letters, OR="|",
                             as.prob=FALSE, ...) {
    stopifnot(is(x, "BSgenomeViews"))
    # Process the views in chunks of 500,000 to limit memory use
    chunksize <- 500000L
    chunks <- breakInChunks(length(x), chunksize)
    chunks <- as(chunks, "IRanges")
    ans_chunks <- lapply(seq_along(chunks),
        function(i) {
            # Extract the views in chunk i and tabulate their letters
            x_chunk <- extractROWS(x, chunks[i])
            letterFrequency(x_chunk, letters, OR=OR,
                            as.prob=as.prob)
        })
    # Stack the per-chunk matrices into one matrix (one row per view)
    do.call(rbind, ans_chunks)
}
But I am clearly confused about what arguments to pass to Views, how to store the AT content calculated for each range, and how to plot it (at all, or against the binding preferences of a specific protein). In short, I am not sure how to calculate and store the GC/AT content for windows of arbitrary size across the genome. Any advice on how to do this would be great, and sorry if this is extremely trivial!
Also, the above code was taken from an answer by H. Pagès to a previous question, full credit to him.
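
To make it concrete, here is a rough sketch of what I think the combined workflow might look like. It assumes that passing cut.last.tile.in.chrom=TRUE makes tileGenome return a plain GRanges, so the Views approach above applies directly. The 10 kb window size, the restriction to chr19 (just to keep it quick), and the metadata column name AT are my own arbitrary choices, and this may well not be the recommended way:

```r
library(GenomicRanges)
library(BSgenome.Mmusculus.UCSC.mm10)

genome <- BSgenome.Mmusculus.UCSC.mm10

# Tile the genome into fixed-width windows. With
# cut.last.tile.in.chrom=TRUE the result is a GRanges whose tiles
# never span two chromosomes (window size is an arbitrary choice).
windowRanges <- tileGenome(seqlengths(genome), tilewidth = 10000,
                           cut.last.tile.in.chrom = TRUE)

# Keep a single chromosome so the example runs quickly (assumption).
windowRanges <- windowRanges[seqnames(windowRanges) == "chr19"]

# Fraction of A or T in each window; the result is a one-column matrix.
windowViews <- Views(genome, windowRanges)
atFrequency <- letterFrequency(windowViews, letters = "AT", as.prob = TRUE)

# Store the per-window AT fraction alongside the ranges.
windowRanges$AT <- atFrequency[, 1]

# Quick look at AT content along the chromosome.
plot(start(windowRanges), windowRanges$AT, type = "l",
     xlab = "chr19 position (bp)", ylab = "AT fraction")
```

If that is roughly right, I would then want to add the protein binding signal as another metadata column over the same windows and plot the two against each other.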