Compute the CpG content for all chromosome like the percentage of G+C
2
0
Entering edit mode
@tiphaine-martin-6416
Last seen 6.3 years ago
France

Hi,

I would like to visualize not the GC content but CpG content in a track of Gviz, for example.

Do you have an idea to do that ?

Regards,

Tiphaine

CpG • 2.5k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 12 hours ago
United States
It depends on what exactly you mean by 'CpG content'. If you mean that you want to plot the CpG islands, then please note that the very first example in the Gviz vignette shows just how to do that. Or do you mean something else?
ADD COMMENT
0
Entering edit mode
@tiphaine-martin-6416
Last seen 6.3 years ago
France

I would like not only the CpG island but  the distribution of CpG along the chromosome or in a genomic region.

So it is like the percentage of CpG in window and this window slips a step along the chromosome (1 or more nucleotide). 

ADD COMMENT
0
Entering edit mode

I'm not sure if this is exactly what you want, but this function takes a BSgenome instance, a chromosome, and a tile width, and calculated CpG % in windows across a particular chromosome

CpG <-
    function(bsgenome, chr, tilewidth)
{
    dna <- bsgenome[[chr]]

    ## CpG on the plus and minus strand (?)
    islands <- matchPDict(DNAStringSet(c("GC", "CG")), dna)
    cvg <- coverage(islands)    # CpG island coverage

    tiles <- tileGenome(seqlengths(bsgenome)[chr], tilewidth=tilewidth,
                        cut.last.tile.in.chrom=TRUE)

    ## Average coverage in each tile
    ## Divide by 2 so each CpG counts only once
    v <- Views(cvg, ranges(tiles))
    tiles$CpG <- viewSums(v) / width(v) / 2
    tiles
}

This would seem to be a relatively effective way to quickly visualize CpG content, e.g.,

library(BSgenome.Hsapiens.UCSC.hg19)
gr <- CpG(BSgenome.Hsapiens.UCSC.hg19, "chr17", 10000)
plot(start(gr) + width(gr) / 2, gr$CpG, pch=".")

Another formulation might slide rather than tile the window across coverage, along the lines of

slidewidth = 10000
diff(cumsum(cvg), lag=slidewidth) / slidewidth / 2

I'm not sure how to visualize this with Gviz; for smaller regions one might use getSeq() to get the DNA sequence of the specific region.

ADD REPLY

Login before adding your answer.

Traffic: 535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6