Hi,
I have a list of bigwig files from different data sets. We would like to create a table of all the genomic positions per chromosome to be able to compare the samples with each other.
As the peaks found in each samples are at different chromosomal positions, I would like to complete each chromosomes and add all the positions which have no reads with a 0. As an example
this is my file:
variableStep chrom=chrV 86 1.00 87 1.00 88 1.00 89 1.00 90 1.00 .... variableStep chrom=chrII 1 2.00 2 2.00 3 2.00 4 3.00 5 4.00
In this sample I would like for Chr V to add positions 1-85 with a 0.
And than split the bigwig file into separate chromosomes and creates a data.frame for each ofthe chromosomes.
I know it is possible to read bigwig files into R using rtracklayer and also to subset them per chromosome. What I don't know is if it is possible to add the empty positions into the GRanges object created from the bigwig file.
thanks
Assa
P.S.
I was thinking about it a bit longer now and was wondering if it is possible to create a GRangeList object from all samples and than modify it within this object so that I can modify all samples at once and than reduce the list into a data.frame for the separate chromosomes
thanks again
Thanks Michael for the neat solution. it is still not what i need though. The coverage function summarize the similar values into one row like that:
Is there a way to expand the GRanges object to contain all the positions separately. I need something like this one here (done manually):
thanks in advance
You want to use a GPos object. I thought this would be simple, but it turns out to require a couple lines:
thanks for the suggestion, but it is still not exactly what i need. The code above doesn't take the ends of each chromosome into account. This leads to different number of entries in the two GRanges objects. e.g.
or when changed into GPos objects
Even though the seqlengths() of the two Granges objects is the same for this chromosome, both of them contain less elements as the sequence length
Is there a method to expand (extend) every chromosome in each of the
GRanges
objects based on theseqlengths()
of this chromosome? ( What I would like to have for this example are two GPos objects for chromosome I with230218
positions instead of230207
and230210
positions).I need it so that I can compare the two bigwig files with each other.
Hi Assa,
Make sure you start with a GRanges object that has the seqlengths information on it:
Insert the "no score regions":
Turn into a GPos object:
Then:
Hope this helps.
H.
I made a small edit to my answer above to use an Rle when propagating the score to the GPos object.
Thanks, this looks very nice. I will try it. I have a question though. I am familiar with the seqlengths when creating a GRanges object. But here I am reading a bedgraph file into R. How do I give the
import.bw
orimport.bedgraph
theseqlength()
parameter?I don't know about getting the seqlengths info thru
import.bw()
orimport.bedgraph()
, Michael might be able to help with this. However please note that you can always use theseqlengths()
setter to add the seqlengths "manually" to the GRanges object returned byimport.bw()
orimport.bedgraph()
. For example, since you seem to be working with yeast:H.
Thanks, this I know. I thought there might be a better way.
It should come automatically from a BigWig file (it's embedded within the file). For bedGraph, unless the file has the genome indicated in a UCSC-style "track line", you will need to tell it the genome via the
genome=
argument (for example, "hg38"). It will look up the seqlengths automatically from the genome.But what if I'm using t he Ensembl annotations? so my chromosome names differs from the standard UCSC naming?
The approach described by Herve is not so bad, in general. You can pass the Seqinfo object to import, e.g.,
after trying this example and not understanding, why it doesn't do what I want, I figured it out. The
Rle()
options looks better, but if I use it, I can't export the data as bedgraph (or bigwig). I get an error message claimingScores must be non-NA numeric values
.Is there a way to coerce the
Rle
score column into a numeric values vector?I fixed this in devel. Work around is just not to use Rles, or coerce them to numeric vectors on export. The export routines are not that optimized for GPos anyway.
yes thanks, I have figured it out by now. I am using now the
rep()
command instead.