Order within a GRanges object
@hermann-norpois-5726
Hello,

I have some questions about the internal order of GRanges objects.

1) A GRanges object is automatically ordered by a) the seqnames (= chromosomes) and b) the ranges.

2) The seqnames are always sorted in ASCII order.

3) After

    # gr stands for the GRanges object
    df <- as.data.frame(gr)
    m <- regexpr("\\d+", df$seqnames, perl=TRUE)
    df$Chromosome <- regmatches(df$seqnames, m)
    df$Chromosome <- as.integer(as.character(df$Chromosome))
    df <- df[order(df$Chromosome), ]

only the order of the chromosomes is changed. The order of the ranges (now df$start and df$end) is still the same.

Are my assumptions true?

Thanks
Hermann
Malcolm Cook ★ 1.6k
@malcolm-cook-6293
> 1) A GRanges object is automatically ordered by a) the seqnames (=
> chromosomes) and b) the ranges.

No! There is no guarantee on the order.

    > library(GenomicRanges)
    > example(GRanges)
    ...
    > longGR
    GRanges with 30 ranges and 1 metadata column:
          seqnames     ranges strand |     score
             <rle>  <iranges>  <rle> | <integer>
        a     chr1    [1, 10]      - |         1
        b     chr2    [2, 10]      + |         2
        c     chr2    [3, 10]      + |         3
        d     chr2    [4, 10]      * |         4
        e     chr1    [5, 10]      * |         5
      ...      ...        ...    ... ...       ...
              chr2 [106, 115]      - |        26
              chr2 [107, 116]      - |        27
              chr3 [108, 117]      - |        28
              chr3 [109, 118]      - |        29
              chr3 [110, 119]      - |        30
      ---
      seqlengths:
       chr1 chr2 chr3
       1000 2000 1500
    > rev(longGR)
    GRanges with 30 ranges and 1 metadata column:
          seqnames     ranges strand |     score
             <rle>  <iranges>  <rle> | <integer>
              chr3 [110, 119]      - |        30
              chr3 [109, 118]      - |        29
              chr3 [108, 117]      - |        28
              chr2 [107, 116]      - |        27
              chr2 [106, 115]      - |        26
      ...      ...        ...    ... ...       ...
        e     chr1    [5, 10]      * |         5
        d     chr2    [4, 10]      * |         4
        c     chr2    [3, 10]      + |         3
        b     chr2    [2, 10]      + |         2
        a     chr1    [1, 10]      - |         1
      ---
      seqlengths:
       chr1 chr2 chr3
       1000 2000 1500

> 2) The seqnames are always sorted in ASCII order.

No! But they _can_ be:

    > sort(longGR)
    GRanges with 30 ranges and 1 metadata column:
          seqnames     ranges strand |     score
             <rle>  <iranges>  <rle> | <integer>
        f     chr1    [6, 10]      + |         6
              chr1    [1,  5]      - |       101
        a     chr1    [1, 10]      - |         1
              chr1    [2,  6]      - |       102
              chr1    [3,  7]      - |       103
      ...      ...        ...    ... ...       ...
        j     chr3 [ 10,  10]      - |        10
              chr3 [ 10,  14]      - |       110
              chr3 [108, 117]      - |        28
              chr3 [109, 118]      - |        29
              chr3 [110, 119]      - |        30
      ---
      seqlengths:
       chr1 chr2 chr3
       1000 2000 1500

~ Malcolm Cook
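Point 3 of the question is not addressed in the reply above. As a minimal sketch in base R, assuming a toy data frame that stands in for the output of as.data.frame() on a GRanges object (the values are invented for illustration): order() on a single key leaves ties in their original order, so ordering by the chromosome number alone groups the chromosomes but does not sort the ranges within each one. Passing start and end as additional keys sorts the ranges as well.

```r
# Toy stand-in for as.data.frame(gr); the values are invented for illustration.
df <- data.frame(
  seqnames = c("chr10", "chr2", "chr1", "chr2", "chr10", "chr1"),
  start    = c(50L, 300L, 7L, 20L, 5L, 90L),
  end      = c(60L, 310L, 17L, 30L, 15L, 100L),
  stringsAsFactors = FALSE
)

# Numeric chromosome index: drop every non-digit character.
# Names without digits (e.g. "chrX") would become NA here.
df$Chromosome <- as.integer(gsub("\\D", "", df$seqnames))

# One sort key: rows of the same chromosome keep their original
# relative order, so the starts within a chromosome stay unsorted.
by_chr <- df[order(df$Chromosome), ]

# Extra keys break the ties, so the ranges are sorted as well.
by_chr_pos <- df[order(df$Chromosome, df$start, df$end), ]

by_chr$start      # 7 90 300 20 50 5
by_chr_pos$start  # 7 90 20 300 5 50
```

So the assumption in point 3 seems to hold only in the sense that the ranges keep their previous relative order; they end up sorted by position only if they were already sorted, or if start and end are given to order() too.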
Hi Malcolm, Hermann,

On 08/20/2013 06:05 AM, Cook, Malcolm wrote:
> > 2) The seqnames are always sorted in ascii order.
>
> No! but they _can_ be:
>
> > sort(longGR)

Just a small point of clarification. The ordering of the seqnames in lexicographical order here is just a consequence of the fact that the seqlevels are already ordered in lexicographical order. If you change the order of the seqlevels first, then sort() will produce a different result:

    seqlevels(longGR) <- seqlevels(longGR)[c(2,3,1)]

Then:

    > seqlevels(longGR)
    [1] "chr2" "chr3" "chr1"
    > sort(longGR)
    GRanges with 30 ranges and 1 metadata column:
          seqnames     ranges strand |     score
             <rle>  <iranges>  <rle> | <integer>
        b     chr2 [  2,  10]      + |         2
        c     chr2 [  3,  10]      + |         3
              chr2 [  4,   8]      - |       104
              chr2 [  5,   9]      - |       105
              chr2 [  6,  10]      - |       106
      ...      ...        ...    ... ...       ...
              chr1 [  3,   7]      - |       103
              chr1 [101, 110]      - |        21
              chr1 [102, 111]      - |        22
              chr1 [103, 112]      - |        23
        e     chr1 [  5,  10]      * |         5
      ---
      seqlengths:
       chr2 chr3 chr1
       2000 1500 1000

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
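To make the seqlevels point concrete, here is a small self-contained sketch that uses invented ranges rather than longGR. It assumes the GenomicRanges package is installed; the object names gr and gr_sorted are only for illustration. Setting the seqlevels to the desired chromosome order before calling sort() should give chr1, chr2, chr10 instead of a string-sorted order.

```r
library(GenomicRanges)

# Invented toy ranges. As plain strings, "chr10" sorts before "chr2".
gr <- GRanges(
  seqnames = c("chr10", "chr2", "chr1", "chr2"),
  ranges   = IRanges(start = c(50, 300, 7, 20), width = 10)
)

# The object keeps the order in which the ranges were supplied.
as.character(seqnames(gr))

# Put the levels in the desired chromosome order. This only reorders
# the levels; the ranges themselves stay where they are.
seqlevels(gr) <- c("chr1", "chr2", "chr10")

# sort() now follows that level order, then position within a level.
gr_sorted <- sort(gr)
as.character(seqnames(gr_sorted))
start(gr_sorted)
```

GenomeInfoDb also offers sortSeqlevels(), which is meant to put levels such as chr1, chr2, chr10 into this kind of natural order without listing them by hand.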