find differences between two sets of ranges
1
0
Entering edit mode
@andreia-fonseca-3796
Last seen 7.7 years ago
Dear all, I have a range list which I have created like this: aln<-readAligned("GHO-23_filtered.fastq_aligned", type="Bowtie") cvg = coverage(aln) islands <- slice(cov, lower = 1) ranges_islands <-slice(cvg, 1, rangesOnly=TRUE) And then I have a GRange object which I have imported genes=import.gff3("Egrandis_162_gene.gff3", asRangedData=FALSE) I want to select the ranges in ranges_islands which do not overlap with genes into object final_ranges and then I want to select the reads from aln which are within or overlapping >50% with object final_ranges. I also would like to export the object final_ranges to a table chr, start, end. Can someone help? Thanks Andreia ---------------------------------------------------------------------- ------------------------- Andreia J. Amaral, PhD BioFIG - Center for Biodiversity, Functional and Integrative Genomics Instituto de Medicina Molecular University of Lisbon Tel: +352 217500000 (ext. office: 28253) email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt [[alternative HTML version deleted]]
• 1.2k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.9 years ago
United States
On Sat, Jul 2, 2011 at 10:41 PM, Andreia Fonseca <andreia.fonseca@gmail.com>wrote: > Dear all, > > I have a range list which I have created like this: > > aln<-readAligned("GHO-23_filtered.fastq_aligned", type="Bowtie") > cvg = coverage(aln) > islands <- slice(cov, lower = 1) > ranges_islands <-slice(cvg, 1, rangesOnly=TRUE) > > And then I have a GRange object which I have imported > > genes=import.gff3("Egrandis_162_gene.gff3", asRangedData=FALSE) > > I want to select the ranges in ranges_islands which do not overlap with > genes into object final_ranges and then I want to select the reads from aln > which are within or overlapping >50% with object final_ranges. > > If your reads are of a fixed length, then something like this should work: final_ranges <- setdiff(ranges_islands, genes) final_aln <- aln[!is.na(findOverlaps(as(aln, "GRanges"), final_ranges, select = "first", minoverlap = halfReadLength))] Michael > I also would like to export the object final_ranges to a table chr, start, > end. > > Can someone help? > Thanks > Andreia > > > -------------------------------------------------------------------- --------------------------- > Andreia J. Amaral, PhD > BioFIG - Center for Biodiversity, Functional and Integrative Genomics > Instituto de Medicina Molecular > University of Lisbon > Tel: +352 217500000 (ext. office: 28253) > email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Michael, thanks for the reply but I already tried to do setdiff and I get this message, Error in as.vector(x) : no method for coercing this S4 class to a vector My reads are not all of the same length, the read length varies from 18 to 26 bp, these are from a small RNA seq experiment. Thanks for the help. Kind regards, Andreia On Mon, Jul 4, 2011 at 12:24 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > > > On Sat, Jul 2, 2011 at 10:41 PM, Andreia Fonseca < > andreia.fonseca@gmail.com> wrote: > >> Dear all, >> >> I have a range list which I have created like this: >> >> aln<-readAligned("GHO-23_filtered.fastq_aligned", type="Bowtie") >> cvg = coverage(aln) >> islands <- slice(cov, lower = 1) >> ranges_islands <-slice(cvg, 1, rangesOnly=TRUE) >> >> And then I have a GRange object which I have imported >> >> genes=import.gff3("Egrandis_162_gene.gff3", asRangedData=FALSE) >> >> I want to select the ranges in ranges_islands which do not overlap with >> genes into object final_ranges and then I want to select the reads from >> aln >> which are within or overlapping >50% with object final_ranges. >> >> > If your reads are of a fixed length, then something like this should work: > > final_ranges <- setdiff(ranges_islands, genes) > final_aln <- aln[!is.na(findOverlaps(as(aln, "GRanges"), final_ranges, > select = "first", minoverlap = halfReadLength))] > > Michael > > > >> I also would like to export the object final_ranges to a table chr, start, >> end. >> >> Can someone help? >> Thanks >> Andreia >> >> >> ------------------------------------------------------------------- ---------------------------- >> Andreia J. Amaral, PhD >> BioFIG - Center for Biodiversity, Functional and Integrative Genomics >> Instituto de Medicina Molecular >> University of Lisbon >> Tel: +352 217500000 (ext. office: 28253) >> email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- ---------------------------------------------------------------------- ----------------------- Andreia J. Amaral, PhD BioFIG - Center for Biodiversity, Functional and Integrative Genomics Instituto de Medicina Molecular University of Lisbon Tel: +352 217500000 (ext. office: 28253) email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On 07/04/2011 05:58 AM, Andreia Fonseca wrote: > Hi Michael, > > thanks for the reply but I already tried to do setdiff and I get this > message, > > Error in as.vector(x) : no method for coercing this S4 class to a vector > > My reads are not all of the same length, the read length varies from 18 to > 26 bp, these are from a small RNA seq experiment. Hi Andreia -- it might help to provide a more explicit example, with a couple of ranges illustrating each of the data structures you're working with. Martin > > Thanks for the help. > Kind regards, > Andreia > > On Mon, Jul 4, 2011 at 12:24 AM, Michael Lawrence<lawrence.michael at="" gene.com="">> wrote: > >> >> >> On Sat, Jul 2, 2011 at 10:41 PM, Andreia Fonseca< >> andreia.fonseca at gmail.com> wrote: >> >>> Dear all, >>> >>> I have a range list which I have created like this: >>> >>> aln<-readAligned("GHO-23_filtered.fastq_aligned", type="Bowtie") >>> cvg = coverage(aln) >>> islands<- slice(cov, lower = 1) >>> ranges_islands<-slice(cvg, 1, rangesOnly=TRUE) >>> >>> And then I have a GRange object which I have imported >>> >>> genes=import.gff3("Egrandis_162_gene.gff3", asRangedData=FALSE) >>> >>> I want to select the ranges in ranges_islands which do not overlap with >>> genes into object final_ranges and then I want to select the reads from >>> aln >>> which are within or overlapping>50% with object final_ranges. >>> >>> >> If your reads are of a fixed length, then something like this should work: >> >> final_ranges<- setdiff(ranges_islands, genes) >> final_aln<- aln[!is.na(findOverlaps(as(aln, "GRanges"), final_ranges, >> select = "first", minoverlap = halfReadLength))] >> >> Michael >> >> >> >>> I also would like to export the object final_ranges to a table chr, start, >>> end. >>> >>> Can someone help? >>> Thanks >>> Andreia >>> >>> >>> ------------------------------------------------------------------ ----------------------------- >>> Andreia J. Amaral, PhD >>> BioFIG - Center for Biodiversity, Functional and Integrative Genomics >>> Instituto de Medicina Molecular >>> University of Lisbon >>> Tel: +352 217500000 (ext. office: 28253) >>> email:andreiaamaral at fm.ul.pt ; andreiaamaral at fc.ul.pt >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793
ADD REPLY
0
Entering edit mode
Hi Martin, thanks for your email. Below you can find a head of each of the objects and the sessionInfo. Is this what you were asking? Thanks for the help Andreia the object ranges_islands head(ranges_islands) CompressedIRangesList of length 6 $scaffold_1 IRanges of length 60734 start end width [1] 1942 1981 40 [2] 1986 2023 38 [3] 2313 2336 24 [4] 2453 2495 43 [5] 2649 2672 24 [6] 3238 3263 26 [7] 3281 3303 23 [8] 3518 3541 24 [9] 3558 3585 28 ... ... ... ... [60726] 40290366 40290389 24 [60727] 40290560 40290583 24 [60728] 40293329 40293347 19 [60729] 40293449 40293472 24 [60730] 40293826 40293849 24 [60731] 40293862 40293886 25 [60732] 40296294 40296317 24 [60733] 40296401 40296424 24 [60734] 40296955 40296978 24 ... <5 more elements> head(genes) GRanges with 6 ranges and 7 elementMetadata values seqnames ranges strand | type source phase ID Name Parent pacid <rle> <iranges> <rle> | <factor> <factor> <factor> <character> <character> <character> <numeric> [1] scaffold_1 [27544798, 27558542] - | gene phytozome7_0 NA Egrandis_v1_0.000041m.g Egrandis_v1_0.000041m.g NA NA [2] scaffold_1 [27544798, 27558542] - | mRNA phytozome7_0 NA PAC:18798967 Egrandis_v1_0.000042m Egrandis_v1_0.000041m.g 18798967 [3] scaffold_1 [27558337, 27558542] - | five_prime_UTR phytozome7_0 NA PAC:18798967.five_prime_UTR.1 NA PAC:18798967 18798967 [4] scaffold_1 [27555581, 27557960] - | CDS phytozome7_0 0 PAC:18798967.CDS.1 NA PAC:18798967 18798967 [5] scaffold_1 [27557961, 27558086] - | five_prime_UTR phytozome7_0 NA PAC:18798967.five_prime_UTR.2 NA PAC:18798967 18798967 [6] scaffold_1 [27555371, 27555427] - | CDS phytozome7_0 2 PAC:18798967.CDS.2 NA PAC:18798967 18798967 seqlengths scaffold_1 scaffold_10 scaffold_100 scaffold_1002 scaffold_1005 scaffold_1007 ... scaffold_99 scaffold_992 scaffold_995 scaffold_996 scaffold_997 scaffold_998 NA NA NA NA NA NA ... NA NA NA NA NA NA sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.10.4 Rsamtools_1.4.2 lattice_0.19-23 Biostrings_2.20.1 GenomicRanges_1.4.6 IRanges_1.10.4 loaded via a namespace (and not attached): [1] Biobase_2.12.1 grid_2.13.0 hwriter_1.3 On Mon, Jul 4, 2011 at 3:11 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote: > On 07/04/2011 05:58 AM, Andreia Fonseca wrote: > >> Hi Michael, >> >> thanks for the reply but I already tried to do setdiff and I get this >> message, >> >> Error in as.vector(x) : no method for coercing this S4 class to a vector >> >> My reads are not all of the same length, the read length varies from 18 to >> 26 bp, these are from a small RNA seq experiment. >> > > Hi Andreia -- it might help to provide a more explicit example, with a > couple of ranges illustrating each of the data structures you're working > with. Martin > > > >> Thanks for the help. >> Kind regards, >> Andreia >> >> On Mon, Jul 4, 2011 at 12:24 AM, Michael Lawrence<lawrence.michael@**>> gene.com <lawrence.michael@gene.com> >> >>> wrote: >>> >> >> >>> >>> On Sat, Jul 2, 2011 at 10:41 PM, Andreia Fonseca< >>> andreia.fonseca@gmail.com> wrote: >>> >>> Dear all, >>>> >>>> I have a range list which I have created like this: >>>> >>>> aln<-readAligned("GHO-23_**filtered.fastq_aligned", type="Bowtie") >>>> cvg = coverage(aln) >>>> islands<- slice(cov, lower = 1) >>>> ranges_islands<-slice(cvg, 1, rangesOnly=TRUE) >>>> >>>> And then I have a GRange object which I have imported >>>> >>>> genes=import.gff3("Egrandis_**162_gene.gff3", asRangedData=FALSE) >>>> >>>> I want to select the ranges in ranges_islands which do not overlap with >>>> genes into object final_ranges and then I want to select the reads from >>>> aln >>>> which are within or overlapping>50% with object final_ranges. >>>> >>>> >>>> If your reads are of a fixed length, then something like this should >>> work: >>> >>> final_ranges<- setdiff(ranges_islands, genes) >>> final_aln<- aln[!is.na(findOverlaps(as(**aln, "GRanges"), final_ranges, >>> select = "first", minoverlap = halfReadLength))] >>> >>> Michael >>> >>> >>> >>> I also would like to export the object final_ranges to a table chr, >>>> start, >>>> end. >>>> >>>> Can someone help? >>>> Thanks >>>> Andreia >>>> >>>> >>>> ------------------------------**------------------------------** >>>> ------------------------------**----- >>>> Andreia J. Amaral, PhD >>>> BioFIG - Center for Biodiversity, Functional and Integrative Genomics >>>> Instituto de Medicina Molecular >>>> University of Lisbon >>>> Tel: +352 217500000 (ext. office: 28253) >>>> email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >> >> > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > -- ---------------------------------------------------------------------- ----------------------- Andreia J. Amaral, PhD BioFIG - Center for Biodiversity, Functional and Integrative Genomics Instituto de Medicina Molecular University of Lisbon Tel: +352 217500000 (ext. office: 28253) email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Adreia, On Mon, Jul 4, 2011 at 1:36 PM, Andreia Fonseca <andreia.fonseca at="" gmail.com=""> wrote: > Hi Martin, > > thanks for your email. Below you can find a head of each of the objects and > the sessionInfo. Is this what you were asking? Unlikely :-) We would only need an example of a *small number* of ranges that shows the problem that you have and allows us to play with it. So -- reduce your CompressedRangesList into something that you can paste into an email (using dput, for example) that we can then copy and play with. For instance, `x` is a small IRangesList I have in my workspace. The output of dput is: R> dput(x) new("CompressedIRangesList" , elementType = "IRanges" , elementMetadata = NULL , metadata = list() , partitioning = new("PartitioningByEnd" , end = c(3L, 7L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , unlistData = new("IRanges" , start = c(1L, 2L, 3L, 15L, 45L, 20L, 1L) , width = c(5L, 1L, 6L, 1L, 56L, 61L, 5L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) ) If you now copy the result of that and assign it to a variable in your workspace, eg: R> my.yours <- new("CompressedIRangesList", ...) You'l see that you now have the same IRangesList that I may be having problems with ... so ... give us some objects like this that we can then reconstruct and play with to see how to get you the answer you want ... Make sense? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
On Mon, Jul 4, 2011 at 5:58 AM, Andreia Fonseca <andreia.fonseca@gmail.com>wrote: > Hi Michael, > > thanks for the reply but I already tried to do setdiff and I get this > message, > > Error in as.vector(x) : no method for coercing this S4 class to a vector > For setdiff() to work, both arguments will need to be GRanges, so you'll need to coerce ranges_islands to a GRanges with as(ranges_islands, "GRanges"). Michael > > My reads are not all of the same length, the read length varies from 18 to > 26 bp, these are from a small RNA seq experiment. > > Thanks for the help. > Kind regards, > Andreia > > > On Mon, Jul 4, 2011 at 12:24 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > >> >> >> On Sat, Jul 2, 2011 at 10:41 PM, Andreia Fonseca < >> andreia.fonseca@gmail.com> wrote: >> >>> Dear all, >>> >>> I have a range list which I have created like this: >>> >>> aln<-readAligned("GHO-23_filtered.fastq_aligned", type="Bowtie") >>> cvg = coverage(aln) >>> islands <- slice(cov, lower = 1) >>> ranges_islands <-slice(cvg, 1, rangesOnly=TRUE) >>> >>> And then I have a GRange object which I have imported >>> >>> genes=import.gff3("Egrandis_162_gene.gff3", asRangedData=FALSE) >>> >>> I want to select the ranges in ranges_islands which do not overlap with >>> genes into object final_ranges and then I want to select the reads from >>> aln >>> which are within or overlapping >50% with object final_ranges. >>> >>> >> If your reads are of a fixed length, then something like this should work: >> >> final_ranges <- setdiff(ranges_islands, genes) >> final_aln <- aln[!is.na(findOverlaps(as(aln, "GRanges"), final_ranges, >> select = "first", minoverlap = halfReadLength))] >> >> Michael >> >> >> >>> I also would like to export the object final_ranges to a table chr, >>> start, >>> end. >>> >>> Can someone help? >>> Thanks >>> Andreia >>> >>> >>> ------------------------------------------------------------------ ----------------------------- >>> Andreia J. Amaral, PhD >>> BioFIG - Center for Biodiversity, Functional and Integrative Genomics >>> Instituto de Medicina Molecular >>> University of Lisbon >>> Tel: +352 217500000 (ext. office: 28253) >>> email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > > -- > > -------------------------------------------------------------------- ------------------------- > Andreia J. Amaral, PhD > BioFIG - Center for Biodiversity, Functional and Integrative Genomics > Instituto de Medicina Molecular > University of Lisbon > Tel: +352 217500000 (ext. office: 28253) > email:andreiaamaral@fm.ul.pt ; andreiaamaral@fc.ul.pt > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6