extracting sequence from a genome
1
0
Entering edit mode
@iain-gallagher-2532
Last seen 9.5 years ago
United Kingdom
Hello List I have a dataframe of miRNA genomic positions and I would like to get sequence for 200bp upstream of each microRNA. library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome ftpAddr <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords mirInfo <- read.table(ftpAddr) # as dataframe seqs <- list() # holder for (i in 1:nrow(mirInfo)){ seq <- getSeq(Rnorvegicus, paste('chr', mirInfo[i,1], sep=''), start = mirInfo[i,4], end = mirInfo[i,4]+200) seqs <- c(seqs, seq) } This works but seems to be pretty inefficient in terms of computing power as my pc locks up during the loop. Could someone point me to a better way? Thanks iain
miRNA microRNA miRNA microRNA • 6.2k views
ADD COMMENT
0
Entering edit mode
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 4.4 years ago
United States
what you want to do is create a Ranges object and then get a View of the sequences. library(Biostrings) library(GenomicRanges) mirs <- GRanges( paste('chr', mirInfo[i,1], sep=''), IRanges(start=mirInfo[i,4], width=1), strand=whereverYouKeepTheStrand ) upstream <- flank(mirs, 200) upseqs <- Views(Rnorvegicus, mirs) show(upseqs) Untested because I am lazy as hell, but it should work, and be a LOT faster. On Thu, Mar 15, 2012 at 8:23 AM, Iain Gallagher < iaingallagher@btopenworld.com> wrote: > > > Hello List > > I have a dataframe of miRNA genomic positions and I would like to get > sequence for 200bp upstream of each microRNA. > > library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome > > ftpAddr <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get > miR coords > mirInfo <- read.table(ftpAddr) # as dataframe > > seqs <- list() # holder > > for (i in 1:nrow(mirInfo)){ > seq <- getSeq(Rnorvegicus, paste('chr', mirInfo[i,1], sep=''), start = > mirInfo[i,4], end = mirInfo[i,4]+200) > seqs <- c(seqs, seq) > } > > This works but seems to be pretty inefficient in terms of computing power > as my pc locks up during the loop. > > Could someone point me to a better way? > > Thanks > > iain > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Howdy, On Thu, Mar 15, 2012 at 1:10 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: > what you want to do is create a Ranges object and then get a View of the > sequences. > > > library(Biostrings) > library(GenomicRanges) > mirs <- GRanges( paste('chr', mirInfo[i,1], sep=''), > ? ? ? ? ? ? ? ? IRanges(start=mirInfo[i,4], width=1), > ? ? ? ? ? ? ? ? strand=whereverYouKeepTheStrand ) > upstream <- flank(mirs, 200) > upseqs <- Views(Rnorvegicus, mirs) > show(upseqs) > > Untested because I am lazy as hell, but it should work, and be a LOT faster. Maybe in R 2.16 ... R> Views(Hsapiens, GRanges(c('chr1', 'chr2'), IRanges(c(6e6, 6e6), width=10), '+')) Error in function (classes, fdef, mtable) : unable to find an inherited method for function "Views", for signature "BSgenome" ;-) -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY
0
Entering edit mode
Yeah I just realized that when I pasted it in to execute it. Trying to find where exactly I did this in the past (because it worked back then) On Thu, Mar 15, 2012 at 10:42 AM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Howdy, > > On Thu, Mar 15, 2012 at 1:10 PM, Tim Triche, Jr. <tim.triche@gmail.com> > wrote: > > what you want to do is create a Ranges object and then get a View of the > > sequences. > > > > > > library(Biostrings) > > library(GenomicRanges) > > mirs <- GRanges( paste('chr', mirInfo[i,1], sep=''), > > IRanges(start=mirInfo[i,4], width=1), > > strand=whereverYouKeepTheStrand ) > > upstream <- flank(mirs, 200) > > upseqs <- Views(Rnorvegicus, mirs) > > show(upseqs) > > > > Untested because I am lazy as hell, but it should work, and be a LOT > faster. > > Maybe in R 2.16 ... > > R> Views(Hsapiens, GRanges(c('chr1', 'chr2'), IRanges(c(6e6, 6e6), > width=10), '+')) > Error in function (classes, fdef, mtable) : > unable to find an inherited method for function "Views", for > signature "BSgenome" > > ;-) > > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Gentlemen Thanks for taking the time. Here's where I am just now: library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome library(GenomicRanges) fl <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords gff <- read.table(fl) # as dataframe names <- gff[,10] nms <- gsub(";", "", gsub("\"", "", gsub("ID=\"", "", names))) # a set of nested gsub with regex to leave only the text in the double quotes gr <- GRanges(seqnames = Rle(paste('chr', gff[,1], sep='')), ranges = IRanges(gff[,4], end = gff[,5], names = nms), strand = Rle(gff[,7])) seqs <- getSeq(Rnorvegicus, flank(gr, 200)) names(seqs) <- nms It's much lighter on it's feet than a loop and a nice intro to the GenomicRanges package for me. As a follow-up question I'm going to write out the seqs object as fasta and use it in clover for TFBS analysis. I was wondering whether the strand is taken into account when I get the flanking sequence i.e. is the sequence returned from the + or - strand as defined in the GRanges object? I only ask this because presumably that will affect my TFBS analysis and if so I might want to reverse / complement all those sequences that are upstream of miRs transcribed from the - strand. Best iain ________________________________ From: "Tim Triche, Jr." <tim.triche@gmail.com> To: Steve Lianoglou <mailinglist.honeypot at="" gmail.com=""> Cc: Iain Gallagher <iaingallagher at="" btopenworld.com="">; bioconductor <bioconductor at="" stat.math.ethz.ch=""> Sent: Thursday, 15 March 2012, 17:44 Subject: Re: [BioC] extracting sequence from a genome Yeah I just realized that when I pasted it in to execute it. ?Trying to find where exactly I did this in the past (because it worked back then) On Thu, Mar 15, 2012 at 10:42 AM, Steve Lianoglou <mailinglist.honeypot at="" gmail.com=""> wrote: Howdy, > > >On Thu, Mar 15, 2012 at 1:10 PM, Tim Triche, Jr. <tim.triche at="" gmail.com=""> wrote: >> what you want to do is create a Ranges object and then get a View of the >> sequences. >> >> >> library(Biostrings) >> library(GenomicRanges) >> mirs <- GRanges( paste('chr', mirInfo[i,1], sep=''), >> ? ? ? ? ? ? ? ? IRanges(start=mirInfo[i,4], width=1), >> ? ? ? ? ? ? ? ? strand=whereverYouKeepTheStrand ) >> upstream <- flank(mirs, 200) >> upseqs <- Views(Rnorvegicus, mirs) >> show(upseqs) >> >> Untested because I am lazy as hell, but it should work, and be a LOT faster. > >Maybe in R 2.16 ... > >R> Views(Hsapiens, GRanges(c('chr1', 'chr2'), IRanges(c(6e6, 6e6), >width=10), '+')) >Error in function (classes, fdef, mtable) ?: >?unable to find an inherited method for function "Views", for >signature "BSgenome" > >;-) > >-steve > >-- >Steve Lianoglou >Graduate Student: Computational Systems Biology >?| Memorial Sloan-Kettering Cancer Center >?| Weill Medical College of Cornell University >Contact Info: http://cbio.mskcc.org/~lianos/contact > -- A model is a lie that helps you see the truth. Howard Skipper
ADD REPLY
0
Entering edit mode
Hi Iain, On Thu, Mar 15, 2012 at 2:16 PM, Iain Gallagher <iaingallagher at="" btopenworld.com=""> wrote: > > > Hi Gentlemen > > Thanks for taking the time. Here's where I am just now: > > library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome > library(GenomicRanges) > > fl <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords > gff <- read.table(fl) # as dataframe > > names <- gff[,10] > nms <- gsub(";", "", gsub("\"", "", gsub("ID=\"", "", names))) # a set of nested gsub with regex to leave only the text in the double quotes > > gr <- GRanges(seqnames = Rle(paste('chr', gff[,1], sep='')), ranges = IRanges(gff[,4], end = gff[,5], names = nms), strand = Rle(gff[,7])) > > seqs <- getSeq(Rnorvegicus, flank(gr, 200)) > names(seqs) <- nms > > It's much lighter on it's feet than a loop and a nice intro to the GenomicRanges package for me. > > As a follow-up question I'm going to write out the seqs object as fasta and use it in clover for TFBS analysis. > > > I was wondering whether the strand is taken into account when I get the flanking sequence i.e. is the sequence returned from the + or - strand as defined in the GRanges object? A call to `flank` on a GRanges object is "strand aware" -- is that what you're asking? Consider: R> gr <- GRanges('chr1', IRanges(c(20, 20), width=5), c('+', '-')) R> flank(gr, 10) GRanges with 2 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [10, 19] + [2] chr1 [25, 34] - R> flank(gr, 10, start=FALSE) GRanges with 2 ranges and 0 elementMetadata values: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [25, 34] + [2] chr1 [10, 19] - getSeq is also strand aware -- it will return the reverse complement of the sequence in your bounds if its on the `-` strand. HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLY

Login before adding your answer.

Traffic: 525 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6