XStringViews with strand?
1
0
Entering edit mode
@cei-abreu-goodger-4433
Last seen 9.7 years ago
Mexico
Hi all, I'm trying to read in a fasta sequence, extract the "gene sequences" and write these out to a fasta file. I can read the sequences with read.DNAStringSet(), obtain an XStringViews object with Views(), but I'm having trouble knowing how to obtain the reverse complement sequence for the genes on the "-" strand. I can get them with a reverseComplement() of the XStringViews object, but I can't overwrite the elements of this object. So my solution involves dealing separately with all the genes on the "+" strand and those on the "-" strand. Is there an easier way? An example: file <- system.file("extdata", "someORF.fa", package="Biostrings") x <- read.DNAStringSet(file, "fasta")[[1]] names <- c("a","b","c") starts <- c(10,384,947) ends <- starts+20 strands <- c("+","-","+") myViews <- Views(x, start=starts, end=ends) names(myViews) <- names revViews <- reverseComplement(myViews[strands=="-"]) posViews <- myViews[strands=="+"] Many thanks, Cei sessionInfo() R version 2.9.0 (2009-04-17) i386-apple-darwin8.11.1 locale: en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] tools stats graphics grDevices datasets utils methods [8] base other attached packages: [1] Biostrings_2.12.1 IRanges_1.2.2 Biobase_2.4.1
• 1.4k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 2 days ago
Seattle, WA, United States
Hi Cei, Cei Abreu-Goodger wrote: > Hi all, > > I'm trying to read in a fasta sequence, extract the "gene sequences" and > write these out to a fasta file. I can read the sequences with > read.DNAStringSet(), obtain an XStringViews object with Views(), but I'm > having trouble knowing how to obtain the reverse complement sequence for > the genes on the "-" strand. I can get them with a reverseComplement() > of the XStringViews object, but I can't overwrite the elements of this > object. So my solution involves dealing separately with all the genes on > the "+" strand and those on the "-" strand. Is there an easier way? > > An example: > > file <- system.file("extdata", "someORF.fa", package="Biostrings") > x <- read.DNAStringSet(file, "fasta")[[1]] > > names <- c("a","b","c") > starts <- c(10,384,947) > ends <- starts+20 > strands <- c("+","-","+") > > myViews <- Views(x, start=starts, end=ends) > names(myViews) <- names > > revViews <- reverseComplement(myViews[strands=="-"]) > posViews <- myViews[strands=="+"] Yes it has to be done separately. The content of a view cannot be modified because that would mean modifying the underlying subject so you would end up with a subject that is a mix of + and - strand. And what should be done when 2 views overlap, 1 being associated with a gene on the + strand and the other one with a gene on the - strand? In the near future we will support replacement of the elements of a DNAStringSet object through "[<-" and "[[<-". Then it will be possible to reverseComplement some of its elements with something like this: x[strands == "-"] <- reverseComplement(x[strands == "-"]) But that will be for XStringSet only, not for XStringViews. These developments will take place in the devel version of Biostrings in the next few weeks. Cheers, H. > > > Many thanks, > > Cei > > sessionInfo() > R version 2.9.0 (2009-04-17) > i386-apple-darwin8.11.1 > > locale: > en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] tools stats graphics grDevices datasets utils methods > [8] base > > other attached packages: > [1] Biostrings_2.12.1 IRanges_1.2.2 Biobase_2.4.1 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Hi Herve, Thanks for the info. I'll check out the devel version soon, and I guess I'll be doing something like what you suggest: myDNAStringSet <- as(myViews, "DNAStringSet") myDNAStringSet[strands == "-"] <- reverseComplement(myDNAStringSet[strands == "-"]) Cheers, Cei Hervé Pagès wrote: > Hi Cei, > > Cei Abreu-Goodger wrote: >> Hi all, >> >> I'm trying to read in a fasta sequence, extract the "gene sequences" >> and write these out to a fasta file. I can read the sequences with >> read.DNAStringSet(), obtain an XStringViews object with Views(), but >> I'm having trouble knowing how to obtain the reverse complement >> sequence for the genes on the "-" strand. I can get them with a >> reverseComplement() >> of the XStringViews object, but I can't overwrite the elements of this >> object. So my solution involves dealing separately with all the genes >> on the "+" strand and those on the "-" strand. Is there an easier way? >> >> An example: >> >> file <- system.file("extdata", "someORF.fa", package="Biostrings") >> x <- read.DNAStringSet(file, "fasta")[[1]] >> >> names <- c("a","b","c") >> starts <- c(10,384,947) >> ends <- starts+20 >> strands <- c("+","-","+") >> >> myViews <- Views(x, start=starts, end=ends) >> names(myViews) <- names >> >> revViews <- reverseComplement(myViews[strands=="-"]) >> posViews <- myViews[strands=="+"] > > Yes it has to be done separately. The content of a view cannot be modified > because that would mean modifying the underlying subject so you would end > up with a subject that is a mix of + and - strand. And what should be done > when 2 views overlap, 1 being associated with a gene on the + strand and > the other one with a gene on the - strand? > > In the near future we will support replacement of the elements of a > DNAStringSet > object through "[<-" and "[[<-". Then it will be possible to > reverseComplement > some of its elements with something like this: > > x[strands == "-"] <- reverseComplement(x[strands == "-"]) > > But that will be for XStringSet only, not for XStringViews. > > These developments will take place in the devel version of Biostrings in > the > next few weeks. > > Cheers, > H. > > >> >> >> Many thanks, >> >> Cei >> >> sessionInfo() >> R version 2.9.0 (2009-04-17) >> i386-apple-darwin8.11.1 >> >> locale: >> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 >> >> attached base packages: >> [1] tools stats graphics grDevices datasets utils methods >> [8] base >> >> other attached packages: >> [1] Biostrings_2.12.1 IRanges_1.2.2 Biobase_2.4.1 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
ADD REPLY

Login before adding your answer.

Traffic: 785 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6