using and combining of "subseq"


Kristian Ullrich ▴ 10

@kristian-ullrich-4698

Last seen 10.6 years ago

Hello Biostrings curators,

again the question to you: Is there an easier way to solve the following?

R-code:

    ####################
    ####################
    library(Biostrings)

    #example sequence
    seq.list=list()
    seq.list[1]="AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC"
    seq.list[2]="TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"
    fas.seq = DNAStringSet(unlist(seq.list))

    #defining start and end points of subseq
    start1 = 1
    end1 = 10

    start2 = 21
    end2 = 25

    #creating first and second subseq
    first.subseq = subseq(fas.seq,start1,end1)
    second.subseq = subseq(fas.seq,start2,end2)

    new.seq = DNAStringSet(apply(sapply(list(first.subseq,second.subseq),as.character),1,function(x) paste(x,collapse="")))
    names(new.seq) = names(fas.seq)
    ####################
    ####################

I basically want to combine subseqs from one DNAStringSet; something like:

    subseq(DNAStringSet, start = c(start1,start2), end = c(end1,end2))

would be nice.

Thank you in anticipation

Kristian Ullrich

--
Kristian Ullrich
Leibniz Institute of Plant Biochemistry
Weinberg 3
D-06120 Halle (Saale), Germany
phone +49 345 5582 1221
fax +49 345 5582 1209
mail kullrich at ipb-halle.de

Biostrings • 2.0k views

ADD COMMENT • link updated 13.8 years ago by Harris A. Jaffee ▴ 590 • written 13.8 years ago by Kristian Ullrich ▴ 10


Harris A. Jaffee ▴ 590

@harris-a-jaffee-3972

Last seen 10.5 years ago

United States

    x1 = "AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC"
    x2 = "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"
    X = DNAStringSet(c(x1, x2))

    > X
      A DNAStringSet instance of length 2
        width seq
    [1]    40 AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC
    [2]    40 TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA

    > start1 = 1
    > end1 = 10
    >
    > start2 = 21
    > end2 = 25

    s1 = subseq(X, start1, end1)
    s2 = subseq(X, start2, end2)
    answer = DNAStringSet(paste(s1, s2, sep=""))

    > answer
      A DNAStringSet instance of length 2
        width seq
    [1]    15 AAAAAAAAAAGGGGG
    [2]    15 TTTTTTTTTTCCCCC

ADD COMMENT • link 13.8 years ago Harris A. Jaffee ▴ 590


On 11-06-15 11:09 AM, Harris A. Jaffee wrote:

    > s1 = subseq(X, start1, end1)
    > s2 = subseq(X, start2, end2)
    > answer = DNAStringSet(paste(s1, s2, sep=""))

Or 'answer = xscat(s1, s2)' would be more efficient here, especially if 's1' and 's2' contain hundreds of thousands of sequences.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

ADD REPLY • link 13.8 years ago Hervé Pagès 16k
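The two replies can be combined into something close to the multi-range `subseq()` call the question asks for. The following is only a sketch: `concat_subseqs` is a hypothetical helper name (it is not part of Biostrings), and the sequence names `seq1` and `seq2` are made up for illustration. It relies on `subseq()` and `xscat()` as used in the thread, and reassigns the names explicitly, as the original code did, rather than assuming they are carried over.

```r
library(Biostrings)

# Hypothetical helper (not a Biostrings function): extract several ranges
# from every sequence in `x` and concatenate them in the order given.
concat_subseqs <- function(x, starts, ends) {
  stopifnot(length(starts) == length(ends), length(starts) >= 1)
  # One set of subsequences per (start, end) pair, each as long as `x`.
  pieces <- Map(function(s, e) subseq(x, start = s, end = e), starts, ends)
  # xscat() concatenates its arguments element-wise.
  out <- do.call(xscat, unname(pieces))
  # Copy the names over explicitly instead of relying on xscat().
  names(out) <- names(x)
  out
}

# Same sequences and ranges as in the thread; the names are illustrative.
X <- DNAStringSet(c(seq1 = "AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC",
                    seq2 = "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"))
answer <- concat_subseqs(X, starts = c(1, 21), ends = c(10, 25))
print(answer)
```

With the ranges from the question, this should give the same two 15-letter sequences shown in the first answer. Passing the ranges as vectors means a third or fourth range can be added without writing further `subseq()` calls by hand.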
