Entering edit mode
Dear Noah,
Thanks so much for sharing your example and excellent solution!
Yes, if you create your RangedData with names field, then the
sequences will
have names associated with, that are used by write2FASTA as sequence
headers. In the future release, headers could be automatically
generated if
no head found.
Here is the example to create a RangedData with names field filled in.
peaks = RangedData(IRanges(start=c(100, 500), end=c(300, 600),
names=c("peak1", "peak2")), space=c("NC_008253", "NC_010468"))
library(BSgenome.Ecoli.NCBI.20080805)
seq = getAllPeakSequence(peaks, upstream = 20,
downstream = 20, genome = Ecoli)
write2FASTA(seq)
Thanks so much for the positive feedback!
Kind regards,
Julie
*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program in Gene Function and Expression
Program in Molecular Medicine
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
On 8/4/10 8:42 PM, "Noah Dowell" <noahd at="" ucla.edu=""> wrote:
> Dear Julie,
>
> I came across a little problem today when using the write2FASTA
function. The
> function seemed to perform fine in that no error was thrown, but the
FASTA
> file that was "written" was empty. So I started to look through the
function
> to see if I could find anything and found that the data object
created with
> getAllPeakSequence had no rownames. When I manually added rownames
(see
> below) the write2FASTA function worked fine. It could be something
I messed
> up on my end but, I thought you might want to be aware of this.
Feel free to
> post this on the Bioconductor message board if you think it will be
helpful,
> but I didn't want to confuse people.
>
>
> Thank you for creating this excellent package.
>
> Kind regards,
>
> Noah
>
>
>
>
>> tRNArdSMALL <- as(temp, "RangedData")
>
>> seqtRNAsmall <- getAllPeakSequence(tRNArdSMALL, upstream = 100,
downstream =
>> 100, genome = Scerevisiae)
>
>
>
>> rownames(seqtRNAsmall)
> NULL
>
> # so I did this:
>
>> rownames(seqtRNAsmall) <- seq(1:86)
>
>> write2FASTA(seqtRNAsmall, file ="peakSeqNhp6AtRNAsmall", width =
80)
>
> ##Now the function works since the file contains 86 sequences in the
FASTA
> format.
>
> ## The rangedData object I created in the beginning also has no
rownames.
>
> ## Here is the head(seqtRNAsmall) before I forced the rownames to be
1:86:
>
>
>
>
>
>
>> head(seqtRNAsmall)
> RangedData with 6 rows and 4 value columns across 16 spaces
> space ranges | upstream downstream
> <character> <iranges> | <numeric> <numeric>
> 1 chr1 [166355, 166375] | 100 100
> 2 chr10 [354302, 354314] | 100 100
> 3 chr10 [378397, 378437] | 100 100
> 4 chr10 [422758, 422974] | 100 100
> 5 chr10 [540627, 540651] | 100 100
> 6 chr10 [617933, 618064] | 100 100
>
> sequence
>
> <character>
> 1
> TTGTAATGAGTTGGGCACATGGCGCAGTTGGTAGCGCGCTTCCCTTGCAAGGAAGAGGTCATCGGTTC
GATTCCGGTT
> GCGTCCAAATTTTTTTGTTAATCCAACACAATTGAATTCGTGAATAGCTGACTGTCATCAGTAATGTT
CGTGGAAAGT
> ACCTATCCATACTGTTGTATCACGACTAAGTAGTTGTCGACTACTACCTCCTCAACCCCAGTTAT
> 2
> AATTGTTGGAATTCCATTTTTGATAAAGATGAAAATTGGTGAATTTTGAGATAATTGTTGGGATTCCA
TTGTTGATAA
> AGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATCTAGGAATCCTCATAATG
GAACCTATAT
> TTCTATATACTAATATTCCGTTTTATTCCTTATTCCGTTTTATATGTTTCATTATCC
> 3
> GCTCACAGGGTCCTGCCGGGTATTCGCCCTCCAGCTTGAAAGGGGTGGCAATGCCCAAGAGCGATGTA
TAGTTTAGTA
> TCAAGGAATTGAGAGAGGTATTACACCAAGGATTTCCGTGAACGAAGTGACGACGCAAGATTAACAGA
TTAAATTCTG
> GTAGTAGGCTAAGCCGTTATTAAACTTTATTTACGGTATACCACAATACTGCTCTTTTTGTTGAGGAT
ATGACAGACA
> AAGGCAA
> 4
> CAGTCTCATAATCTGAAGGTCGAGAGTTCGAACCTCCCCTGGAGCACATTTTCTTTTTTTGTGAATTA
AACCTCGAAG
> CTCAACATTCATGTTTTGAGATAAAAAAAGGTACTTATTACTTGTGTAAAAGAAAGTTTATAATGATG
ATCTTATTTA
> AGTTTATAGTAGGACTTTCTATGCATGTCTTGTAGAACTATTTATGCTGGACTCGCCATTAATTAGCT
CCAGTGAAAG
> TACATATCTTCTTTCTTGGACATTATTATCTCTAGTAATAGTGATATAAGGATCGGCATTCAACAGGT
GATCTGGTAT
> TTGCGGCAATCCAGATTCTGGTCCTGCGTTTTTTTTCAGTTCCATAAACCTTTCAACGCCAATCGGCA
GCAAATCAGC
> CGGTAAATGACGTTGTAGATCGCACAT
> 5
> GTGTTAGACGATGACATAAGATACGAGGAACTGTCATCGAAGTTAGAGGAAGCTGAAATGCAAGGATT
GATAATGTAA
> TAGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATATAGAGATAAAGATTC
CATTTTGAGG
>
ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTGATATTATAGCCTTTACCAACAAT
> 6
> TAGCAAAAGTCTTTTCACAATAAGAGGATATTTTGCAGGCAAATTTCTGAAGACCACATAATATCGTC
TTCTTGGCGC
> AACTAAGACGCAGAGAAATGCTTGTTTTCAAAAGGGGTATTCACGTAGTACCTAAACTACCAAATTCT
AAGGCCCTAC
> TGCAAAATGGAGTGCCTAATATACTCAGTTCTTCGGGATTCAAGACGGTGTGGTTCGACTACCAGCGA
TATCTATGTG
> ATAAGCTAACGCTGGCGACAGCGGGACAGTCTTTGGAATCATATTATCCATTTCACATATTGCTGAAA
ACTGCAGGAA
> ATCCATTACAGTCAAACATC
> strand
> <character>
> 1 1
> 2 1
> 3 1
> 4 1
> 5 1
> 6 1
>
>
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BSgenome.Scerevisiae.UCSC.sacCer1_1.3.16 org.Sc.sgd.db_2.4.1
> rtracklayer_1.8.1 RCurl_1.3-1
> [5] bitops_1.0-4.1 ChIPpeakAnno_1.4.0
> limma_3.4.0 org.Hs.eg.db_2.4.1
> [9] GO.db_2.4.1 RSQLite_0.8-4
> DBI_0.2-5 AnnotationDbi_1.10.0
> [13] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0
> GenomicRanges_1.0.1 Biostrings_2.16.0
> [17] IRanges_1.6.0 multtest_2.4.0
> Biobase_2.8.0 biomaRt_2.4.0
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-5 splines_2.11.0 survival_2.35-8 tools_2.11.0
XML_2.8-1