Hello, Is it possible to get the sequences between restriction sites from a .bed file containing the coordinates of fragments generated from a in silico RE digestion of mouse mm9 genome in Bioconductor? If so, how can I do it?
I want a fasta file containing all the restriction fragments for alignment of 4c-seq reads in a short read aligner.
Load the genomic coordinates of your fragments with rtracklayer::import(), use the object returned by rtracklayer::import() to extract the corresponding DNA sequences from the BSgenome object for mm9, then write the sequences to a FASTA file with Biostrings::writeXStringSet(). Will look something like:
library(rtracklayer) # for import()
library(BSgenome) # for getSeq()
library(BSgenome.Mmusculus.UCSC.mm9)
library(Biostrings) # for writeXStringSet()
restriction_sites <- import("path/to/restriction_sites.bed")
restriction_fragments <- getSeq(BSgenome.Mmusculus.UCSC.mm9, restriction_sites)
writeXStringSet(restriction_fragments, "restriction_fragments.fa")
Thank you very much!