SNP Flanking Seq
1
1
Entering edit mode
prodhan82 ▴ 20
@prodhan82-21410
Last seen 3.0 years ago
Australia

Two queries: i. What is the package in Bioconductor where the position of a SNP can be specified to get its flanking sequences? ii. What is the way to highlight or put brackets around a nucleotide at a specific position in a DNA String Set? Thanks in advance. Prodhan

annotation • 1.6k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 9 weeks ago
United States

For the first question, you will use the GenomicRanges package and a package or other source with relevant reference genome, e.g.,

library(GenomicRanges)
library("BSgenome.Hsapiens.UCSC.hg38")

Create or input (e.g., with VariantExperiment::readVCF() or rtracklayer::import()) a GRanges object representing the SNPs

snps <- GRanges(
    c("chr1:1000000", "chr2:20000000", "chr3:30000000")
)

Use flank() to define the regions of interest

flanks <- flank(snps, 100)

Retrieve the flanking sequences from the relevant genome

getSeq(BSgenome.Hsapiens.UCSC.hg38, flanks)

Packages with whole-genome sequences start with "BSgenome", so use

BiocManager::available("BSgenome")

to see available genomes, followed by BiocManager::install(full_package_name) to install it.

Alternatively, search the AnnotationHub for whole genome sequences, especially from Ensembl

> library(AnnotationHub)
> hub = AnnotationHub()
> query(hub, c("ailMel1", "2bit", "release-96"))
AnnotationHub with 4 records
# snapshotDate(): 2019-07-10
# $dataprovider: Ensembl
# $species: Ailuropoda melanoleuca
# $rdataclass: TwoBitFile
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH69768"]]'

            title
  AH69768 | Ailuropoda_melanoleuca.ailMel1.cdna.all.2bit
  AH69769 | Ailuropoda_melanoleuca.ailMel1.dna_rm.toplevel.2bit
  AH69770 | Ailuropoda_melanoleuca.ailMel1.dna_sm.toplevel.2bit
  AH69771 | Ailuropoda_melanoleuca.ailMel1.ncrna.2bit

and use that

seq = hub[["AH69769"]]
getSeq(seq, flanks)
ADD COMMENT

Login before adding your answer.

Traffic: 821 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6