overlap GRangesList and vcf
1
1
Entering edit mode
Georg Otto ▴ 120
@georg-otto-4188
Last seen 10.4 years ago
Hi, I apologise if this is trivial, but I haven't found a straight forward way to do this so far. Given a GRangesList with exons ## exons by gene range.exon <- exonsBy(txdb.ensgene, "gene") and a vcf file with SNP data (positions, alleles, etc), how can I generate a list of exons that contain SNPs (along with SNP positions)? Any hint will be appreciated! Georg
SNP SNP • 2.3k views
ADD COMMENT
1
Entering edit mode
@valerie-obenchain-4275
Last seen 3.0 years ago
United States
Hi Georg, The readVcf() function in the VariantAnnotation package reads data from a VCF file into a VCF-class object, library(VariantAnnotation) fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation") vcf <- readVcf(fl, "hg19") > vcf class: VCF dim: 10376 5 genome: hg19 exptData(1): header fixed(4): REF ALT QUAL FILTER info(22): LDAF AVGPOST ... VT SNPSOURCE geno(3): GT DS GL rownames(10376): rs7410291 rs147922003 ... rs144055359 rs114526001 rowData values names(1): paramRangeID colnames(5): HG00096 HG00097 HG00099 HG00100 HG00101 colData names(1): Samples For details of the class slots and data accessors, ?'VCF-class' Before counting, make sure the chromosome names are consistent between the vcf and annotation. Here the annotation precedes chromosome numbers with 'chr' and the vcf file does not. > intersect(seqlevels(txdb), seqlevels(vcf)) character(0) We modify the vcf seqlevels to match the txdb using renameSeqlevels(), old <- seqlevels(vcf) new <- paste("chr", old, sep="") names(new) <- old vcf <- renameSeqlevels(vcf, new) > intersect(seqlevels(txdb), seqlevels(vcf)) [1] "chr22" The rowData slot contains a GRanges of positions which can be overlapped with your exons by genes, rd <- rowData(vcf) You wanted a list of exons hit by variants so we will unlist the GRangesList. If instead you want the hits grouped by gene, don't unlist(exbygn). library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exbygn <- exonsBy(txdb, "gene") exons <- unlist(exbygn) Counting can be done with findOverlaps() or summarizeOverlaps(). See the man pages for details on how the counting is performed. summarizedOverlaps() is written with read gaps in mind and therefore requires the 'reads' argument to be a Bam file or GappedAlignements object. Since you are interested in SNPs for this example we'll use findOverlaps(). fo <- findOverlaps(exons, rd) Exons hit by variants are extracted from 'exons' with the 'queryHits' column, exonsHit <- exons[queryHits(fo)] and the corresponding variants are subset from the VCF-class object with the 'subjectHits' column. vcfHit <- vcf[subjectHits(fo)] The positions of the SNPs are in the GRanges in the rowData slot, rowData(vcfHit) If you wanted your hits grouped by gene, don't unlist(exbygn) before counting. Valerie On 08/10/12 09:55, Georg Otto wrote: > Hi, > > > I apologise if this is trivial, but I haven't found a straight forward > way to do this so far. > > Given a GRangesList with exons > > ## exons by gene > range.exon<- exonsBy(txdb.ensgene, "gene") > > and a vcf file with SNP data (positions, alleles, etc), how can I > generate a list of exons that contain SNPs (along with SNP positions)? > > Any hint will be appreciated! > > Georg > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6