find gene symbols immediately flanking before and after a SNP position
1
0
Entering edit mode
@adaikalavan-ramasamy-5765
Last seen 10.1 years ago
United Kingdom
Dear all, I have a list of several hundred SNP that I would like to annotate functionally and am able to do this via websites such as SeattleSeq. However, for intergenic SNPs it does not give me the neighbouring genes. Therefore, I have tried to find genes immediately flanking a SNP (one left and right) in R. I note that this question has been asked previously. I am trying to follow one of the previous suggestions (https://stat.ethz.ch/pipermail/bioconductor/2010-December /037185.html). I been struggling with this for the last two days but I think I am getting something fundamentally wrong. I have chosen the following two SNPs (among several thousands). I am expecting to see the following kind of output: rs881375 (chr9:123652898) is located between PHF19 and TRAF1 rs12191877 (chr6:31252925) is located between RPL3P2 and WASF5P First, I code the query up as a GRange object: rsid <- c("rs881375", "rs12191877") chr <- c("chr9", "chr6") pos <- c(123652898, 31252925) library(GenomicFeatures) target <- GRanges( seqnames = Rle( chr ), ranges = IRanges(pos, end=pos, names=rsid), strand = Rle(strand( rep("*", length(chr)) )) ) # GRanges with 2 ranges and 0 metadata columns: # seqnames ranges strand # <rle> <iranges> <rle> # rs881375 chr9 [123652898, 123652898] * # rs12191877 chr6 [ 31252925, 31252925] * # --- # seqlengths: # chr6 chr9 # NA NA txdb <- makeTranscriptDbFromUCSC("hg19") # about 5 min tx <- transcriptsBy(txdb) But when I try precede( target, tx ) follow( target, tx ) I get the following message: Error in function (classes, fdef, mtable) : unable to find an inherited method for function ?precede? for signature ?"GRanges", "GRangesList"? Any help would be very much appreciated. I am happy to try other packages or websites if available. Many thanks. Regards, Adai
SNP SNP • 1.9k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Adai, When using precede() and follow() the 'x' and 'subject' arguments can be any of the following combinations, > showMethods("precede") Function: precede (package IRanges) x="ANY", subject="SummarizedExperiment" x="GenomicRanges", subject="GenomicRanges" x="GenomicRanges", subject="missing" x="Ranges", subject="RangesORmissing" x="SummarizedExperiment", subject="ANY" x="SummarizedExperiment", subject="SummarizedExperiment" The function transcriptsBy() returns a GRangesList. Instead try using the transcripts() function which will return a GRanges, tx <- transcripts(txdb) Another function worth exploring is locateVariants() in the VariantAnnotation package. See the examples on the ?locateVariants man page to make sure the seqnames (chromosome names) in your data and the txdb match. You can try using the AllVariants() region loc <- locateVariants(query, subject, AllVariants()) or IntergenicVariants() if you are sure the snp is intergenic. loc <- locateVariants(query, subject, IntergenicVariants()) In these examples, 'query' is a GRanges of your data and 'subject' is the txdb you made from UCSC. Valerie On 02/13/2013 02:38 PM, Adaikalavan Ramasamy wrote: > Dear all, > > I have a list of several hundred SNP that I would like to annotate > functionally and am able to do this via websites such as SeattleSeq. > However, for intergenic SNPs it does not give me the neighbouring > genes. Therefore, I have tried to find genes immediately flanking a > SNP (one left and right) in R. I note that this question has been > asked previously. I am trying to follow one of the previous > suggestions (https://stat.ethz.ch/pipermail/bioconductor/2010-Decemb er/037185.html). > I been struggling with this for the last two days but I think I am > getting something fundamentally wrong. > > I have chosen the following two SNPs (among several thousands). I am > expecting to see the following kind of output: > rs881375 (chr9:123652898) is located between PHF19 and TRAF1 > rs12191877 (chr6:31252925) is located between RPL3P2 and WASF5P > > > First, I code the query up as a GRange object: > > rsid <- c("rs881375", "rs12191877") > chr <- c("chr9", "chr6") > pos <- c(123652898, 31252925) > > library(GenomicFeatures) > > target <- GRanges( > seqnames = Rle( chr ), > ranges = IRanges(pos, end=pos, names=rsid), > strand = Rle(strand( rep("*", length(chr)) )) > ) > > # GRanges with 2 ranges and 0 metadata columns: > # seqnames ranges strand > # <rle> <iranges> <rle> > # rs881375 chr9 [123652898, 123652898] * > # rs12191877 chr6 [ 31252925, 31252925] * > # --- > # seqlengths: > # chr6 chr9 > # NA NA > > txdb <- makeTranscriptDbFromUCSC("hg19") # about 5 min > tx <- transcriptsBy(txdb) > > > But when I try > precede( target, tx ) > follow( target, tx ) > > I get the following message: > Error in function (classes, fdef, mtable) : > unable to find an inherited method for function ?precede? for > signature ?"GRanges", "GRangesList"? > > Any help would be very much appreciated. I am happy to try other > packages or websites if available. Many thanks. > > Regards, Adai > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT
0
Entering edit mode
Dear Valerie, Apologies for the delay. Your suggestions solves my problem perfectly. Thank you for your help and also for writing the package. I have blogged on how to do this at http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps- and-indels-in-bioconductor/ And here are R codes for posterity and if anyone is interested. library(VariantAnnotation) library(TxDb.Hsapiens.UCSC.hg19.knownGene) ######################### ## 1. Prepare the query file ## ######################### input <- rbind.data.frame( c("rs3753344", "chr1", 1142150), c("rs12191877", "chr6", 31252925), c("rs881375", "chr9", 123652898) ) colnames(input) <- c("rsid", "chr", "pos") input$pos <- as.numeric(as.character(input$pos)) input # rsid chr pos # 1 rs3753344 chr1 1142150 # 2 rs12191877 chr6 31252925 # 3 rs881375 chr9 123652898 target <- with(input, GRanges( seqnames = Rle(chr), ranges = IRanges(pos, end=pos, names=rsid), strand = Rle(strand("*")) ) ) ############### ## 2. Annotate ## ############### loc <- locateVariants(target, TxDb.Hsapiens.UCSC.hg19.knownGene, AllVariants()) names(loc) <- NULL out <- as.data.frame(loc) out$names <- names(target)[ out$QUERYID ] out <- out[ , c("names", "seqnames", "start", "end", "LOCATION", "GENEID", "PRECEDEID", "FOLLOWID")] out <- unique(out) out # names seqnames start end LOCATION GENEID PRECEDEID FOLLOWID # 1 rs3753344 chr1 1142150 1142150 promoter 8784 <na> <na> # 5 rs3753344 chr1 1142150 1142150 intergenic <na> 8784 7293 # 6 rs12191877 chr6 31252925 31252925 intron 3106 <na> <na> # 7 rs881375 chr9 123652898 123652898 intergenic <na> 26147 7185 ############################################## ## 3. Convert from Entrez Gene ID to Gene Symbols ## ############################################## library(org.Hs.eg.db) Symbol2id <- as.list( org.Hs.egSYMBOL2EG ) id2Symbol <- rep( names(Symbol2id), sapply(Symbol2id, length) ) names(id2Symbol) <- unlist(Symbol2id) x <- unique( with(out, c(levels(GENEID), levels(PRECEDEID), levels(FOLLOWID))) ) table( x %in% names(id2Symbol) ) # good, all found out$GENESYMBOL <- id2Symbol[ as.character(out$GENEID) ] out$PRECEDESYMBOL <- id2Symbol[ as.character(out$PRECEDEID) ] out$FOLLOWSYMBOL <- id2Symbol[ as.character(out$FOLLOWID) ] out # names seqnames start end LOCATION # 1 rs3753344 chr1 1142150 1142150 promoter # 5 rs3753344 chr1 1142150 1142150 intergenic # 6 rs12191877 chr6 31252925 31252925 intron # 7 rs881375 chr9 123652898 123652898 intergenic # # GENEID PRECEDEID FOLLOWID GENESYMBOL PRECEDESYMBOL FOLLOWSYMBOL # 1 8784 <na> <na> TNFRSF18 <na> <na> # 5 <na> 8784 7293 <na> TNFRSF18 TNFRSF4 # 6 3106 <na> <na> HLA-B <na> <na> # 7 <na> 26147 7185 <na> PHF19 TRAF1 Many thanks. I hope this helps someone in the future! Regards, Adai On Wed, Feb 13, 2013 at 11:53 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > > Hi Adai, > > When using precede() and follow() the 'x' and 'subject' arguments can be any of the following combinations, > > > showMethods("precede") > Function: precede (package IRanges) > x="ANY", subject="SummarizedExperiment" > x="GenomicRanges", subject="GenomicRanges" > x="GenomicRanges", subject="missing" > x="Ranges", subject="RangesORmissing" > x="SummarizedExperiment", subject="ANY" > x="SummarizedExperiment", subject="SummarizedExperiment" > > > The function transcriptsBy() returns a GRangesList. Instead try using the transcripts() function which will return a GRanges, > > tx <- transcripts(txdb) > > > Another function worth exploring is locateVariants() in the VariantAnnotation package. See the examples on the ?locateVariants man page to make sure the seqnames (chromosome names) in your data and the txdb match. You can try using the AllVariants() region > > loc <- locateVariants(query, subject, AllVariants()) > > or IntergenicVariants() if you are sure the snp is intergenic. > > loc <- locateVariants(query, subject, IntergenicVariants()) > > In these examples, 'query' is a GRanges of your data and 'subject' is the txdb you made from UCSC. > > Valerie > > > > > > > On 02/13/2013 02:38 PM, Adaikalavan Ramasamy wrote: >> >> Dear all, >> >> I have a list of several hundred SNP that I would like to annotate >> functionally and am able to do this via websites such as SeattleSeq. >> However, for intergenic SNPs it does not give me the neighbouring >> genes. Therefore, I have tried to find genes immediately flanking a >> SNP (one left and right) in R. I note that this question has been >> asked previously. I am trying to follow one of the previous >> suggestions (https://stat.ethz.ch/pipermail/bioconductor/2010-Decem ber/037185.html). >> I been struggling with this for the last two days but I think I am >> getting something fundamentally wrong. >> >> I have chosen the following two SNPs (among several thousands). I am >> expecting to see the following kind of output: >> rs881375 (chr9:123652898) is located between PHF19 and TRAF1 >> rs12191877 (chr6:31252925) is located between RPL3P2 and WASF5P >> >> >> First, I code the query up as a GRange object: >> >> rsid <- c("rs881375", "rs12191877") >> chr <- c("chr9", "chr6") >> pos <- c(123652898, 31252925) >> >> library(GenomicFeatures) >> >> target <- GRanges( >> seqnames = Rle( chr ), >> ranges = IRanges(pos, end=pos, names=rsid), >> strand = Rle(strand( rep("*", length(chr)) )) >> ) >> >> # GRanges with 2 ranges and 0 metadata columns: >> # seqnames ranges strand >> # <rle> <iranges> <rle> >> # rs881375 chr9 [123652898, 123652898] * >> # rs12191877 chr6 [ 31252925, 31252925] * >> # --- >> # seqlengths: >> # chr6 chr9 >> # NA NA >> >> txdb <- makeTranscriptDbFromUCSC("hg19") # about 5 min >> tx <- transcriptsBy(txdb) >> >> >> But when I try >> precede( target, tx ) >> follow( target, tx ) >> >> I get the following message: >> Error in function (classes, fdef, mtable) : >> unable to find an inherited method for function ?precede? for >> signature ?"GRanges", "GRangesList"? >> >> Any help would be very much appreciated. I am happy to try other >> packages or websites if available. Many thanks. >> >> Regards, Adai >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >
ADD REPLY

Login before adding your answer.

Traffic: 589 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6