get chr position for a batch of human SNPs
3
0
Entering edit mode
shirley zhang ★ 1.0k
@shirley-zhang-2038
Last seen 10.2 years ago
Dear All, I am planing to map the SNPids to hg18 positions (chr and position) for a huge list of human snps. I've tried the package "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this package: 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 with Group Label "reference" instead of "Celera" or "HuRef"? 2. If I don't know the chr information (seqname), can I obtain the position with dbSNP Id only? Further, I find dbSNP batch queries a little more difficult to work with because they map to different versions of the hg18 like Celera, HumanRef, etc.Can anybody let me know a better option to get hg18 chr position with the most popular or confident version of dbSNP? Thanks in advance
• 3.2k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States
Hi Shirley, On 11-09-22 11:29 AM, shirley zhang wrote: > Dear All, > > I am planing to map the SNPids to hg18 positions (chr and position) > for a huge list of human snps. I've tried the package > "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this > package: > > 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 > with Group Label "reference" instead of "Celera" or "HuRef"? Yes, they are mapped to hg18. See: http://bioconductor.org/packages/release/data/annotation/html/SNPlocs. Hsapiens.dbSNP.20090506.html and the man page of the package for additional details: > library(SNPlocs.Hsapiens.dbSNP.20090506) > ?SNPlocs.Hsapiens.dbSNP.20090506 > > 2. If I don't know the chr information (seqname), can I obtain the > position with dbSNP Id only? Unfortunately, because SNPs are stored in one data frame per chromosome, if you don't know the chr then you need to load and query each data frame individually. With more recent SNPlocs packages (e.g. SNPlocs.Hsapiens.dbSNP.20100427), provision was added to let the user load SNPs from more than 1 chromosome in a single GRanges object, so you can do something like: ## Load all the SNPs in a big GRanges object (takes about ## 13 minutes and requires 6GB of RAM!): all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) ## Use the rs ids to set the names (takes about 6 minutes): names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id, sep="") ## Then extract your SNPs from the big GRanges object (again, ## this can take a long time, depending on how many SNPs you ## extract): my_rs_ids <- sample(names(all_snps), 1000) my_snps <- all_snps[my_rs_ids] However, please note that, starting with SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. Hope this helps, H. > > Further, I find dbSNP batch queries a little more difficult to work > with because they map to different versions of the hg18 like Celera, > HumanRef, etc.Can anybody let me know a better option to get hg18 chr > position with the most popular or confident version of dbSNP? > > Thanks in advance > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Dear Herve, Thanks for your quick reply. I've checked the link you suggested. The SNPs in "SNPlocs.Hsapiens.dbSNP.20090506" do map the hg18 genome (NCBI Build 36.1). But when I manually check the hg18 positions for a few SNPs on dbSNP at NCBI. I can only get position information for NCBI Build 36.3 and NCBI Build 37. What is the difference between NCBI Build 36.3 and NCBI Build 36.1? Are they basically the same? Further, in dbSNP at NCBI, even for NCBI Build 36.3, there are 3 versions of the hg18 position per SNP (each version has a different Group Label, i.e. "reference", "Celera" or "HuRef"). Could you kindly tell me what's the difference among these 3 versions? For my 2nd question, getting SNP position for SNPs without knowing the chr information, since I need to map SNPids to hg18 positions very often in my research, after all SNPs from all of chromosomes were retrieved using "SNPlocs.Hsapiens.dbSNP.20090506", I combined them into one big data frame and saved it on the server. But later on if I need to map SNPids to hg19, I will follow your suggestions by creating a big GRanges object using more recent SNPlocs packages. Thanks a lot for your detailed explanation. It is very helpful. Shirley 2011/9/22 Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Shirley, > > On 11-09-22 11:29 AM, shirley zhang wrote: >> >> Dear All, >> >> I am planing to map the SNPids to hg18 positions (chr and position) >> for a huge list of human snps. I've tried the package >> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >> package: >> >> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >> with Group Label "reference" instead of "Celera" or "HuRef"? > > Yes, they are mapped to hg18. See: > > > http://bioconductor.org/packages/release/data/annotation/html/SNPloc s.Hsapiens.dbSNP.20090506.html > > and the man page of the package for additional details: > > ?> library(SNPlocs.Hsapiens.dbSNP.20090506) > ?> ?SNPlocs.Hsapiens.dbSNP.20090506 > >> >> 2. If I don't know the chr information (seqname), can I obtain the >> position with dbSNP Id only? > > Unfortunately, because SNPs are stored in one data frame per > chromosome, if you don't know the chr then you need to load and > query each data frame individually. > > With more recent SNPlocs packages (e.g. > SNPlocs.Hsapiens.dbSNP.20100427), provision was added to > let the user load SNPs from more than 1 chromosome in a single > GRanges object, so you can do something like: > > ?## Load all the SNPs in a big GRanges object (takes about > ?## 13 minutes and requires 6GB of RAM!): > ?all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) > > ?## Use the rs ids to set the names (takes about 6 minutes): > ?names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id, > ? ? ? ? ? ? ? ? ? ? ? ? ? sep="") > > ?## Then extract your SNPs from the big GRanges object (again, > ?## this can take a long time, depending on how many SNPs you > ?## extract): > ?my_rs_ids <- sample(names(all_snps), 1000) > ?my_snps <- all_snps[my_rs_ids] > > However, please note that, starting with > SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), > SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. > > Hope this helps, > H. > >> >> Further, I find dbSNP batch queries a little more difficult to work >> with because they map to different versions of the hg18 like Celera, >> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >> position with the most popular or confident version of dbSNP? >> >> Thanks in advance >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: ?(206) 667-5791 > Fax: ? ?(206) 667-1319 >
ADD REPLY
0
Entering edit mode
Hi Shirley, The following page explains the difference between Build 36.1 and Build 36.3: http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html In short, the *reference* genome assembly is the same for builds 36.1, 36.2 and 36.3. In those 3 builds, the reference assembly corresponds to UCSC hg18. Note that what NCBI calls a "build" can contain other assemblies in addition to the reference assembly. For example builds 36.1 and 36.2 contain only the reference assembly but Build 36.3 also contains the HuRef and Celera assemblies. All the SNP positions relative to the HuRef or Celera assemblies were ignored when the SNPlocs.Hsapiens.dbSNP.20090506 package was made. Only the positions relative to the "reference assembly" were used (i.e. positions relative to hg18). I've tried to refine a little bit the SNPlocs packages over the time, and, in the more recent ones (starting with SNPlocs.Hsapiens.dbSNP.20100427), the man page provides some details on how the SNPs were curated before inclusion in the package. In particular SNPs mapped to more than 1 location on the reference genome were dropped. Hope this clarifies, H. On 11-09-22 05:48 PM, shirley zhang wrote: > Dear Herve, > > Thanks for your quick reply. > > I've checked the link you suggested. The SNPs in > "SNPlocs.Hsapiens.dbSNP.20090506" do map the hg18 genome (NCBI Build > 36.1). But when I manually check the hg18 positions for a few SNPs on > dbSNP at NCBI. I can only get position information for NCBI Build 36.3 > and NCBI Build 37. What is the difference between NCBI Build 36.3 and > NCBI Build 36.1? Are they basically the same? Further, in dbSNP at > NCBI, even for NCBI Build 36.3, there are 3 versions of the hg18 > position per SNP (each version has a different Group Label, i.e. > "reference", "Celera" or "HuRef"). Could you kindly tell me what's the > difference among these 3 versions? > > For my 2nd question, getting SNP position for SNPs without knowing the > chr information, since I need to map SNPids to hg18 positions very > often in my research, after all SNPs from all of chromosomes were > retrieved using "SNPlocs.Hsapiens.dbSNP.20090506", I combined them > into one big data frame and saved it on the server. But later on if I > need to map SNPids to hg19, I will follow your suggestions by creating > a big GRanges object using more recent SNPlocs packages. > > Thanks a lot for your detailed explanation. It is very helpful. > > Shirley > > 2011/9/22 Hervé Pagès<hpages at="" fhcrc.org="">: >> Hi Shirley, >> >> On 11-09-22 11:29 AM, shirley zhang wrote: >>> >>> Dear All, >>> >>> I am planing to map the SNPids to hg18 positions (chr and position) >>> for a huge list of human snps. I've tried the package >>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >>> package: >>> >>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >>> with Group Label "reference" instead of "Celera" or "HuRef"? >> >> Yes, they are mapped to hg18. See: >> >> >> http://bioconductor.org/packages/release/data/annotation/html/SNPlo cs.Hsapiens.dbSNP.20090506.html >> >> and the man page of the package for additional details: >> >> > library(SNPlocs.Hsapiens.dbSNP.20090506) >> > ?SNPlocs.Hsapiens.dbSNP.20090506 >> >>> >>> 2. If I don't know the chr information (seqname), can I obtain the >>> position with dbSNP Id only? >> >> Unfortunately, because SNPs are stored in one data frame per >> chromosome, if you don't know the chr then you need to load and >> query each data frame individually. >> >> With more recent SNPlocs packages (e.g. >> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to >> let the user load SNPs from more than 1 chromosome in a single >> GRanges object, so you can do something like: >> >> ## Load all the SNPs in a big GRanges object (takes about >> ## 13 minutes and requires 6GB of RAM!): >> all_snps<- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) >> >> ## Use the rs ids to set the names (takes about 6 minutes): >> names(all_snps)<- paste("rs", elementMetadata(all_snps)$RefSNP_id, >> sep="") >> >> ## Then extract your SNPs from the big GRanges object (again, >> ## this can take a long time, depending on how many SNPs you >> ## extract): >> my_rs_ids<- sample(names(all_snps), 1000) >> my_snps<- all_snps[my_rs_ids] >> >> However, please note that, starting with >> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), >> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. >> >> Hope this helps, >> H. >> >>> >>> Further, I find dbSNP batch queries a little more difficult to work >>> with because they map to different versions of the hg18 like Celera, >>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >>> position with the most popular or confident version of dbSNP? >>> >>> Thanks in advance >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Dear Herve, Thank you so much for your explanation. It is extremely helpful for me to understand the difference between different versions of dbSNP database. I will also read the main page of the most recent SNPlocs packages, such as SNPlocs.Hsapiens.dbSNP.20100427 to get more idea of how the SNPs were curated. Thanks again for your information. Shirley 2011/9/23 Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Shirley, > > The following page explains the difference between Build 36.1 > and Build 36.3: > > ?http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html > > In short, the *reference* genome assembly is the same for builds 36.1, > 36.2 and 36.3. In those 3 builds, the reference assembly corresponds > to UCSC hg18. > > Note that what NCBI calls a "build" can contain other assemblies in > addition to the reference assembly. For example builds 36.1 and 36.2 > contain only the reference assembly but Build 36.3 also contains > the HuRef and Celera assemblies. > > All the SNP positions relative to the HuRef or Celera assemblies were > ignored when the SNPlocs.Hsapiens.dbSNP.20090506 package was made. > Only the positions relative to the "reference assembly" were used > (i.e. positions relative to hg18). > > I've tried to refine a little bit the SNPlocs packages over the > time, and, in the more recent ones (starting with > SNPlocs.Hsapiens.dbSNP.20100427), the man page provides some > details on how the SNPs were curated before inclusion in the > package. In particular SNPs mapped to more than 1 location on > the reference genome were dropped. > > Hope this clarifies, > > H. > > > On 11-09-22 05:48 PM, shirley zhang wrote: >> >> Dear Herve, >> >> Thanks for your quick reply. >> >> I've checked the link you suggested. The SNPs in >> "SNPlocs.Hsapiens.dbSNP.20090506" do map the hg18 genome (NCBI Build >> 36.1). But when I manually check the hg18 positions for a few SNPs on >> dbSNP at NCBI. I can only get position information for NCBI Build 36.3 >> and NCBI Build 37. What is the difference between NCBI Build 36.3 and >> NCBI Build 36.1? Are they basically the same? Further, in dbSNP at >> NCBI, even for NCBI Build 36.3, there are 3 versions of the hg18 >> position per SNP (each version has a different Group Label, i.e. >> "reference", "Celera" or "HuRef"). Could you kindly tell me what's the >> difference among these 3 versions? >> >> For my 2nd question, getting SNP position for SNPs without knowing the >> chr information, since I need to map SNPids to hg18 positions very >> often in my research, after all SNPs from all of chromosomes ?were >> retrieved using "SNPlocs.Hsapiens.dbSNP.20090506", ?I combined them >> into one big data frame and saved it on the server. But later on if I >> need to map SNPids to hg19, I will follow your suggestions by creating >> a big GRanges object using more recent SNPlocs packages. >> >> Thanks a lot for your detailed explanation. It is very helpful. >> >> Shirley >> >> 2011/9/22 Hervé Pagès<hpages at="" fhcrc.org="">: >>> >>> Hi Shirley, >>> >>> On 11-09-22 11:29 AM, shirley zhang wrote: >>>> >>>> Dear All, >>>> >>>> I am planing to map the SNPids to hg18 positions (chr and position) >>>> for a huge list of human snps. I've tried the package >>>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >>>> package: >>>> >>>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >>>> with Group Label "reference" instead of "Celera" or "HuRef"? >>> >>> Yes, they are mapped to hg18. See: >>> >>> >>> >>> http://bioconductor.org/packages/release/data/annotation/html/SNPl ocs.Hsapiens.dbSNP.20090506.html >>> >>> and the man page of the package for additional details: >>> >>> ?> ?library(SNPlocs.Hsapiens.dbSNP.20090506) >>> ?> ??SNPlocs.Hsapiens.dbSNP.20090506 >>> >>>> >>>> 2. If I don't know the chr information (seqname), can I obtain the >>>> position with dbSNP Id only? >>> >>> Unfortunately, because SNPs are stored in one data frame per >>> chromosome, if you don't know the chr then you need to load and >>> query each data frame individually. >>> >>> With more recent SNPlocs packages (e.g. >>> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to >>> let the user load SNPs from more than 1 chromosome in a single >>> GRanges object, so you can do something like: >>> >>> ?## Load all the SNPs in a big GRanges object (takes about >>> ?## 13 minutes and requires 6GB of RAM!): >>> ?all_snps<- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) >>> >>> ?## Use the rs ids to set the names (takes about 6 minutes): >>> ?names(all_snps)<- paste("rs", elementMetadata(all_snps)$RefSNP_id, >>> ? ? ? ? ? ? ? ? ? ? ? ? ? sep="") >>> >>> ?## Then extract your SNPs from the big GRanges object (again, >>> ?## this can take a long time, depending on how many SNPs you >>> ?## extract): >>> ?my_rs_ids<- sample(names(all_snps), 1000) >>> ?my_snps<- all_snps[my_rs_ids] >>> >>> However, please note that, starting with >>> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), >>> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. >>> >>> Hope this helps, >>> H. >>> >>>> >>>> Further, I find dbSNP batch queries a little more difficult to work >>>> with because they map to different versions of the hg18 like Celera, >>>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >>>> position with the most popular or confident version of dbSNP? >>>> >>>> Thanks in advance >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: ?(206) 667-5791 >>> Fax: ? ?(206) 667-1319 >>> > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: ?(206) 667-5791 > Fax: ? ?(206) 667-1319 > -- Xiaoling (Shirley) Zhang M.D., Ph.D. (Bioinformatics) Boston University, Boston, MA Tel: (857) 233-9862 Email: zhangxl at bu.edu
ADD REPLY
0
Entering edit mode
Hi Shirley, On 11-09-22 02:24 PM, Hervé Pagès wrote: > Hi Shirley, > > On 11-09-22 11:29 AM, shirley zhang wrote: >> Dear All, >> >> I am planing to map the SNPids to hg18 positions (chr and position) >> for a huge list of human snps. I've tried the package >> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >> package: >> >> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >> with Group Label "reference" instead of "Celera" or "HuRef"? > > Yes, they are mapped to hg18. See: > > > http://bioconductor.org/packages/release/data/annotation/html/SNPloc s.Hsapiens.dbSNP.20090506.html > > > and the man page of the package for additional details: > > > library(SNPlocs.Hsapiens.dbSNP.20090506) > > ?SNPlocs.Hsapiens.dbSNP.20090506 > >> >> 2. If I don't know the chr information (seqname), can I obtain the >> position with dbSNP Id only? > > Unfortunately, because SNPs are stored in one data frame per > chromosome, if you don't know the chr then you need to load and > query each data frame individually. > > With more recent SNPlocs packages (e.g. > SNPlocs.Hsapiens.dbSNP.20100427), provision was added to > let the user load SNPs from more than 1 chromosome in a single > GRanges object, so you can do something like: > > ## Load all the SNPs in a big GRanges object (takes about > ## 13 minutes and requires 6GB of RAM!): > all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) > > ## Use the rs ids to set the names (takes about 6 minutes): > names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id, > sep="") > > ## Then extract your SNPs from the big GRanges object (again, > ## this can take a long time, depending on how many SNPs you > ## extract): > my_rs_ids <- sample(names(all_snps), 1000) > my_snps <- all_snps[my_rs_ids] > > However, please note that, starting with > SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), > SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. I've made some improvements to the SNPlocs packages. The improved packages are in BioC 2.9 (current devel, soon to be released) and had their versions bumped to 0.99.6. They are: SNPlocs.Hsapiens.dbSNP.20090506: dbSNP Build 130, based on hg18 SNPlocs.Hsapiens.dbSNP.20100427: dbSNP Build 131, based on hg19 SNPlocs.Hsapiens.dbSNP.20101109: dbSNP Build 132, based on hg19 SNPlocs.Hsapiens.dbSNP.20110815: dbSNP Build 134, based on hg19 Only the source packages are available at the moment but they should be installable on Windows/Mac from R-2.14 with biocLite(..., type="source"). The data in those packages have not changed but are organized more efficiently. There is a new function rsidsToGRanges() to extract SNP information for a set of rs ids. Using rsidsToGRanges() is much more memory (1/10) and time efficient (40x) than using the code I provided above. See ?rsidsToGRanges for the details. Let me know if you have questions. Cheers, H. > > Hope this helps, > H. > >> >> Further, I find dbSNP batch queries a little more difficult to work >> with because they map to different versions of the hg18 like Celera, >> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >> position with the most popular or confident version of dbSNP? >> >> Thanks in advance >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
this is terrific, thanks so much Herve --t 2011/9/28 Hervé Pagès <hpages@fhcrc.org> > Hi Shirley, > > > On 11-09-22 02:24 PM, Hervé Pagès wrote: > >> Hi Shirley, >> >> On 11-09-22 11:29 AM, shirley zhang wrote: >> >>> Dear All, >>> >>> I am planing to map the SNPids to hg18 positions (chr and position) >>> for a huge list of human snps. I've tried the package >>> "SNPlocs.Hsapiens.dbSNP.**20090506" and have 2 questions regarding this >>> package: >>> >>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >>> with Group Label "reference" instead of "Celera" or "HuRef"? >>> >> >> Yes, they are mapped to hg18. See: >> >> >> http://bioconductor.org/**packages/release/data/** >> annotation/html/SNPlocs.**Hsapiens.dbSNP.20090506.html<http: bioco="" nductor.org="" packages="" release="" data="" annotation="" html="" snplocs.hsapiens.dbs="" np.20090506.html=""> >> >> >> and the man page of the package for additional details: >> >> > library(SNPlocs.Hsapiens.**dbSNP.20090506) >> > ?SNPlocs.Hsapiens.dbSNP.**20090506 >> >> >>> 2. If I don't know the chr information (seqname), can I obtain the >>> position with dbSNP Id only? >>> >> >> Unfortunately, because SNPs are stored in one data frame per >> chromosome, if you don't know the chr then you need to load and >> query each data frame individually. >> >> With more recent SNPlocs packages (e.g. >> SNPlocs.Hsapiens.dbSNP.**20100427), provision was added to >> let the user load SNPs from more than 1 chromosome in a single >> GRanges object, so you can do something like: >> >> ## Load all the SNPs in a big GRanges object (takes about >> ## 13 minutes and requires 6GB of RAM!): >> all_snps <- getSNPlocs(names(getSNPcount()**), as.GRanges=TRUE) >> >> ## Use the rs ids to set the names (takes about 6 minutes): >> names(all_snps) <- paste("rs", elementMetadata(all_snps)$**RefSNP_id, >> sep="") >> >> ## Then extract your SNPs from the big GRanges object (again, >> ## this can take a long time, depending on how many SNPs you >> ## extract): >> my_rs_ids <- sample(names(all_snps), 1000) >> my_snps <- all_snps[my_rs_ids] >> >> However, please note that, starting with >> SNPlocs.Hsapiens.dbSNP.**20100427 (i.e. dbSNP Build 131), >> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. >> > > I've made some improvements to the SNPlocs packages. The improved > packages are in BioC 2.9 (current devel, soon to be released) and > had their versions bumped to 0.99.6. They are: > > SNPlocs.Hsapiens.dbSNP.**20090506: dbSNP Build 130, based on hg18 > SNPlocs.Hsapiens.dbSNP.**20100427: dbSNP Build 131, based on hg19 > SNPlocs.Hsapiens.dbSNP.**20101109: dbSNP Build 132, based on hg19 > SNPlocs.Hsapiens.dbSNP.**20110815: dbSNP Build 134, based on hg19 > > Only the source packages are available at the moment but they should > be installable on Windows/Mac from R-2.14 with > > biocLite(..., type="source"). > > The data in those packages have not changed but are organized more > efficiently. There is a new function rsidsToGRanges() to extract > SNP information for a set of rs ids. Using rsidsToGRanges() is > much more memory (1/10) and time efficient (40x) than using the > code I provided above. See ?rsidsToGRanges for the details. > > Let me know if you have questions. > > Cheers, > > H. > > >> Hope this helps, >> H. >> >> >>> Further, I find dbSNP batch queries a little more difficult to work >>> with because they map to different versions of the hg18 like Celera, >>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >>> position with the most popular or confident version of dbSNP? >>> >>> Thanks in advance >>> >>> ______________________________**_________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.="" ethz.ch="" mailman="" listinfo="" bioconductor=""> >>> Search the archives: >>> http://news.gmane.org/gmane.**science.biology.informatics.**conduc tor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>> >> >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > -- If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is. John von Neumann<http: www-groups.dcs.st-="" and.ac.uk="" ~history="" biographies="" von_neumann.html=""> [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Dear Hever, Thank you very much. Such a great news! I will try this new function rsidsToGRanges(). I believe I will use it very often in my work to extract SNP information for a set of rs ids. Thanks again, Shirley 2011/9/28 Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Shirley, > > On 11-09-22 02:24 PM, Hervé Pagès wrote: >> >> Hi Shirley, >> >> On 11-09-22 11:29 AM, shirley zhang wrote: >>> >>> Dear All, >>> >>> I am planing to map the SNPids to hg18 positions (chr and position) >>> for a huge list of human snps. I've tried the package >>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >>> package: >>> >>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >>> with Group Label "reference" instead of "Celera" or "HuRef"? >> >> Yes, they are mapped to hg18. See: >> >> >> >> http://bioconductor.org/packages/release/data/annotation/html/SNPlo cs.Hsapiens.dbSNP.20090506.html >> >> >> and the man page of the package for additional details: >> >> ?> library(SNPlocs.Hsapiens.dbSNP.20090506) >> ?> ?SNPlocs.Hsapiens.dbSNP.20090506 >> >>> >>> 2. If I don't know the chr information (seqname), can I obtain the >>> position with dbSNP Id only? >> >> Unfortunately, because SNPs are stored in one data frame per >> chromosome, if you don't know the chr then you need to load and >> query each data frame individually. >> >> With more recent SNPlocs packages (e.g. >> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to >> let the user load SNPs from more than 1 chromosome in a single >> GRanges object, so you can do something like: >> >> ## Load all the SNPs in a big GRanges object (takes about >> ## 13 minutes and requires 6GB of RAM!): >> all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) >> >> ## Use the rs ids to set the names (takes about 6 minutes): >> names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id, >> sep="") >> >> ## Then extract your SNPs from the big GRanges object (again, >> ## this can take a long time, depending on how many SNPs you >> ## extract): >> my_rs_ids <- sample(names(all_snps), 1000) >> my_snps <- all_snps[my_rs_ids] >> >> However, please note that, starting with >> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), >> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. > > I've made some improvements to the SNPlocs packages. The improved > packages are in BioC 2.9 (current devel, soon to be released) and > had their versions bumped to 0.99.6. They are: > > ?SNPlocs.Hsapiens.dbSNP.20090506: dbSNP Build 130, based on hg18 > ?SNPlocs.Hsapiens.dbSNP.20100427: dbSNP Build 131, based on hg19 > ?SNPlocs.Hsapiens.dbSNP.20101109: dbSNP Build 132, based on hg19 > ?SNPlocs.Hsapiens.dbSNP.20110815: dbSNP Build 134, based on hg19 > > Only the source packages are available at the moment but they should > be installable on Windows/Mac from R-2.14 with > > ?biocLite(..., type="source"). > > The data in those packages have not changed but are organized more > efficiently. There is a new function rsidsToGRanges() to extract > SNP information for a set of rs ids. Using rsidsToGRanges() is > much more memory (1/10) and time efficient (40x) than using the > code I provided above. See ?rsidsToGRanges for the details. > > Let me know if you have questions. > > Cheers, > H. > >> >> Hope this helps, >> H. >> >>> >>> Further, I find dbSNP batch queries a little more difficult to work >>> with because they map to different versions of the hg18 like Celera, >>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >>> position with the most popular or confident version of dbSNP? >>> >>> Thanks in advance >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: ?(206) 667-5791 > Fax: ? ?(206) 667-1319 > -- Xiaoling (Shirley) Zhang M.D., Ph.D. (Bioinformatics) Boston University, Boston, MA Tel: (857) 233-9862 Email: zhangxl at bu.edu
ADD REPLY
0
Entering edit mode
shirley zhang ★ 1.0k
@shirley-zhang-2038
Last seen 10.2 years ago
Dear All, I am planing to map the SNPids to hg18 positions (chr and position) for a huge list of human snps. I've tried the package "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this package: 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 with Group Label "reference" instead of "Celera" or "HuRef"? 2. If I don't know the chr information (seqname), can I obtain the position with dbSNP Id only? Further, I find dbSNP batch queries a little more difficult to work with because they map to different versions of the hg18 like Celera, HumanRef, etc.Can anybody let me know a better option to get hg18 chr position with the most popular or confident version of dbSNP? Thanks in advance -- Xiaoling (Shirley) Zhang M.D., Ph.D. (Bioinformatics) Boston University, Boston, MA Tel: (857) 233-9862 Email: zhangxl at bu.edu
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States
On 11-09-22 02:38 PM, Tim Triche, Jr. wrote: > Is it possible to liftOver within R? Sure. See that thread on the Bioconductor mailing list from May 2011, and in particular Michael's answer: https://stat.ethz.ch/pipermail/bioconductor/2011-May/039254.html H. PS: Don't know why the messages in this thread don't show up, only an URL to the message, so one has to open each message in a separate window or tab, which is a pain :-/ Maybe the web interface to Mailman could be configured to not do that? > > --t > > On Sep 22, 2011, at 2:24 PM, Hervé Pagès<hpages at="" fhcrc.org=""> wrote: > >> Hi Shirley, >> >> On 11-09-22 11:29 AM, shirley zhang wrote: >>> Dear All, >>> >>> I am planing to map the SNPids to hg18 positions (chr and position) >>> for a huge list of human snps. I've tried the package >>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this >>> package: >>> >>> 1. Do the SNPs in this package map the hg18 genome (NCBI Build 36.3 >>> with Group Label "reference" instead of "Celera" or "HuRef"? >> >> Yes, they are mapped to hg18. See: >> >> http://bioconductor.org/packages/release/data/annotation/html/SNPlo cs.Hsapiens.dbSNP.20090506.html >> >> and the man page of the package for additional details: >> >> > library(SNPlocs.Hsapiens.dbSNP.20090506) >> > ?SNPlocs.Hsapiens.dbSNP.20090506 >> >>> >>> 2. If I don't know the chr information (seqname), can I obtain the >>> position with dbSNP Id only? >> >> Unfortunately, because SNPs are stored in one data frame per >> chromosome, if you don't know the chr then you need to load and >> query each data frame individually. >> >> With more recent SNPlocs packages (e.g. >> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to >> let the user load SNPs from more than 1 chromosome in a single >> GRanges object, so you can do something like: >> >> ## Load all the SNPs in a big GRanges object (takes about >> ## 13 minutes and requires 6GB of RAM!): >> all_snps<- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE) >> >> ## Use the rs ids to set the names (takes about 6 minutes): >> names(all_snps)<- paste("rs", elementMetadata(all_snps)$RefSNP_id, >> sep="") >> >> ## Then extract your SNPs from the big GRanges object (again, >> ## this can take a long time, depending on how many SNPs you >> ## extract): >> my_rs_ids<- sample(names(all_snps), 1000) >> my_snps<- all_snps[my_rs_ids] >> >> However, please note that, starting with >> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131), >> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18. >> >> Hope this helps, >> H. >> >>> >>> Further, I find dbSNP batch queries a little more difficult to work >>> with because they map to different versions of the hg18 like Celera, >>> HumanRef, etc.Can anybody let me know a better option to get hg18 chr >>> position with the most popular or confident version of dbSNP? >>> >>> Thanks in advance >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT

Login before adding your answer.

Traffic: 533 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6