getSeq human genomic sequences
0
0
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States
Hi Tiandao, I'm glad getSeq() is throwing an error instead of silently trying to return something when the start/end are out of limits :-) We supported hg16 (NCBI34) until BioC 2.1 (R-2.6). Starting with BioC 2.2 (R-2.7, released in April 2008), we dropped it and kept only hg17 and hg18. Note that hg16 is 5.5 years old now (from Jul. 2003): http://genome.ucsc.edu/FAQ/FAQreleases#release1 I would recommend you try to get a position file with positions relative to hg17 or hg18 if possible. If this is not an option, then you can always build your own BSgenome data package for hg16. The process of building a BSgenome data package is pretty straightforward and is documented in the BSgenomeForge vignette from the BSgenome software package. If you don't need to include masks then the process is even simpler. Let me know if you need help with this. Cheers, H. Tiandao Li wrote: > Hi Herve, > > Thanks for your kind help. The position file my colleague sent was > included some position out of chromosome limits of current release of > human genome (NCBI36). And the positions on human genome were calculated > with NCBI34. If there any human genome annotation package based on NCBI34? > > Thanks, > > Tiandao > > On Fri, Jan 30, 2009 at 2:00 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">> wrote: > > Tiandao Li wrote: > > > > On Fri, Jan 30, 2009 at 1:04 PM, Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>> wrote: > > Hi Tiando, > > Have you checked the 'start' and 'end' you get when i = 5263? > They are > probably out of limits. > > > unique(chr6[,6])[5262] > [1] 122701977 > > unique(chr6[,6])[5263] > [1] 122701996 > > unique(chr6[,6])[5264] > [1] 39870984 > > > Try to isolate the problem with something like: > > i <- 5263 > start <- unique(chr6[,6])[i] > end <- start + 1 > getSeq(Hsapiens, "chr6", start, end) > > > getSeq(Hsapiens, "chr6", > unique(chr6[,6])[5262],(unique(chr6[,6])[5262]+1)) > [1] "CA" > > getSeq(Hsapiens, "chr6", > unique(chr6[,6])[5263],(unique(chr6[,6])[5263]+1)) > [1] "TG" > > getSeq(Hsapiens, "chr6", > unique(chr6[,6])[5264],(unique(chr6[,6])[5264]+1)) > [1] "TC" > > > > > So calling getSeq() outside the for() loop seems to work > for i = 5262, 5263 and 5264. Are you sure your for() loop > failed for i = 5263? > > > > > Do you have 1 <= start <= end <= length(Hsapiens$chr6) ? > > NO > > > Well, that's the problem then. getSeq() will fail if the start/end are > not within the limits of the sequence. > But what I don't understand is why you are saying NO. The start/end > you get are 122701996 and 122701997 (those are the numbers you are > showing > above) and length(Hsapiens$chr6) is 170899992. Don't you agree that > 1 <= 122701996 <= 122701997 <= 170899992 ? > > Another thing I don't understand is that in your first post the > error message > you got was showing a start value of 63219120 (but that's apparently > not the > start value corresponding to i = 5263). > > > > > I've modified the subseq() function so the error mesg you get > is a > little > bit more informative (subseq() is defined in the IRanges > package and > called by getSeq()). After you update to IRanges 1.0.11 > (available > within > 24 hours), you will get: > > > getSeq(Hsapiens, 'chr6', start=170899994, end=170899995) > > > getSeq(Hsapiens, 'chr6', start=170899994, end=170899995) > Error in solveUserSEW(length(x), start = start, end = end, width > = width) : > solving row 1: 'allow.nonnarrowing' is FALSE and the supplied > start (170899994) is > refwidth + 1 > > > Yes this is exactly what to expect because here the start/end are out of > limits. You'll see the improved error mesg printed by getSeq() when you > update IRanges to 1.0.11 (available in about 24 hours). > > > > sessionInfo() > other attached packages: > [1] BSgenome.Hsapiens.UCSC.hg18_1.3.11 BSgenome_1.10.3 > [3] Biostrings_2.10.15 IRanges_1.0.9 > > > > Error in solveUserSEW(length(x), start = start, end = end, > width = > width) : > solving row 1: 'allow.nonnarrowing' is FALSE and the supplied > start (170899994) is > refwidth + 1 > Error in subseq(unmasked(x), start = start, end = end, width > = width) : > Invalid sequence coordinates. > Are you sure the supplied 'start', 'end' and 'width' arguments > are defining a region that is within the limits of the > sequence? > > YES! > > > Well no, not in your call to getSeq above (start=170899994, > end=170899995), > sorry! > > All this confusion could be avoided if you sent me some stand- alone > code that I can run myself in order to reproduce the problem. > > Cheers, > > H. > > > > > However, on thing is puzzling me. The error mesg you get > shows that > the start is 63219120 and that is of course within the limits > of Human > chr6 (this chromosome is 170899992 long in hg18), so maybe > something > else > is going on... > > Cheers, > H. > > > Tiandao Li wrote: > > Hello, > > I used getSeq to retrieve human genomic sequences from the > BSgenome data > package: > > library(BSgenome.Hsapiens.UCSC.hg18) > > # cpg is GTF file > chr6 <- cpg[grep("6", cpg[[5]]),] # chrom 6 > length(unique(chr6[,6])) # 10683 genomic positions > > # loop to match "CG" and print 100bp to file > for (i in 1:10683) > { > if (getSeq(Hsapiens, "chr6", > unique(chr6[,6])[i],(unique(chr6[,6])[i]+1))=="CG") > { > write.table(paste(">", > unique(chr6[,6])[i],sep=""),file="chr6FOR",row.names=F, > col.names=F, > append=TRUE) > write.table(getSeq(Hsapiens, "chr6", > > (unique(chr6[,6])[i]-50),(unique(chr6[,6])[i]+50)),file="ch r6FOR",row.names=F, > col.names=F, append=TRUE) > } > } > > However, I always got the error meg like the following > when i = 5263 > > Error in solveUserSEW(length(x), start = start, end = > end, width > = width) : > solving row 1: 'allow.nonnarrowing' is FALSE and the > supplied start > (63219120) is > refwidth + 1 > > Please let me know if there's something I should be paying > attention to. > > Thanks! > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > <mailto:bioconductor at="" stat.math.ethz.ch=""> > <mailto:bioconductor at="" stat.math.ethz.ch=""> <mailto:bioconductor at="" stat.math.ethz.ch="">> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M2-B876 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">> > > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M2-B876 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > Phone: (206) 667-5791 > Fax: (206) 667-1319 > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
Annotation Cancer BSgenome PROcess GLAD BSgenome IRanges Annotation Cancer BSgenome GLAD • 1.9k views
ADD COMMENT

Login before adding your answer.

Traffic: 1074 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6