Renaming the seqlevels in a transcript database from GenomicFeatures
1
4
Entering edit mode
@fong-chun-chan-5706
Last seen 10.4 years ago
Hi, I am using the GenomicFeatures package to extract exons from a transcript database file. I am using the ensembl transcript database which has no chr, yet my bam files that I am working have the chr prefix. I was thinking one could append chr to the seqlevels in the transcript database like this: library('GenomicFeatures') txdb <- loadDb(txdbFile) newSeqNames <- paste('chr', seqlevels(txdb), sep = '') names(newSeqNames) <- seqlevels(txdb) txdb <- renameSeqlevels( txdb, newSeqNames ) seqlevels(txdb) [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11" "chr12" [13] "chr13" "chr14" This appears to work. But when I run: transcripts(txdb, vals = list( tx_chrom = "chr1") ) GRanges with 0 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> --- seqlengths: It suggests that the chromosome renaming is not working...because when I run: transcripts(txdb, vals = list( tx_chrom = '1') ) GRanges with 17239 ranges and 2 metadata columns: seqnames ranges strand | tx_id tx_name <rle> <iranges> <rle> | <integer> <character> [1] chr1 [11869, 14409] + | 126213 ENST00000456328 Has anyone had any luck in getting this to work? Thanks, [[alternative HTML version deleted]]
GenomicFeatures GenomicFeatures • 6.6k views
ADD COMMENT
4
Entering edit mode
Sonali Arora ▴ 390
@sonali-arora-6563
Last seen 8.8 years ago
United States
Hi Fong, A TranscriptDb cannot be directly subset with 'keepSeqlevels' and 'dropSeqlevels'. But the GRanges or GRangesLists extracted from the TranscriptDb can be subset. Below I show an example for how to extract GRanges, subset and change the chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") ## load packages library(GenomicFeatures) library(TxDb.Hsapiens.UCSC.hg18.knownGene) ## extract GRanges txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene seqlevels(txdb) txbygene <- transcriptsBy(txdb, "gene") seqlevels(txbygene) ## subset to get only ranges from chr 1 to 22 auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", group="auto") txbygene <- keepSeqlevels(txbygene,auto) seqlevels(txbygene) ##convert GRangesList to GRanges ucsc_gr <- unlist(txbygene) ## renaming styles. newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) ##check that naming was successful ncbi_gr[seqnames(ncbi_gr)=="1"] ucsc_gr[seqnames(ucsc_gr)=="chr1"] You can easily adapt this example for your scenario - to convert from "1" to "chr1". For more details : Please look at examples section of: http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomeInfoDb /inst/doc/GenomeInfoDb.pdf and the man page for ?keepSeqlevels. Thanks and Regards, Sonali. On 6/26/2014 2:02 PM, Fong Chun Chan wrote: > Hi, > > I am using the GenomicFeatures package to extract exons from a transcript > database file. I am using the ensembl transcript database which has no chr, > yet my bam files that I am working have the chr prefix. > > I was thinking one could append chr to the seqlevels in the transcript > database like this: > > library('GenomicFeatures') > txdb <- loadDb(txdbFile) > newSeqNames <- paste('chr', seqlevels(txdb), sep = '') > names(newSeqNames) <- seqlevels(txdb) > txdb <- renameSeqlevels( txdb, newSeqNames ) > seqlevels(txdb) > > [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" > "chr11" "chr12" [13] "chr13" "chr14" > > This appears to work. > > But when I run: > > transcripts(txdb, vals = list( tx_chrom = "chr1") ) > > GRanges with 0 ranges and 2 metadata columns: > seqnames ranges strand | tx_id tx_name > <rle> <iranges> <rle> | <integer> <character> > --- > seqlengths: > > It suggests that the chromosome renaming is not working...because when I > run: > > transcripts(txdb, vals = list( tx_chrom = '1') ) > > GRanges with 17239 ranges and 2 metadata columns: > seqnames ranges strand | tx_id > tx_name > <rle> <iranges> <rle> | <integer> > <character> > [1] chr1 [11869, 14409] + | 126213 > ENST00000456328 > > Has anyone had any luck in getting this to work? > > Thanks, > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Thanks and Regards, Sonali
ADD COMMENT
0
Entering edit mode
Hi Sonali, I'm pretty sure what Fong has reported is a legitimate bug. The seqlevels are mapped at some level, but the values for tx_chrom have not been updated. Also, what you say about keepSeqlevels and TranscriptDb is not true, at least for devel (and this stuff was written a year ago): > seqlevels(keepSeqlevels(txdb, "1")) [1] "1" Michael On Fri, Jun 27, 2014 at 12:18 PM, Sonali Arora <sarora@fhcrc.org> wrote: > Hi Fong, > > A TranscriptDb cannot be directly subset with 'keepSeqlevels' and > 'dropSeqlevels'. > But the GRanges or GRangesLists extracted from the TranscriptDb can be > subset. > Below I show an example for how to extract GRanges, subset and change the > chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") > > ## load packages > library(GenomicFeatures) > library(TxDb.Hsapiens.UCSC.hg18.knownGene) > > ## extract GRanges > txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene > seqlevels(txdb) > txbygene <- transcriptsBy(txdb, "gene") > seqlevels(txbygene) > > ## subset to get only ranges from chr 1 to 22 > auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", > group="auto") > txbygene <- keepSeqlevels(txbygene,auto) > seqlevels(txbygene) > > ##convert GRangesList to GRanges > ucsc_gr <- unlist(txbygene) > > ## renaming styles. > newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") > ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) > > ##check that naming was successful > ncbi_gr[seqnames(ncbi_gr)=="1"] > ucsc_gr[seqnames(ucsc_gr)=="chr1"] > > You can easily adapt this example for your scenario - to convert from > "1" to "chr1". For more details : Please look at examples section of: > http://www.bioconductor.org/packages/devel/bioc/vignettes/ > GenomeInfoDb/inst/doc/GenomeInfoDb.pdf > and the man page for ?keepSeqlevels. > > Thanks and Regards, > Sonali. > > > > > On 6/26/2014 2:02 PM, Fong Chun Chan wrote: > >> Hi, >> >> I am using the GenomicFeatures package to extract exons from a transcript >> database file. I am using the ensembl transcript database which has no >> chr, >> yet my bam files that I am working have the chr prefix. >> >> I was thinking one could append chr to the seqlevels in the transcript >> database like this: >> >> library('GenomicFeatures') >> txdb <- loadDb(txdbFile) >> newSeqNames <- paste('chr', seqlevels(txdb), sep = '') >> names(newSeqNames) <- seqlevels(txdb) >> txdb <- renameSeqlevels( txdb, newSeqNames ) >> seqlevels(txdb) >> >> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" >> "chr11" "chr12" [13] "chr13" "chr14" >> >> This appears to work. >> >> But when I run: >> >> transcripts(txdb, vals = list( tx_chrom = "chr1") ) >> >> GRanges with 0 ranges and 2 metadata columns: >> seqnames ranges strand | tx_id tx_name >> <rle> <iranges> <rle> | <integer> <character> >> --- >> seqlengths: >> >> It suggests that the chromosome renaming is not working...because when I >> run: >> >> transcripts(txdb, vals = list( tx_chrom = '1') ) >> >> GRanges with 17239 ranges and 2 metadata columns: >> seqnames ranges strand | tx_id >> tx_name >> <rle> <iranges> <rle> | <integer> >> <character> >> [1] chr1 [11869, 14409] + | 126213 >> ENST00000456328 >> >> Has anyone had any luck in getting this to work? >> >> Thanks, >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > -- > Thanks and Regards, > Sonali > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane. > science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi guys, Indeed, 'transcripts(txdb, vals=list(tx_chrom="some_chrom"))' is broken when the "effective" seqlevels are different from the "native" ones. More precisely, the transcripts(), exons(), and cds() extractors are not translating the user-supplied seqlevels to native seqlevels before injecting them in the SQL query they generate. We need to fix that. We will post again here when it's done. Thanks for the report. H. On 06/27/2014 07:53 PM, Michael Lawrence wrote: > Hi Sonali, > > I'm pretty sure what Fong has reported is a legitimate bug. The seqlevels > are mapped at some level, but the values for tx_chrom have not been > updated. > > Also, what you say about keepSeqlevels and TranscriptDb is not true, at > least for devel (and this stuff was written a year ago): > >> seqlevels(keepSeqlevels(txdb, "1")) > [1] "1" > > Michael > > > On Fri, Jun 27, 2014 at 12:18 PM, Sonali Arora <sarora at="" fhcrc.org=""> wrote: > >> Hi Fong, >> >> A TranscriptDb cannot be directly subset with 'keepSeqlevels' and >> 'dropSeqlevels'. >> But the GRanges or GRangesLists extracted from the TranscriptDb can be >> subset. >> Below I show an example for how to extract GRanges, subset and change the >> chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1") >> >> ## load packages >> library(GenomicFeatures) >> library(TxDb.Hsapiens.UCSC.hg18.knownGene) >> >> ## extract GRanges >> txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene >> seqlevels(txdb) >> txbygene <- transcriptsBy(txdb, "gene") >> seqlevels(txbygene) >> >> ## subset to get only ranges from chr 1 to 22 >> auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC", >> group="auto") >> txbygene <- keepSeqlevels(txbygene,auto) >> seqlevels(txbygene) >> >> ##convert GRangesList to GRanges >> ucsc_gr <- unlist(txbygene) >> >> ## renaming styles. >> newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI") >> ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle) >> >> ##check that naming was successful >> ncbi_gr[seqnames(ncbi_gr)=="1"] >> ucsc_gr[seqnames(ucsc_gr)=="chr1"] >> >> You can easily adapt this example for your scenario - to convert from >> "1" to "chr1". For more details : Please look at examples section of: >> http://www.bioconductor.org/packages/devel/bioc/vignettes/ >> GenomeInfoDb/inst/doc/GenomeInfoDb.pdf >> and the man page for ?keepSeqlevels. >> >> Thanks and Regards, >> Sonali. >> >> >> >> >> On 6/26/2014 2:02 PM, Fong Chun Chan wrote: >> >>> Hi, >>> >>> I am using the GenomicFeatures package to extract exons from a transcript >>> database file. I am using the ensembl transcript database which has no >>> chr, >>> yet my bam files that I am working have the chr prefix. >>> >>> I was thinking one could append chr to the seqlevels in the transcript >>> database like this: >>> >>> library('GenomicFeatures') >>> txdb <- loadDb(txdbFile) >>> newSeqNames <- paste('chr', seqlevels(txdb), sep = '') >>> names(newSeqNames) <- seqlevels(txdb) >>> txdb <- renameSeqlevels( txdb, newSeqNames ) >>> seqlevels(txdb) >>> >>> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" >>> "chr11" "chr12" [13] "chr13" "chr14" >>> >>> This appears to work. >>> >>> But when I run: >>> >>> transcripts(txdb, vals = list( tx_chrom = "chr1") ) >>> >>> GRanges with 0 ranges and 2 metadata columns: >>> seqnames ranges strand | tx_id tx_name >>> <rle> <iranges> <rle> | <integer> <character> >>> --- >>> seqlengths: >>> >>> It suggests that the chromosome renaming is not working...because when I >>> run: >>> >>> transcripts(txdb, vals = list( tx_chrom = '1') ) >>> >>> GRanges with 17239 ranges and 2 metadata columns: >>> seqnames ranges strand | tx_id >>> tx_name >>> <rle> <iranges> <rle> | <integer> >>> <character> >>> [1] chr1 [11869, 14409] + | 126213 >>> ENST00000456328 >>> >>> Has anyone had any luck in getting this to work? >>> >>> Thanks, >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane. >>> science.biology.informatics.conductor >>> >> >> -- >> Thanks and Regards, >> Sonali >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY

Login before adding your answer.

Traffic: 1001 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6