DESeq2 : NA p values reported


Abhishek Pratap ▴ 160

@abhishek-pratap-6167

Last seen 10.7 years ago

Hi Michael and Simon,

I am starting to use DESeq2 on count data from 60 samples and am seeing it report NA p-values for multiple genes. I am not sure whether this is due to Cook's distance or something else. I did see the SEQanswers thread on turning off Cook's distance, but that didn't work for me in DESeq2.

Copying just one example. Unfortunately the formatting will be messed up. Let me know if there is a good place to copy it for easier reading.

Thanks!
-Abhi

    baseMean log2FoldChange lfcSE stat pvalue padj SYMBOL
    16012 58 5.556873e+01 1.4310814 0.2586475 5.532941 NA NA

    > mcols(diff_exp_result)[16012,]
    DataFrame with 1 row and 23 columns

    column                               type     value
    baseMean                             numeric  1.437498
    baseVar                              numeric  3.786966
    allZero                              logical  FALSE
    dispGeneEst                          numeric  1.302189
    dispGeneEstConv                      logical  TRUE
    dispFit                              numeric  1.987536
    dispersion                           numeric  1.404384
    dispIter                             numeric  8
    dispConv                             logical  TRUE
    dispOutlier                          logical  FALSE
    dispMAP                              numeric  1.404384
    Intercept                            numeric  0.2919642
    condition_high_vs_low                numeric  0.3913094
    SE_Intercept                         numeric  0.3133799
    SE_condition_high_vs_low             numeric  0.2582543
    WaldStatistic_Intercept              numeric  0.9316622
    WaldStatistic_condition_high_vs_low  numeric  1.51521
    WaldPvalue_Intercept                 numeric  0.3515111
    WaldPvalue_condition_high_vs_low     numeric  0.1297193
    betaConv                             logical  TRUE
    betaIter                             numeric  6
    deviance                             numeric  184.0293
    maxCooks                             numeric  0.5676499

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] synapseClient_0.32-1 getopt_1.20.0 TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
     [4] GenomicFeatures_1.14.0 GGally_0.4.4 pheatmap_0.7.7
     [7] gplots_2.12.1 RColorBrewer_1.0-5 Rsamtools_1.14.1
    [10] Biostrings_2.30.1 gdata_2.13.2 reshape_0.8.4
    [13] plyr_1.8 ggplot2_0.9.3.1 DESeq2_1.2.5
    [16] RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3
    [19] XVector_0.2.0 IRanges_1.20.5 org.Hs.eg.db_2.10.1
    [22] RSQLite_0.11.4 DBI_0.2-7 AnnotationDbi_1.24.0
    [25] Biobase_2.22.0 BiocGenerics_0.8.0 BiocInstaller_1.12.0

DESeq2 DESeq2 • 2.9k views

ADD COMMENT • link updated 11.4 years ago by array chip ▴ 420 • written 11.4 years ago by Abhishek Pratap ▴ 160


Michael Love 43k

@mikelove

Last seen 6 days ago

United States

Hi Abhishek,

Check out the ?results man page:

"By default, independent filtering is performed to select a set of genes which will result in the most genes with adjusted p-values less than a threshold, alpha. The adjusted p-values for the genes which do not pass the filter threshold are set to NA."

Also see section 1.4.2 in the vignette:

"The results for particular genes can be set to NA, for either one of the following reasons: ..."

Hope this helps,

Mike

On Wed, Nov 20, 2013 at 9:16 PM, Abhishek Pratap wrote:
> I am starting to use DESeq2 on a 60 sample count data and seeing it report
> NA p-values for multiple genes. Not sure if this is due to cooks distance
> or some other thing.
> [...]

ADD COMMENT • link 11.4 years ago Michael Love 43k
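
The reasons the vignette lists can usually be told apart from the results table itself. The sketch below is only an illustration and is not DESeq2 code: `classify_na()` and the toy data frame are hypothetical, and the three rules follow the wording of the current DESeq2 vignette (all-zero counts give a baseMean of 0 and NA everywhere; a Cook's distance outlier gives NA for both pvalue and padj; independent filtering gives NA for padj only), which may differ in detail from DESeq2 1.2.x.

```r
# Hypothetical helper: label why a row of a DESeq2-style results table has NA values.
# Assumes columns baseMean, pvalue and padj, and that baseMean itself is never NA.
classify_na <- function(res) {
  reason <- rep("none", nrow(res))
  reason[is.na(res$padj) & !is.na(res$pvalue)] <- "independent filtering"
  reason[is.na(res$pvalue) & res$baseMean > 0] <- "Cook's distance outlier"
  reason[is.na(res$pvalue) & res$baseMean == 0] <- "all counts zero"
  reason
}

# Made-up values for illustration only (not taken from the thread).
toy <- data.frame(
  baseMean = c(120.5, 0, 55.6, 1.4),
  pvalue   = c(0.001, NA, NA, 0.13),
  padj     = c(0.010, NA, NA, NA)
)
toy$reason <- classify_na(toy)
print(toy)

# With a real DESeqDataSet `dds`, both mechanisms can be switched off in results():
#   res <- results(dds, cooksCutoff = FALSE, independentFiltering = FALSE)
```

Switching both options off is mainly useful for diagnosis; whether to keep them off for the final analysis is a separate decision.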


array chip ▴ 420

@array-chip-4136

Last seen 15 months ago

United States

Hi,

Can anyone suggest how to retrieve the genomic coordinates of all exons of a given gene, say by gene symbol? For example, how would I retrieve the coordinates of all 21 exons of the gene KIT?

Thanks

John

ADD COMMENT • link 11.4 years ago array chip ▴ 420


Assuming human, use the Homo.sapiens package:

    library(Homo.sapiens)
    # look up the Entrez gene ID for the symbol "KIT"
    id <- select(Homo.sapiens, "KIT", "ENTREZID", "SYMBOL")$ENTREZID
    # exons of that gene, returned as a GRanges
    exons(Homo.sapiens, list(gene_id = id))

It would be a lot nicer if exons() could take the gene symbol directly, but I've long given up trying to request that.

The nice thing about this approach is that you have a GRanges, and you're not off in data.frame land.

On Thu, Nov 21, 2013 at 12:14 AM, array chip wrote:
> Can anyone suggest how to retrieve the genomic coordinates for all exons
> for a given gene by say gene symbol? For example, how to retrieve the
> coordinates for all 21 exons for gene KIT?

ADD REPLY • link 11.4 years ago Michael Lawrence ★ 11k


Thank you Michael. I got an error message when running:

    > library(Homo.sapiens)
    Loading required package: GO.db
    Error : .onLoad failed in loadNamespace() for 'GO.db', details:
      call: get(name, envir = asNamespace(pkg), inherits = FALSE)
      error: object '.setDummyField' not found
    Error: package ‘GO.db’ could not be loaded

I already tried re-installing the GO.db package, but still got the same error:

    > biocLite("GO.db")
    BioC_mirror: http://bioconductor.org
    Using Bioconductor version 2.12 (BiocInstaller 1.10.4), R version 3.0.0.
    Installing package(s) 'GO.db'
    trying URL 'http://bioconductor.org/packages/2.12/data/annotation/bin/windows/contrib/3.0/GO.db_2.9.0.zip'
    Content type 'application/zip' length 25091062 bytes (23.9 Mb)
    opened URL
    downloaded 23.9 Mb

John

________________________________
From: Michael Lawrence
Sent: Thursday, November 21, 2013 7:43 AM
Subject: Re: [BioC] exon genomic coordinates

> Assuming human, use the Homo.sapiens package:
> [...]

ADD REPLY • link 11.4 years ago array chip ▴ 420


On 11/21/2013 02:21 PM, array chip wrote:
> Thank you Michael. I got an error message when running:
>
>> library(Homo.sapiens)
> Loading required package: GO.db
> Error : .onLoad failed in loadNamespace() for 'GO.db', details:
>   call: get(name, envir = asNamespace(pkg), inherits = FALSE)
>   error: object '.setDummyField' not found
> Error: package ‘GO.db’ could not be loaded

I'm guessing that you're using a Mac and R-3.0.0.

You'll need to either update your R to the current release, R-3.0.2, or install the software needed to build R packages from source and update all of your packages using type="source". It's easier to update your R, so I won't provide instructions for the latter.

The problem is that a backward incompatibility was introduced in the methods package going from R-3.0.0 to R-3.0.1. The incompatibility means that the Macintosh binaries we (Bioconductor) build with the current R only work with versions of R after the incompatibility was introduced. So you either need to use a current R, or build the packages yourself.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

ADD REPLY • link 11.4 years ago Martin Morgan 25k
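
Whether a given R installation predates the change described above can be checked from within R. This is a minimal base-R sketch; the helper name `r_predates_change()` is hypothetical, and the 3.0.1 cutoff is taken directly from the reply (the incompatibility arrived between R-3.0.0 and R-3.0.1).

```r
# Hypothetical helper: TRUE if the given R version is older than R-3.0.1,
# the release in which (per the reply above) the methods package changed.
r_predates_change <- function(version = getRversion()) {
  numeric_version(as.character(version)) < "3.0.1"
}

r_predates_change("3.0.0")  # the version reported in the thread
r_predates_change("3.0.2")  # the release recommended in the reply
r_predates_change()         # the R session running this code
```

Comparing through `numeric_version()` rather than as plain strings keeps the comparison correct for versions such as 3.10.0.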


Thanks Martin. I am using R3.0.0 on a PC.

John

________________________________
From: Martin Morgan
Sent: Thursday, November 21, 2013 2:40 PM
Subject: Re: [BioC] exon genomic coordinates

> I'm guess that you're using a mac and R-3.0.0
> [...]

ADD REPLY • link 11.4 years ago array chip ▴ 420


On 11/21/2013 02:53 PM, array chip wrote:
> Thanks Martin. I am using R3.0.0 on a PC.

Yes, sorry, same story -- update R (recommended) or install tools to build packages from source.

Martin

ADD REPLY • link 11.4 years ago Martin Morgan 25k


Hi John

You can use the BioMart database, which you can access with the biomaRt package, to get all exons for all transcripts of a given gene, e.g.:

    library(biomaRt)
    ensembl = useMart("ensembl")
    # assuming you are interested in mouse
    mouse.ensembl = useDataset("mmusculus_gene_ensembl", mart = ensembl)

    # note: mouse gene symbols are usually written as "Kit";
    # adjust the value if the query returns no rows
    getBM(attributes = c("chromosome_name", "exon_chrom_start", "exon_chrom_end",
                         "ensembl_exon_id", "ensembl_transcript_id", "ensembl_gene_id"),
          filters = 'mgi_symbol', values = c("KIT"), mart = mouse.ensembl)

Hope this helps

Hans-Rudolf

On 11/21/2013 09:14 AM, array chip wrote:
> Can anyone suggest how to retrieve the genomic coordinates for all exons for a given gene by say gene symbol? For example, how to retrieve the coordinates for all 21 exons for gene KIT?

ADD REPLY • link 11.4 years ago Hotz, Hans-Rudolf ▴ 400


Thanks Hans-Rudolf

John

________________________________
From: Hans-Rudolf Hotz
Sent: Thursday, November 21, 2013 5:38 AM
Subject: Re: [BioC] exon genomic coordinates

> You can use the BioMart database, which you can access with the biomaRt
> package to get all exons for all transcripts for a given gene
> [...]

ADD REPLY • link 11.4 years ago array chip ▴ 420


Hi,

I am trying to use biomaRt to retrieve the exon coordinates, following the example provided below:

    library(biomaRt)
    ensembl = useMart("ensembl")
    ensembl = useDataset("hsapiens_gene_ensembl", mart = ensembl)

    getBM(attributes = c("chromosome_name", "exon_chrom_start", "exon_chrom_end", "rank"),
          filters = 'hgnc_symbol', values = c("KIT"), mart = ensembl)

The above works fine. However, when I tried to add "hgnc_symbol" to the attributes list, it gave me an error:

    getBM(attributes = c("hgnc_symbol", "chromosome_name", "exon_chrom_start", "exon_chrom_end", "rank"),
          filters = 'hgnc_symbol', values = c("KIT"), mart = ensembl)

    Error in getBM(attributes = c("hgnc_symbol", "chromosome_name", "exon_chrom_start", :
      Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple attribute pages are not allowed

But if I keep "hgnc_symbol" in the attributes list and remove "exon_chrom_start" and "exon_chrom_end", then it works again:

    getBM(attributes = c("hgnc_symbol", "chromosome_name", "ensembl_transcript_id", "rank"),
          filters = 'hgnc_symbol', values = c("KIT"), mart = ensembl)

Can anyone tell me why that is?

Thanks

John

ADD REPLY • link 11.4 years ago array chip ▴ 420
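
The error message points at the cause: BioMart groups attributes into pages, and a single query may only combine attributes that share a page. The sketch below shows one way to check this before calling getBM(). It is an illustration under stated assumptions: `shared_pages()` is a hypothetical helper, the page assignments in `toy_attrs` are invented, and the `name`/`page` columns mirror what listAttributes() returns in recent biomaRt versions (older versions may not expose the page).

```r
# Hypothetical helper: which attribute pages contain ALL requested attributes?
# `attr_table` needs columns `name` and `page`, as in the listAttributes() output.
shared_pages <- function(attr_table, wanted) {
  pages_per_attr <- lapply(wanted, function(a) attr_table$page[attr_table$name == a])
  Reduce(intersect, pages_per_attr)
}

# Toy table with invented page assignments, for illustration only.
toy_attrs <- data.frame(
  name = c("hgnc_symbol", "chromosome_name", "chromosome_name",
           "exon_chrom_start", "exon_chrom_end", "external_gene_id"),
  page = c("feature_page", "feature_page", "structure",
           "structure", "structure", "structure"),
  stringsAsFactors = FALSE
)

shared_pages(toy_attrs, c("chromosome_name", "exon_chrom_start"))  # one shared page
shared_pages(toy_attrs, c("hgnc_symbol", "exon_chrom_start"))      # no shared page

# Against a live mart (needs biomaRt and a network connection):
#   shared_pages(listAttributes(ensembl), c("hgnc_symbol", "exon_chrom_start"))
```

An empty result means the attributes cannot be combined in one query; the usual workaround is to pick an equivalent attribute from the shared page or to run two queries and merge them on a common ID.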


Hi all, I have another question about exon genomic coordinates:

    library(biomaRt)
    ensembl = useMart("ensembl")
    ensembl = useDataset("hsapiens_gene_ensembl", mart = ensembl)

    > getBM(attributes = c("external_gene_id", "chromosome_name", "exon_chrom_start", "exon_chrom_end", "ensembl_transcript_id", "rank"),
            filters = 'hgnc_symbol', values = c("KIT"), mart = ensembl)

       chromosome_name exon_chrom_start exon_chrom_end ensembl_transcript_id rank
    1                4         55524085       55524248       ENST00000412167    1
    2                4         55561678       55561947       ENST00000412167    2
    3                4         55564450       55564731       ENST00000412167    3
    4                4         55565796       55565932       ENST00000412167    4
    5                4         55569890       55570058       ENST00000412167    5
    6                4         55573264       55573453       ENST00000412167    6
    7                4         55575590       55575705       ENST00000412167    7
    8                4         55589750       55589864       ENST00000412167    8
    9                4         55592023       55592204       ENST00000412167    9
    10               4         55593384       55593490       ENST00000412167   10
    11               4         55593582       55593708       ENST00000412167   11
    12               4         55593989       55594093       ENST00000412167   12
    13               4         55594177       55594287       ENST00000412167   13
    14               4         55595501       55595651       ENST00000412167   14
    15               4         55597494       55597585       ENST00000412167   15
    16               4         55598037       55598164       ENST00000412167   16
    17               4         55599236       55599358       ENST00000412167   17
    18               4         55602664       55602775       ENST00000412167   18
    19               4         55602887       55602986       ENST00000412167   19
    20               4         55603341       55603446       ENST00000412167   20
    21               4         55604595       55605177       ENST00000412167   21
    22               4         55524085       55524248       ENST00000288135    1
    23               4         55561678       55561947       ENST00000288135    2
    24               4         55564450       55564731       ENST00000288135    3
    25               4         55565796       55565932       ENST00000288135    4
    26               4         55569890       55570058       ENST00000288135    5
    27               4         55573264       55573453       ENST00000288135    6
    28               4         55575590       55575705       ENST00000288135    7
    29               4         55589750       55589864       ENST00000288135    8
    30               4         55592023       55592216       ENST00000288135    9
    31               4         55593384       55593490       ENST00000288135   10
    32               4         55593582       55593708       ENST00000288135   11
    33               4         55593989       55594093       ENST00000288135   12
    34               4         55594177       55594287       ENST00000288135   13
    35               4         55595501       55595651       ENST00000288135   14
    36               4         55597494       55597585       ENST00000288135   15
    37               4         55598037       55598164       ENST00000288135   16
    38               4         55599236       55599358       ENST00000288135   17
    39               4         55602664       55602775       ENST00000288135   18
    40               4         55602887       55602986       ENST00000288135   19
    41               4         55603341       55603446       ENST00000288135   20
    42               4         55604595       55606881       ENST00000288135   21
    43               4         55524106       55524248       ENST00000514582    1
    44               4         55561678       55562072       ENST00000514582    2
    45               4         55595458       55595651       ENST00000512959    1
    46               4         55597494       55597585       ENST00000512959    2
    47               4         55598037       55598164       ENST00000512959    3
    48               4         55599236       55599567       ENST00000512959    4

This gives many versions of the genomic coordinates. For example, KIT has 3 sets of exons. I think these different versions may refer to different splicing variants/isoforms. Is there a "default"/"standard" set of exons for each gene, and how do I know which one it is?

Thanks

John

________________________________
To: Hans-Rudolf Hotz; "bioconductor@r-project.org"
Sent: Monday, November 25, 2013 12:41 PM
Subject: Re: [BioC] exon genomic coordinates

> I am trying to use bioMart to retrieve the exon coordinates using the example provided below:
> [...]

ADD REPLY • link 11.4 years ago array chip ▴ 420


Well, one approach is to take the longest one. That's what UCSC uses to call its "canonical transcripts". And restrict to the consensus CDS (CCDS).

On Mon, Nov 25, 2013 at 1:12 PM, array chip <arrayprofile@yahoo.com> wrote:

> Hi all, I have another question about exon genomic coordinates:
>
> library(biomaRt)
> ensembl = useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>
> > getBM(attributes = c("external_gene_id","chromosome_name","exon_chrom_start","exon_chrom_end","ensembl_transcript_id","rank"), filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>
>    chromosome_name exon_chrom_start exon_chrom_end ensembl_transcript_id rank
> 1                4         55524085       55524248       ENST00000412167    1
> 2                4         55561678       55561947       ENST00000412167    2
> 3                4         55564450       55564731       ENST00000412167    3
> 4                4         55565796       55565932       ENST00000412167    4
> 5                4         55569890       55570058       ENST00000412167    5
> 6                4         55573264       55573453       ENST00000412167    6
> 7                4         55575590       55575705       ENST00000412167    7
> 8                4         55589750       55589864       ENST00000412167    8
> 9                4         55592023       55592204       ENST00000412167    9
> 10               4         55593384       55593490       ENST00000412167   10
> 11               4         55593582       55593708       ENST00000412167   11
> 12               4         55593989       55594093       ENST00000412167   12
> 13               4         55594177       55594287       ENST00000412167   13
> 14               4         55595501       55595651       ENST00000412167   14
> 15               4         55597494       55597585       ENST00000412167   15
> 16               4         55598037       55598164       ENST00000412167   16
> 17               4         55599236       55599358       ENST00000412167   17
> 18               4         55602664       55602775       ENST00000412167   18
> 19               4         55602887       55602986       ENST00000412167   19
> 20               4         55603341       55603446       ENST00000412167   20
> 21               4         55604595       55605177       ENST00000412167   21
> 22               4         55524085       55524248       ENST00000288135    1
> 23               4         55561678       55561947       ENST00000288135    2
> 24               4         55564450       55564731       ENST00000288135    3
> 25               4         55565796       55565932       ENST00000288135    4
> 26               4         55569890       55570058       ENST00000288135    5
> 27               4         55573264       55573453       ENST00000288135    6
> 28               4         55575590       55575705       ENST00000288135    7
> 29               4         55589750       55589864       ENST00000288135    8
> 30               4         55592023       55592216       ENST00000288135    9
> 31               4         55593384       55593490       ENST00000288135   10
> 32               4         55593582       55593708       ENST00000288135   11
> 33               4         55593989       55594093       ENST00000288135   12
> 34               4         55594177       55594287       ENST00000288135   13
> 35               4         55595501       55595651       ENST00000288135   14
> 36               4         55597494       55597585       ENST00000288135   15
> 37               4         55598037       55598164       ENST00000288135   16
> 38               4         55599236       55599358       ENST00000288135   17
> 39               4         55602664       55602775       ENST00000288135   18
> 40               4         55602887       55602986       ENST00000288135   19
> 41               4         55603341       55603446       ENST00000288135   20
> 42               4         55604595       55606881       ENST00000288135   21
> 43               4         55524106       55524248       ENST00000514582    1
> 44               4         55561678       55562072       ENST00000514582    2
> 45               4         55595458       55595651       ENST00000512959    1
> 46               4         55597494       55597585       ENST00000512959    2
> 47               4         55598037       55598164       ENST00000512959    3
> 48               4         55599236       55599567       ENST00000512959    4
>
> This will give many versions of genomic coordinates. For example, KIT has
> 3 sets of exons. I think these different versions may refer to different
> splicing variants/isoforms. Is there a "default"/"standard" set of exons
> for each gene? And how do I know which one it is?
>
> Thanks
>
> John
>
> ________________________________
> To: Hans-Rudolf Hotz <hrh@fmi.ch>; "bioconductor@r-project.org" <bioconductor@r-project.org>
> Sent: Monday, November 25, 2013 12:41 PM
> Subject: Re: [BioC] exon genomic coordinates
>
> Hi,
>
> I am trying to use biomaRt to retrieve the exon coordinates using the
> example provided below:
>
> library(biomaRt)
> ensembl = useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>
> getBM(attributes = c("chromosome_name","exon_chrom_start","exon_chrom_end","rank"),
> filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>
> The above works fine. However, when I tried to add "hgnc_symbol" to the
> attributes list, it gave me an error:
>
> getBM(attributes = c("hgnc_symbol","chromosome_name","exon_chrom_start","exon_chrom_end","rank"),
> filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>
> Error in getBM(attributes = c("hgnc_symbol", "chromosome_name", "exon_chrom_start", :
>   Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple
>   attribute pages are not allowed
>
> But if I keep "hgnc_symbol" in the attributes list and
> remove "exon_chrom_start" and "exon_chrom_end", then it works again:
>
> getBM(attributes = c("hgnc_symbol","chromosome_name","ensembl_transcript_id","rank"),
> filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>
> Can anyone tell me why that is?
>
> Thanks
>
> John
>
> ________________________________
> From: Hans-Rudolf Hotz <hrh@fmi.ch>
> onductor@r-project.org>
> Sent: Thursday, November 21, 2013 5:38 AM
> Subject: Re: [BioC] exon genomic coordinates
>
> Hi John
>
> You can use the BioMart database, which you can access with the biomaRt
> package to get all exons for all transcripts for a given gene, e.g.:
>
> library(biomaRt)
> ensembl = useMart("ensembl")
> #assuming you are interested in mouse
> mouse.ensembl = useDataset("mmusculus_gene_ensembl",mart=ensembl)
>
> getBM(attributes = c("chromosome_name","exon_chrom_start","exon_chrom_end","ensembl_exon_id","ensembl_transcript_id","ensembl_gene_id"),
> filters = 'mgi_symbol', values=c("KIT"),mart=mouse.ensembl)
>
> Hope this helps
>
> Hans-Rudolf
>
> On 11/21/2013 09:14 AM, array chip wrote:
> > Hi,
> >
> > Can anyone suggest how to retrieve the genomic coordinates for all exons
> > for a given gene by, say, gene symbol? For example, how to retrieve the
> > coordinates for all 21 exons for gene KIT?
> >
> > Thanks
> >
> > John
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

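A minimal sketch of the "take the longest one" suggestion, using the KIT exon coordinates quoted in this thread in place of a live getBM() call. The object names (kit_exons, tx_length, longest_tx) are illustrative only, and "longest" is taken here to mean the largest summed exon width, which is an assumption; UCSC's own canonical-transcript selection may apply other criteria.

```r
# Exon starts shared by the two 21-exon KIT transcripts (from the table in the thread)
starts21 <- c(55524085, 55561678, 55564450, 55565796, 55569890, 55573264,
              55575590, 55589750, 55592023, 55593384, 55593582, 55593989,
              55594177, 55595501, 55597494, 55598037, 55599236, 55602664,
              55602887, 55603341, 55604595)
ends_412167 <- c(55524248, 55561947, 55564731, 55565932, 55570058, 55573453,
                 55575705, 55589864, 55592204, 55593490, 55593708, 55594093,
                 55594287, 55595651, 55597585, 55598164, 55599358, 55602775,
                 55602986, 55603446, 55605177)
# ENST00000288135 differs only in the ends of exons 9 and 21
ends_288135 <- replace(ends_412167, c(9, 21), c(55592216, 55606881))

# Stand-in for the data frame that getBM() returned in the thread
kit_exons <- data.frame(
  ensembl_transcript_id = rep(c("ENST00000412167", "ENST00000288135",
                                "ENST00000514582", "ENST00000512959"),
                              times = c(21, 21, 2, 4)),
  exon_chrom_start = c(starts21, starts21,
                       55524106, 55561678,
                       55595458, 55597494, 55598037, 55599236),
  exon_chrom_end = c(ends_412167, ends_288135,
                     55524248, 55562072,
                     55595651, 55597585, 55598164, 55599567),
  stringsAsFactors = FALSE
)

# Coordinates are 1-based and inclusive, so width = end - start + 1
kit_exons$width <- kit_exons$exon_chrom_end - kit_exons$exon_chrom_start + 1

# Sum exon widths per transcript and keep the exons of the longest transcript
tx_length <- tapply(kit_exons$width, kit_exons$ensembl_transcript_id, sum)
longest_tx <- names(which.max(tx_length))
longest_exons <- kit_exons[kit_exons$ensembl_transcript_id == longest_tx, ]
```

With these coordinates the summed exon widths are 5186, 3470, 538 and 746 bp, so longest_tx is ENST00000288135 and longest_exons holds its 21 exons. The same tapply() step should work on a real getBM() result as long as it contains the transcript ID, start and end columns.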
ADD REPLY • link 11.4 years ago Michael Lawrence ★ 11k


Thanks Michael. How do I restrict to consensus CDS?

John

ADD REPLY • link 11.4 years ago array chip ▴ 420


You could use rtracklayer to grab the CCDS track from UCSC. Might be some way with Biomart from Ensembl.

ADD REPLY • link 11.4 years ago Michael Lawrence ★ 11k


Thanks a lot, Michael.

ADD REPLY • link 11.4 years ago array chip ▴ 420


Hi Michael, John,

I was hoping maybe making a TranscriptDb object from the CCDS track would help here:

  library(GenomicFeatures)
  txdb <- makeTranscriptDbFromUCSC("hg19", "ccdsGene")

but unfortunately it's not easy to link the exons in 'txdb' to Ensembl transcript or gene ids because 'txdb' lacks this information:

  > ex <- exons(txdb, columns=c("exon_id", "tx_name", "gene_id"))
  > head(ex)
  GRanges with 6 ranges and 3 metadata columns:
        seqnames           ranges strand |   exon_id         tx_name         gene_id
           <Rle>        <IRanges>  <Rle> | <integer> <CharacterList> <CharacterList>
    [1]     chr1 [ 69091,  70008]      + |         1     CCDS30547.1              NA
    [2]     chr1 [367659, 368597]      + |         2     CCDS41220.1              NA
    [3]     chr1 [861322, 861393]      + |         3         CCDS2.2              NA
    [4]     chr1 [865535, 865716]      + |         4         CCDS2.2              NA
    [5]     chr1 [866419, 866469]      + |         5         CCDS2.2              NA
    [6]     chr1 [871152, 871276]      + |         6         CCDS2.2              NA
    ---
    seqlengths:
          chr1      chr2 ... chrUn_gl000249
     249250621 243199373 ...          38502

Querying Ensembl directly to make a TranscriptDb object:

  library(GenomicFeatures)
  txdb <- makeTranscriptDbFromBiomart()  # takes a while! (40 min. for me)
                                         # this used to be much faster,
                                         # don't know what's going on
  saveDb(txdb, file="hsapiens_gene_ensembl_txdb.sqlite")  # save for later re-use

  KIT_exons <- exons(txdb, vals=list(gene_id="ENSG00000157404"),
                     columns=c("exon_name", "tx_name", "gene_id"))
  tx_names <- unique(unlist(mcols(KIT_exons)$tx_name))
  # tx_names  # 4 transcripts
  # [1] "ENST00000288135" "ENST00000412167" "ENST00000514582" "ENST00000512959"

  ex_by_tx <- exonsBy(txdb, by="tx", use.names=TRUE)
  KIT_ex_by_tx <- ex_by_tx[tx_names]

Transcript lengths:

  > sum(width(KIT_ex_by_tx))
  ENST00000288135 ENST00000412167 ENST00000514582 ENST00000512959
             5186            3470             538             746

Pick up the longest:

  > KIT_ex_by_tx[["ENST00000288135"]]
  GRanges with 21 ranges and 3 metadata columns:
         seqnames               ranges strand |   exon_id       exon_name exon_rank
            <Rle>            <IRanges>  <Rle> | <integer>     <character> <integer>
     [1]        4 [55524085, 55524248]      + |    156828 ENSE00001905199         1
     [2]        4 [55561678, 55561947]      + |    156830 ENSE00001032350         2
     [3]        4 [55564450, 55564731]      + |    156832 ENSE00001074448         3
     [4]        4 [55565796, 55565932]      + |    156833 ENSE00001121859         4
     [5]        4 [55569890, 55570058]      + |    156834 ENSE00001074426         5
     ...      ...                  ...    ... ...       ...             ...       ...
    [17]        4 [55599236, 55599358]      + |    156850 ENSE00001074435        17
    [18]        4 [55602664, 55602775]      + |    156852 ENSE00001074442        18
    [19]        4 [55602887, 55602986]      + |    156853 ENSE00001224349        19
    [20]        4 [55603341, 55603446]      + |    156854 ENSE00001074415        20
    [21]        4 [55604595, 55606881]      + |    156856 ENSE00001898693        21
    ---
    seqlengths:
             1         2 ... LRG_98 LRG_99
     249250621 243199373 ...  18750  13294

Cheers,
H.

On 11/25/2013 02:19 PM, Michael Lawrence wrote:
> You could use rtracklayer to grab the CCDS track from UCSC. Might be some
> way with Biomart from Ensembl.
>
> On Mon, Nov 25, 2013 at 2:06 PM, array chip <arrayprofile at yahoo.com> wrote:
>
>> Thanks Michael. How do I restrict to consensus CDS?
>>
>> John
>>
>> ------------------------------
>> *From:* Michael Lawrence <lawrence.michael at gene.com>
>> *To:* array chip <arrayprofile at yahoo.com>
>> *Cc:* "bioconductor at r-project.org" <bioconductor at r-project.org>
>> *Sent:* Monday, November 25, 2013 1:55 PM
>>
>> *Subject:* Re: [BioC] exon genomic coordinates
>>
>> Well, one approach is to take the longest one. That's what UCSC uses to
>> call its "canonical transcripts". And restrict to the consensus CDS (CCDS).
>>
>>
>> On Mon, Nov 25, 2013 at 1:12 PM, array chip <arrayprofile at yahoo.com> wrote:
>>
>> Hi all, I have another question about exon genomic coordinates:
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl")
>>
>> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>>
>> > getBM(attributes = c("external_gene_id","chromosome_name","exon_chrom_start",
>>     "exon_chrom_end","ensembl_transcript_id","rank"),
>>     filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>>
>>    chromosome_name exon_chrom_start exon_chrom_end ensembl_transcript_id rank
>> 1                4         55524085       55524248       ENST00000412167    1
>> 2                4         55561678       55561947       ENST00000412167    2
>> 3                4         55564450       55564731       ENST00000412167    3
>> 4                4         55565796       55565932       ENST00000412167    4
>> 5                4         55569890       55570058       ENST00000412167    5
>> 6                4         55573264       55573453       ENST00000412167    6
>> 7                4         55575590       55575705       ENST00000412167    7
>> 8                4         55589750       55589864       ENST00000412167    8
>> 9                4         55592023       55592204       ENST00000412167    9
>> 10               4         55593384       55593490       ENST00000412167   10
>> 11               4         55593582       55593708       ENST00000412167   11
>> 12               4         55593989       55594093       ENST00000412167   12
>> 13               4         55594177       55594287       ENST00000412167   13
>> 14               4         55595501       55595651       ENST00000412167   14
>> 15               4         55597494       55597585       ENST00000412167   15
>> 16               4         55598037       55598164       ENST00000412167   16
>> 17               4         55599236       55599358       ENST00000412167   17
>> 18               4         55602664       55602775       ENST00000412167   18
>> 19               4         55602887       55602986       ENST00000412167   19
>> 20               4         55603341       55603446       ENST00000412167   20
>> 21               4         55604595       55605177       ENST00000412167   21
>> 22               4         55524085       55524248       ENST00000288135    1
>> 23               4         55561678       55561947       ENST00000288135    2
>> 24               4         55564450       55564731       ENST00000288135    3
>> 25               4         55565796       55565932       ENST00000288135    4
>> 26               4         55569890       55570058       ENST00000288135    5
>> 27               4         55573264       55573453       ENST00000288135    6
>> 28               4         55575590       55575705       ENST00000288135    7
>> 29               4         55589750       55589864       ENST00000288135    8
>> 30               4         55592023       55592216       ENST00000288135    9
>> 31               4         55593384       55593490       ENST00000288135   10
>> 32               4         55593582       55593708       ENST00000288135   11
>> 33               4         55593989       55594093       ENST00000288135   12
>> 34               4         55594177       55594287       ENST00000288135   13
>> 35               4         55595501       55595651       ENST00000288135   14
>> 36               4         55597494       55597585       ENST00000288135   15
>> 37               4         55598037       55598164       ENST00000288135   16
>> 38               4         55599236       55599358       ENST00000288135   17
>> 39               4         55602664       55602775       ENST00000288135   18
>> 40               4         55602887       55602986       ENST00000288135   19
>> 41               4         55603341       55603446       ENST00000288135   20
>> 42               4         55604595       55606881       ENST00000288135   21
>> 43               4         55524106       55524248       ENST00000514582    1
>> 44               4         55561678       55562072       ENST00000514582    2
>> 45               4         55595458       55595651       ENST00000512959    1
>> 46               4         55597494       55597585       ENST00000512959    2
>> 47               4         55598037       55598164       ENST00000512959    3
>> 48               4         55599236       55599567       ENST00000512959    4
>>
>> This will give many versions of genomic coordinates. For example, KIT has
>> 3 sets of exons. I think these different versions may refer to different
>> splicing variants/isoforms. Is there a "default"/"standard" set of exons
>> for each gene? And how do I know which one it is?
>>
>> Thanks
>>
>> John
>>
>> ________________________________
>>
>> To: Hans-Rudolf Hotz <hrh at fmi.ch>; "bioconductor at r-project.org" <bioconductor at r-project.org>
>> Sent: Monday, November 25, 2013 12:41 PM
>> Subject: Re: [BioC] exon genomic coordinates
>>
>> Hi,
>>
>> I am trying to use bioMart to retrieve the exon coordinates using the
>> example provided below:
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl")
>>
>> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>>
>> getBM(attributes =
>>     c("chromosome_name","exon_chrom_start","exon_chrom_end","rank"),
>>     filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>>
>> The above works fine. However, when I tried to add "hgnc_symbol" to the
>> attributes list, it gave me an error:
>>
>> getBM(attributes =
>>     c("hgnc_symbol","chromosome_name","exon_chrom_start","exon_chrom_end","rank"),
>>     filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>>
>> Error in getBM(attributes = c("hgnc_symbol", "chromosome_name",
>> "exon_chrom_start", :
>>   Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple
>> attribute pages are not allowed
>>
>> But if I keep "hgnc_symbol" in the attributes list and
>> remove "exon_chrom_start" and "exon_chrom_end", then it works again:
>>
>> getBM(attributes =
>>     c("hgnc_symbol","chromosome_name","ensembl_transcript_id","rank"),
>>     filters = 'hgnc_symbol', values=c("KIT"),mart=ensembl)
>>
>> Can anyone tell me why that is?
>>
>> Thanks
>>
>> John
>>
>> ________________________________
>> From: Hans-Rudolf Hotz <hrh at fmi.ch>
>>
>> onductor at r-project.org>
>> Sent: Thursday, November 21, 2013 5:38 AM
>> Subject: Re: [BioC] exon genomic coordinates
>>
>> Hi John
>>
>> You can use the BioMart database, which you can access with the biomaRt
>> package, to get all exons for all transcripts for a given gene, e.g.:
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl")
>> #assuming you are interested in mouse
>> mouse.ensembl = useDataset("mmusculus_gene_ensembl",mart=ensembl)
>>
>> getBM(attributes =
>>     c("chromosome_name","exon_chrom_start","exon_chrom_end","ensembl_exon_id","ensembl_transcript_id","ensembl_gene_id"),
>>     filters = 'mgi_symbol', values=c("KIT"),mart=mouse.ensembl)
>>
>> Hope this helps
>>
>> Hans-Rudolf
>>
>>
>> On 11/21/2013 09:14 AM, array chip wrote:
>>> Hi,
>>>
>>> Can anyone suggest how to retrieve the genomic coordinates for all exons
>>> for a given gene by say gene symbol? For example, how to retrieve the
>>> coordinates for all 21 exons for gene KIT?
>>>
>>> Thanks
>>>
>>> John
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
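
The transcript lengths reported above are the sum of exon widths per transcript, and they can be checked by hand from the getBM() table quoted in this thread. The following is a minimal base-R sketch of that arithmetic, not the GenomicFeatures code path itself. The data frame `exons_df` and the variables `tx_len` and `longest_tx` are names made up for this illustration, and only the two shortest KIT transcripts are copied in to keep it small.

```r
# Exon coordinates copied from the getBM() output quoted in this thread
# (two of the four KIT transcripts only; names below are illustrative).
exons_df <- data.frame(
  ensembl_transcript_id = c(rep("ENST00000514582", 2),
                            rep("ENST00000512959", 4)),
  exon_chrom_start = c(55524106, 55561678,
                       55595458, 55597494, 55598037, 55599236),
  exon_chrom_end   = c(55524248, 55562072,
                       55595651, 55597585, 55598164, 55599567),
  stringsAsFactors = FALSE
)

# Coordinates are 1-based and inclusive, so an exon's width is end - start + 1.
exons_df$width <- exons_df$exon_chrom_end - exons_df$exon_chrom_start + 1

# Transcript length = sum of its exon widths (introns are not counted).
tx_len <- tapply(exons_df$width, exons_df$ensembl_transcript_id, sum)
print(tx_len)

# Name of the longest transcript among those present in the table.
longest_tx <- names(tx_len)[which.max(tx_len)]
print(longest_tx)
```

The two sums come out to 538 and 746, matching the values printed by sum(width(KIT_ex_by_tx)) for these transcripts. Applied to the full 48-row table, the same steps should select ENST00000288135, the transcript picked above.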

Hervé Pagès, 11.4 years ago


Wah, thanks a lot, Hervé! I have also found that connecting to biomaRt with ensembl = useMart("ensembl") sometimes takes forever, which is quite frustrating.

John

________________________________
From: Hervé Pagès <hpages@fhcrc.org>
To: Michael Lawrence <lawrence.michael@gene.com>; array chip <arrayprofile@yahoo.com>
Cc: "bioconductor@r-project.org" <bioconductor@r-project.org>
Sent: Monday, November 25, 2013 3:30 PM
Subject: Re: [BioC] exon genomic coordinates
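
One way to soften the slow-connection problem is the same idea as the saveDb() call in the reply above: pay for the slow step once and reload the result afterwards. Below is a minimal base-R sketch of that pattern for plain query results such as a getBM() data frame. The helper `cached()` and the stand-in `slow_query()` are hypothetical names invented for this example and are not part of biomaRt. A TxDb object should probably still go through saveDb()/loadDb() rather than saveRDS(), since it wraps a database connection.

```r
# Hypothetical helper: run `compute()` once and keep its result on disk,
# so later calls read the file instead of repeating the slow step.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for a slow remote query; in real use this would wrap the
# useMart()/getBM() calls (an assumption, no network access happens here).
n_calls <- 0
slow_query <- function() {
  n_calls <<- n_calls + 1  # count how often the "slow" step really runs
  data.frame(chromosome_name  = "4",
             exon_chrom_start = 55524085,
             exon_chrom_end   = 55524248)
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached(cache_file, slow_query)  # runs slow_query() and writes the file
second <- cached(cache_file, slow_query)  # reads the file, no second query
```

The cache never expires in this sketch, so the file has to be deleted by hand when a fresh Ensembl release is wanted.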

array chip, 11.4 years ago
