Extract three types of intergenic regions
vinod.acear ▴ 50
@vinodacear-8884

Hi, is there any package to extract the three types of intergenic regions (i.e. tandem, convergent, divergent) as a GRanges object from a genome and a GFF file?

granges views annotation iranges • 3.9k views
Thomas Girke ★ 1.7k
@thomas-girke-993

The genFeatures() function in systemPipeR allows you to compute intergenic regions. The vignette for the Ribo-Seq workflow of the systemPipeR package gives some examples (see also ?genFeatures). The sub-classification of the intergenic regions you mention could be obtained downstream from the plus/minus strand orientation of the genes flanking each intergenic region. The naming scheme of the intergenic regions by their neighboring genes (e.g. geneID1__geneID2) could help here, but this would require some additional coding on your side. If this is a common use case, I am happy to add this functionality to the to-do list for the next update. Alternatively, one could discuss whether the utility to extract intergenic regions from TxDbs should become part of the GenomicFeatures package. I believe there were some discussions about this in the past.
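For illustration, the strand-based sub-classification could look roughly like the sketch below. It is not part of systemPipeR: classify_intergenic() and the toy gene table (IDs and coordinates) are made up for this example. It assumes non-overlapping genes on a single chromosome, sorted by start position, where "left" and "right" refer to reference coordinates. Two flanking genes on the same strand give a tandem region, left "+" with right "-" gives a convergent region, and left "-" with right "+" gives a divergent region.

```r
# Classify the region between two neighbouring genes from their strands.
# 'left' and 'right' are the strands ("+" or "-") of the genes to the left
# and to the right of the region on the reference sequence.
classify_intergenic <- function(left, right) {
  stopifnot(length(left) == length(right))
  out <- rep(NA_character_, length(left))
  out[left == right & left %in% c("+", "-")] <- "tandem"
  out[left == "+" & right == "-"] <- "convergent"
  out[left == "-" & right == "+"] <- "divergent"
  out
}

# Toy annotation (hypothetical gene IDs and coordinates), sorted by start.
genes <- data.frame(
  id     = c("geneA", "geneB", "geneC", "geneD"),
  start  = c(100, 500, 900, 1300),
  end    = c(300, 700, 1100, 1500),
  strand = c("+", "+", "-", "+"),
  stringsAsFactors = FALSE
)

# One intergenic region per pair of adjacent genes, named geneID1__geneID2.
n <- nrow(genes)
intergenic <- data.frame(
  name  = paste(genes$id[-n], genes$id[-1], sep = "__"),
  start = genes$end[-n] + 1,
  end   = genes$start[-1] - 1,
  type  = classify_intergenic(genes$strand[-n], genes$strand[-1]),
  stringsAsFactors = FALSE
)
print(intergenic)
```

Overlapping genes and multiple chromosomes are not handled here; a real annotation would need the pairs to be formed per chromosome.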

Thomas 


Hi Thomas, thanks for your suggestion. When I tried to install the package mentioned above, it gave the following warnings and the library could not be loaded.

> source("http://bioconductor.org/biocLite.R") # Sources the biocLite.R installation script 
biocLite("systemPipeR") # Installs systemPipeR from Bioconductor 

installing to /home/vinod/R/x86_64-pc-linux-gnu-library/3.2/spatial/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (spatial)

The downloaded source packages are in
	‘/tmp/RtmpcmPuuP/downloaded_packages’
Warning messages:
1: In download.file(url, destfile, method, mode = "wb", ...) :
  download had nonzero exit status
2: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘BBmisc’ had non-zero exit status
3: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘fail’ had non-zero exit status
4: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘BatchJobs’ had non-zero exit status
5: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘systemPipeR’ had non-zero exit status
> library("systemPipeR") # Loads the package
Error in library("systemPipeR") : 
  there is no package called ‘systemPipeR’

Hi Thomas, I somehow managed to install the systemPipeR package, but genFeatures() is not available. I am also sending you my session info.

 

> library("systemPipeR")
Loading required package: Rsamtools
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: DBI

> library("Rsamtools")
> library("ShortRead")
> library("BiocParallel")
> library("GenomicAlignments")
> library("DBI")
> library("systemPipeR")
> ?genFeatures
No documentation for ‘genFeatures’ in specified packages and libraries:
you could try ‘??genFeatures’
 
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C               LC_TIME=en_IN.UTF-8       
 [4] LC_COLLATE=en_IN.UTF-8     LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8   
 [7] LC_PAPER=en_IN.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] systemPipeR_1.2.23      RSQLite_1.0.0           DBI_0.3.1               ShortRead_1.26.0       
 [5] GenomicAlignments_1.4.2 BiocParallel_1.2.22     Rsamtools_1.20.5        BSgenome_1.36.3        
 [9] rtracklayer_1.28.10     Biostrings_2.36.4       XVector_0.8.0           GenomicRanges_1.20.8   
[13] GenomeInfoDb_1.4.3      IRanges_2.2.9           S4Vectors_0.6.6         BiocGenerics_0.14.0    

loaded via a namespace (and not attached):
 [1] genefilter_1.50.0      reshape2_1.4.1         splines_3.2.2          lattice_0.20-33       
 [5] colorspace_1.2-6       base64enc_0.1-3        Category_2.34.2        XML_3.98-1.3          
 [9] RBGL_1.44.0            survival_2.38-3        GOstats_2.34.0         RColorBrewer_1.1-2    
[13] lambda.r_1.1.7         plyr_1.8.3             stringr_1.0.0          zlibbioc_1.14.0       
[17] munsell_0.4.2          gtable_0.1.2           futile.logger_1.4.1    hwriter_1.3.2         
[21] latticeExtra_0.6-26    Biobase_2.28.0         AnnotationDbi_1.30.1   GSEABase_1.30.2       
[25] proto_0.3-10           Rcpp_0.12.1            xtable_1.7-4           edgeR_3.10.5          
[29] scales_0.3.0           checkmate_1.6.3        limma_3.24.15          graph_1.46.0          
[33] annotate_1.46.1        sendmailR_1.2-1        brew_1.0-6             BatchJobs_1.6         
[37] fail_1.3               rjson_0.2.15           ggplot2_1.0.1          digest_0.6.8          
[41] stringi_1.0-1          BBmisc_1.9             grid_3.2.2             tools_3.2.2           
[45] bitops_1.0-6           magrittr_1.5           RCurl_1.95-4.7         futile.options_1.0.0  
[49] GO.db_3.1.2            MASS_7.3-44            pheatmap_1.0.7         Matrix_1.2-2          
[53] AnnotationForge_1.10.1

 

You are running an old version of Bioc. You need to upgrade to Bioc 3.2.

Thomas

Hi Thomas, as per your advice I successfully installed systemPipeR. I am trying to find intergenic regions from the GFF file at this link: http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff

When I tried to make a TxDb database from the GFF file, the txdb object was not created. The commands and session info are given below.

Can you suggest a way to get the intergenic regions of Saccharomyces cerevisiae?

 

>gffFile="/home/vinod/new_yeast/yeast_classfication/saccharomyces_cerevisiae3.gff"
> txdb <- makeTxDbFromGFF(file=gffFile, format="gff3", organism="Saccharomyces cerevisiae")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .make_splicings(exons, cds, stop_codons) : 
  some CDS cannot be mapped to an exon
In addition: Warning message:
In .extract_exons_from_GRanges(cds_IDX, gr, ID, Name, Parent, feature = "cds",  :
  141 orphan CDSs were dropped
> feat <- genFeatures(txdb, featuretype="all", reduce_ranges=FALSE, upstream=1000, downstream=0)
Error in genFeatures(txdb, featuretype = "all", reduce_ranges = FALSE,  : 
  object 'txdb' not found

 

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C               LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8    
 [5] LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8    LC_PAPER=en_IN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] systemPipeR_1.4.2                           RSQLite_1.0.0                              
 [3] DBI_0.3.1                                   ShortRead_1.28.0                           
 [5] GenomicAlignments_1.6.1                     SummarizedExperiment_1.0.0                 
 [7] BiocParallel_1.4.0                          Rsamtools_1.22.0                           
 [9] BSgenome.Scerevisiae.UCSC.sacCer2_1.4.0     BSgenome_1.38.0                            
[11] rtracklayer_1.30.1                          Biostrings_2.38.0                          
[13] XVector_0.10.0                              TxDb.Scerevisiae.UCSC.sacCer2.sgdGene_3.2.2
[15] GenomicFeatures_1.22.0                      AnnotationDbi_1.32.0                       
[17] Biobase_2.30.0                              GenomicRanges_1.22.0                       
[19] GenomeInfoDb_1.6.0                          IRanges_2.4.1                              
[21] S4Vectors_0.8.0                             BiocGenerics_0.16.0                        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1            lattice_0.20-33        GO.db_3.2.2            digest_0.6.8           plyr_1.8.3            
 [6] futile.options_1.0.0   BatchJobs_1.6          ggplot2_1.0.1          zlibbioc_1.16.0        annotate_1.48.0       
[11] Matrix_1.2-2           checkmate_1.6.3        proto_0.3-10           GOstats_2.36.0         splines_3.2.2         
[16] stringr_1.0.0          pheatmap_1.0.7         RCurl_1.95-4.7         biomaRt_2.26.0         munsell_0.4.2         
[21] sendmailR_1.2-1        base64enc_0.1-3        BBmisc_1.9             fail_1.3               edgeR_3.12.0          
[26] XML_3.98-1.3           AnnotationForge_1.12.0 MASS_7.3-44            bitops_1.0-6           grid_3.2.2            
[31] RBGL_1.46.0            xtable_1.7-4           GSEABase_1.32.0        gtable_0.1.2           magrittr_1.5          
[36] scales_0.3.0           graph_1.48.0           stringi_1.0-1          hwriter_1.3.2          reshape2_1.4.1        
[41] genefilter_1.52.0      limma_3.26.0           latticeExtra_0.6-26    futile.logger_1.4.1    brew_1.0-6            
[46] rjson_0.2.15           lambda.r_1.1.7         RColorBrewer_1.1-2     tools_3.2.2            Category_2.36.0       
[51] survival_2.38-3        colorspace_1.2-6      
 

>

Thomas Girke ★ 1.7k
@thomas-girke-993

makeTxDbFromGFF() from GenomicFeatures fails to produce a TxDb, so something in your GFF does not meet the expected format. This is usually fixable by debugging the GFF. However, would you mind using BioMart as the source of your annotations instead of the GFF? If that is fine, please try the following, which works just fine:

> library(GenomicFeatures); library("biomaRt"); library(systemPipeR)
> txdb <- makeTxDbFromBiomart(biomart = "ensembl", dataset = "scerevisiae_gene_ensembl")
> myfeatures <- c("tx_type", "promoter", "intron", "exon", "cds", "intergenic")
> feat <- genFeatures(txdb, featuretype=myfeatures, reduce_ranges=FALSE, upstream=1000, downstream=0)

Created feature ranges: protein_coding, ncRNA, tRNA, snoRNA, pseudogene, snRNA, rRNA
Created feature ranges: promoter
Created feature ranges: intron
Created feature ranges: exon
Created feature ranges: cds
Created feature ranges: intergenic 

Now the intergenic ranges can be extracted with feat$intergenic. Note: using featuretype="all" will give you an error for fiveUTR/threeUTR since those return empty objects. I will change this to a warning in the next update of systemPipeR.
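A minimal sketch of how the genes flanking each intergenic range could be looked up with follow() and precede() from GenomicRanges, and how the region type then follows from the two strands. The toy genes and intergenic objects (IDs and coordinates) are hypothetical stand-ins. With real data one would presumably use genes(txdb) and feat$intergenic, but this has not been checked against the strand or names that genFeatures() assigns to the intergenic ranges.

```r
library(GenomicRanges)

# Toy gene annotation (hypothetical IDs and coordinates).
genes <- GRanges(
  seqnames = "chrI",
  ranges   = IRanges(start = c(100, 500, 900, 1300),
                     end   = c(300, 700, 1100, 1500)),
  strand   = c("+", "+", "-", "+"),
  gene_id  = c("geneA", "geneB", "geneC", "geneD")
)

# Unstranded intergenic ranges, standing in for feat$intergenic.
intergenic <- GRanges("chrI", IRanges(start = c(301, 701, 1101),
                                      end   = c(499, 899, 1299)))

# Index of the nearest gene to the left (follow) and to the right (precede)
# of each region. Strand is ignored so that left/right refer to reference
# coordinates rather than to the direction of transcription.
left  <- follow(intergenic, genes, ignore.strand = TRUE)
right <- precede(intergenic, genes, ignore.strand = TRUE)

gene_strand  <- as.character(strand(genes))
strand_pairs <- paste0(gene_strand[left], gene_strand[right])

# Map the (left strand, right strand) pair to a region type.
# Regions lacking a neighbour on one side get NA.
lookup <- c("++" = "tandem", "--" = "tandem",
            "+-" = "convergent", "-+" = "divergent")
intergenic$type <- unname(lookup[strand_pairs])
intergenic$name <- paste(genes$gene_id[left], genes$gene_id[right], sep = "__")
intergenic
```

Regions at chromosome ends, where one flanking gene is missing, end up with type NA and can be dropped with intergenic[!is.na(intergenic$type)].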

Thomas 

 

@herve-pages-1542

Hi,

If your intergenic regions are annotated in your GFF file, then just import the file with the import() function from the rtracklayer package. This returns a GRanges object with various metadata columns. One of them is named type and tells you the type of feature for each range in the GRanges object. Your 3 types of intergenic regions should show up there.
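A small sketch of that check, assuming rtracklayer is installed. The GFF3 content written to the temporary file is made up for the example; with a real annotation you would pass the path of your own file to import() instead.

```r
library(rtracklayer)

# Write a tiny GFF3 file (hypothetical features) so the sketch is self-contained.
gff_lines <- c(
  "##gff-version 3",
  "chrI\ttoy\tgene\t100\t300\t.\t+\t.\tID=geneA",
  "chrI\ttoy\tgene\t500\t700\t.\t-\t.\tID=geneB",
  "chrI\ttoy\tCDS\t100\t300\t.\t+\t0\tID=cdsA;Parent=geneA"
)
gff_file <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff_file)

# Import as a GRanges object; the feature type is in the 'type' column.
gr <- import(gff_file)

# Tabulate the feature types present in the file.
print(table(gr$type))

# Check whether any feature type mentions intergenic regions.
has_intergenic <- any(grepl("intergenic", gr$type, ignore.case = TRUE))
has_intergenic
```

If no intergenic type shows up in the table, the file does not annotate these regions and they have to be computed from the gene coordinates.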

H.


Hi Herve, it did not show any intergenic features.


If you used http://downloads.yeastgenome.org/curation/chromosomal_feature/saccharomyces_cerevisiae.gff

it doesn't seem to contain information about the intergenic regions, unfortunately. That's why the GRanges object you got with rtracklayer::import() doesn't contain these regions either. The next thing to try is what Thomas suggested. I assume it worked for you because you accepted his answer. If you're still struggling with this, you would need to provide some details about what you've done so far and what problems you ran into.

Cheers,

H.


Hi Herve,

The trick from Thomas worked for me. Thanks for your support.
