Running gmap tool via gmapR package?
1
0
Entering edit mode
@thomas-sandmann-6817
Last seen 15 months ago
USA

Hi Michael,

Thanks a lot for making the gsnap aligner (and associated tools) available through the gmapR package. The vignette mentions that "GMAP integration is scheduled for the near future" as well. 

I noticed that there is a gmap-command.R file in the package as well, but it still seems to be a stub since 2012 (last change in the repository). Do you still plan to enable use of the gmap command through the package?

Again, thanks for making these (and many other) tools available to the community,

Thomas

 

gmapR • 2.2k views
ADD COMMENT
0
Entering edit mode

Hi Michael,

thanks a lot for responding to my question.

I would like to map cDNA sequences to a genome, e.g. map an EST sequence to the reference genome of the same or a very highly related species. As described in the original gmap paper, I would be interested in very close matches, e.g. near-identity between a cDNA sequence and its corresponding genomic exons, to learn about the intron-exon structure of the locus.

For example, I would like to start with one (or more) query sequences (fasta files, DNAStrings, DNAStringSeq, whatever makes most sense) and a GmapGenome.

Does this description help?

Thanks, Thomas

ADD REPLY
0
Entering edit mode

I will put it on the list. Perhaps something in devel by the end of the week. Have to fix R first.

ADD REPLY
0
Entering edit mode

Update: have been working on this, but it's a fairly significant feature addition and will take a couple more days before it is ready to try.

ADD REPLY
0
Entering edit mode

Thanks a lot for spending some of your valuable time to implement this feature. Very much appreciated. Let me know if / when I can help.

ADD REPLY
0
Entering edit mode

I edited the answer; there is something that might work.

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

Here is what you can do with gmapR 1.13.4 (after one has a genome and a FASTA file). 

gmapParam <- GmapParam(genome=gmapGenome,
                       unique_only=FALSE,
                       suboptimal_score=2L, 
                       npaths=10L,
                       splicing=FALSE,
                       min_intronlength=20L)
output <- gmap("H1993.fasta", params=gmapParam)
gr <- import(output)

I am going to be honest: I did not do much testing of this. Perhaps you could help me there ;)

ADD COMMENT
0
Entering edit mode

Hi Michael,

thanks a ton for for implementing gmapR so quickly. I ran your example code (using the Drosophila dm3 genome and an example fasta file) a few days ago and got nice results. Yet, when I tried again today - to test it more thoroughly - with a fresh download of gmapR 1.13.6, I ran into an error. (I have trouble accessing the BioC SVN repository right now, so I can't check any earlier versions... Dan is helping me get back on track.)

Here's a (hopefully) reproducible example:

library(BSgenome.Dmelanogaster.UCSC.dm3)
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
library(gmapR)

# load Drosophila melanogaster genome and transcriptome annotations
genome <- BSgenome.Dmelanogaster.UCSC.dm3
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
seqlevels(txdb) <- "chr2L"
tx_seqs <- extractTranscriptSeqs(genome, txdb)

# create fasta file with an example transcript sequence
transcriptId <- "FBtr0091487"  # a transcript from the pnuts gene
stopifnot(transcriptId %in% names(tx_seqs))
temp.file <- tempfile(fileext = ".fasta")
writeXStringSet(tx_seqs[transcriptId], temp.file)

# alignment
params <- GmapParam(genome=GmapGenome(genome),
                    unique_only=TRUE,
                    suboptimal_score=2L, 
                    npaths=10L,
                    splicing=TRUE,
                    min_intronlength=20L)
output <- gmap(temp.file, params = params)

Error in params$format : $ operator not defined for this S4 class

Here's the result of calling sessionInfo(). (I currently don't have a separate installation of R devel available to me, so there is a mix of release and development versions of packages here.)

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] gmapR_1.13.6                              TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
 [3] GenomicFeatures_1.22.6                    AnnotationDbi_1.32.0                     
 [5] Biobase_2.30.0                            BSgenome.Dmelanogaster.UCSC.dm3_1.4.0    
 [7] BSgenome_1.38.0                           rtracklayer_1.31.2                       
 [9] Biostrings_2.38.2                         XVector_0.10.0                           
[11] GenomicRanges_1.22.1                      GenomeInfoDb_1.6.1                       
[13] IRanges_2.4.4                             S4Vectors_0.8.3                          
[15] BiocGenerics_0.16.1                       BiocInstaller_1.20.1                     
[17] devtools_1.9.1                           

loaded via a namespace (and not attached):
 [1] zlibbioc_1.16.0            GenomicAlignments_1.6.1    BiocParallel_1.4.0        
 [4] tools_3.2.2                SummarizedExperiment_1.0.1 DBI_0.3.1                 
 [7] lambda.r_1.1.7             futile.logger_1.4.1        digest_0.6.8              
[10] futile.options_1.0.0       bitops_1.0-6               biomaRt_2.26.1            
[13] RCurl_1.95-4.7             RSQLite_1.0.0              memoise_0.2.1             
[16] Rsamtools_1.22.0           XML_3.98-1.3               VariantAnnotation_1.16.3  

ADD REPLY
0
Entering edit mode

Thanks for the testing and reproducible example. Fixed in 1.13.7.

ADD REPLY
0
Entering edit mode

Hi Michael,

thanks again for fixing the small bug so quickly. I have aligned a number of Drosophila transcripts by now and e.g. visualized the results with the Gviz package. Things work great and I am very grateful that you make it so easy to use gmap through R.

I noticed that the import method for the GmapOutput object wasn't able to parse the gff3 file I generated. Specifying the gff3 format explicitly helped e.g. in the example pasted below.

Now, I will need some more reading on gmap's command line options to understand the different parameters better (and how to specify them via the GmapParam function).

library(gmapR)
library(BSgenome.Dmelanogaster.UCSC.dm3)
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
library(Gviz)

# load genome and transcriptome annotations
genome <- BSgenome.Dmelanogaster.UCSC.dm3
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
tx_seqs <- extractTranscriptSeqs(genome, txdb)

# create fasta file with annotated transcripts sequences for the Dmef2 gene
geneId <- "FBgn0011656"  #  Dmef2 gene
transcriptIds <- transcriptsBy(txdb, by = "gene")[[geneId]]$tx_name  # all Dmef2 transcripts
temp.file <- tempfile(fileext = ".fasta")
writeXStringSet(tx_seqs[transcriptIds], temp.file)

# alignment
params <- GmapParam(genome=GmapGenome(genome),
                    unique_only=TRUE,
                    suboptimal_score=2L, 
                    npaths=10L,
                    splicing=TRUE,
                    min_intronlength=20L, 
                    format = "gff3_gene")
output <- gmap(temp.file, params = params)
gr <- import(output)  # error
gr <- import.gff3(path(output), format = "gff3")  # works

# visualization with GViz
gff.file <- sub("fasta$", "gff3", basename(temp.file))
stopifnot(file.exists(gff.file))

gaTrack <- GenomeAxisTrack()
grTrack <- GeneRegionTrack(txdb, name = "Flybase", showId = TRUE, 
                           fill = "#FDAE6B", col="#7F2704")
gmTrack <- GeneRegionTrack(gff.file, name  = "gmap", showId = TRUE, 
                           fill = "#6BAED6", col="#08306B")
plotTracks(list(gaTrack, grTrack, gmTrack), chromosome = chromosome(gmTrack),
           from = min(start(gr)-1000), to = max(end(gr)+1000))

SessionInfo() output:

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices datasets  utils    
 [9] methods   base     

other attached packages:
 [1] Gviz_1.14.0                               TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2
 [3] GenomicFeatures_1.22.6                    AnnotationDbi_1.32.0                     
 [5] Biobase_2.30.0                            BSgenome.Dmelanogaster.UCSC.dm3_1.4.0    
 [7] BSgenome_1.38.0                           rtracklayer_1.31.2                       
 [9] Biostrings_2.38.2                         XVector_0.10.0                           
[11] gmapR_1.13.7                              GenomicRanges_1.22.1                     
[13] GenomeInfoDb_1.6.1                        IRanges_2.4.4                            
[15] S4Vectors_0.8.3                           BiocGenerics_0.16.1                      
[17] BiocInstaller_1.20.1                      devtools_1.9.1                           

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.0.1 VariantAnnotation_1.16.3   reshape2_1.4.1            
 [4] splines_3.2.2              lattice_0.20-33            colorspace_1.2-6          
 [7] XML_3.98-1.3               survival_2.38-3            foreign_0.8-66            
[10] DBI_0.3.1                  BiocParallel_1.4.0         RColorBrewer_1.1-2        
[13] lambda.r_1.1.7             matrixStats_0.15.0         plyr_1.8.3                
[16] stringr_1.0.0              zlibbioc_1.16.0            munsell_0.4.2             
[19] gtable_0.1.2               futile.logger_1.4.1        memoise_0.2.1             
[22] latticeExtra_0.6-26        biomaRt_2.26.1             proto_0.3-10              
[25] Rcpp_0.12.2                acepack_1.3-3.3            scales_0.3.0              
[28] Hmisc_3.17-0               gridExtra_2.0.0            Rsamtools_1.22.0          
[31] ggplot2_1.0.1              digest_0.6.8               stringi_1.0-1             
[34] biovizBase_1.18.0          tools_3.2.2                bitops_1.0-6              
[37] magrittr_1.5               RCurl_1.95-4.7             RSQLite_1.0.0             
[40] dichromat_2.0-0            Formula_1.2-1              cluster_2.0.3             
[43] futile.options_1.0.0       MASS_7.3-45                rpart_4.1-10              
[46] GenomicAlignments_1.6.1    nnet_7.3-11
  

ADD REPLY
0
Entering edit mode

Thanks, should be fixed in 1.13.8

ADD REPLY

Login before adding your answer.

Traffic: 945 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6