Forging a masked genome package
nadia.goue (@nadiagoue-11941)

Hello,

I am working with Drosophila melanogaster dm6, and I want to use a package that requires both BSgenome.Dmelanogaster.UCSC.dm6 and BSgenome.Dmelanogaster.UCSC.dm6.masked. I was able to install the former, but I need to forge the latter.

I get an error when installing the masked genome package at the R CMD check step. I suspected the error might be related to the many contig sequences present in the twoBit file. However, those contigs did not cause any problem when I forged the genome with bare sequences. What did I miss (or misunderstand) in the manual?

My seed file:

Package: BSgenome.Dmelanogaster.UCSC.dm6.masked
Title: Full masked genome sequences for Drosophila melanogaster (UCSC version dm6)
Description: Full genome sequences for Drosophila melanogaster (Fly) as provided by UCSC (dm6, Aug. 2014) and stored in Biostrings objects. The sequences are the same as in BSgenome.Dmelanogaster.UCSC.dm6, except that each of them has the 4 following masks on top: (1) the mask of assembly gaps (AGAPS mask), (2) the mask of intra-contig ambiguities (AMB mask), (3) the mask of repeats from RepeatMasker (RM mask), and (4) the mask of repeats from Tandem Repeats Finder (TRF mask). Only the AGAPS and AMB masks are "active" by default.
Version: 1.4.2
RefPkgname: BSgenome.Dmelanogaster.UCSC.dm6
source_url: http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/
organism_biocview: Drosophila_melanogaster
nmask_per_seq: 3
SrcDataFiles: AGAPS masks: dm6.agp.gz from http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/
    RM masks: dm6.fa.out.gz from http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/
    TRF masks: dm6.trf.bed.gz from http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/
#PkgExamples: genome$chr2L  # a MaskedDNAString object!
## To get rid of the masks altogether:
#unmasked(genome$chr2L)  # same as BSgenome.Dmelanogaster.UCSC.dm6$chr2L
masks_srcdir: .
AGAPSfiles_type: agp
AGAPSfiles_name: dm6.agp
RMfiles_name: dm6.fa.out
TRFfiles_name: dm6.trf.bed

I should point out that I used the agp file type even though this file comes from the UCSC site.

forgeMaskedBSgenomeDataPkg("BSgenome.Dmelanogaster.UCSC.dm6.masked-seed")

Creating package in ./BSgenome.Dmelanogaster.UCSC.dm6.masked
Saving 'chr2L.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chr2L.masks.rda' ... DONE
Saving 'chr2R.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chr2R.masks.rda' ... DONE
Saving 'chr3L.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chr3L.masks.rda' ... DONE
Saving 'chr3R.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chr3R.masks.rda' ... DONE
Saving 'chr4.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chr4.masks.rda' ... DONE
Saving 'chrM.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chrM.masks.rda' ... DONE
Saving 'chrX.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chrX.masks.rda' ... DONE
Saving 'chrY.masks' object to compressed data file './BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata/chrY.masks.rda' ... DONE
Warning messages:
1: In .newEmptyMask(seqname, mask.width, mask.name, mask.desc, mask.desc) :
  No assembly gaps found for sequence "chr2L" in this file. returning empty mask

The same warning appears for "chr2R", "chr3L", "chr3R", "chr4", "chrM", "chrX", and "chrY".
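One way to narrow down these warnings is to check whether the AGP file really contains gap lines for the main chromosomes, or whether the forge simply does not find them (for example, because of a file-type or naming mismatch). The sketch below is a hypothetical helper, `count_agp_gaps()`, not a BSgenome function; it assumes a plain-text, tab-delimited AGP file laid out per the NCBI AGP specification (column 1 = object name, column 5 = component type, with "N" and "U" marking gaps).

```r
# count_agp_gaps() is a HYPOTHETICAL helper, not part of BSgenome.
# Assumes a tab-delimited AGP file following the NCBI AGP spec:
# column 1 = object (sequence) name, column 5 = component type,
# where "N" and "U" mark gap lines.
count_agp_gaps <- function(agp_path) {
  lines <- readLines(agp_path)
  # Drop empty lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  object <- vapply(fields, `[`, character(1), 1)
  comp_type <- vapply(fields, `[`, character(1), 5)
  is_gap <- comp_type %in% c("N", "U")
  # Tabulate gap lines per object; objects with no gap lines get 0
  counts <- table(factor(object[is_gap], levels = unique(object)))
  setNames(as.integer(counts), names(counts))
}
```

Run from the masks_srcdir directory, `count_agp_gaps("dm6.agp")[c("chr2L", "chr2R", "chrX")]` would show whether gaps are listed for those chromosomes. If the counts are non-zero, the empty AGAPS masks more likely come from how the file is read (e.g. the agp vs. UCSC file-type question mentioned above) than from a genuine absence of gaps; if they are zero, the warnings may simply reflect the data.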

I don't know what to change in the seed file to resolve these warnings, which may affect the following step:

R CMD build BSgenome.Dmelanogaster.UCSC.dm6.masked
R CMD check BSgenome.Dmelanogaster.UCSC.dm6.masked_1.4.2.tar.gz
which returns:
* checking whether package ‘BSgenome.Dmelanogaster.UCSC.dm6.masked’ can be installed ... ERROR
Installation failed.

The 00install.out file contains:

Error : .onLoad failed in loadNamespace() for 'BSgenome.Dmelanogaster.UCSC.dm6.masked', details:
  call: validObject(.Object)
  error: invalid class “RdaCollection” object: files
 data/BSgenome.Dmelanogaster.UCSC.dm6.masked/extdata/chrX_DS483666v1_random.masks.rda', '/data/BSgenome$
ERROR: loading failed
execution stopped
* removing ‘/data/BSgenome.Dmelanogaster.UCSC.dm6.masked.Rcheck/BSgenome.Dmelanogaster.UCSC.dm6.masked’

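The RdaCollection error names chrX_DS483666v1_random.masks.rda, while the forge log above only saved mask files for the eight main chromosomes. That suggests, though this is an assumption, that the installed package looks for a mask file for every sequence of the reference package, including the random/Un contigs. The hypothetical helper below (not a BSgenome function) lists which sequence names have no matching `<seqname>.masks.rda` file in a given directory.

```r
# missing_mask_files() is a HYPOTHETICAL helper, not part of BSgenome.
# For each sequence name, it checks whether "<seqname>.masks.rda" exists
# in extdata_dir and returns the names whose file is missing.
missing_mask_files <- function(seqnames, extdata_dir) {
  expected <- file.path(extdata_dir, paste0(seqnames, ".masks.rda"))
  seqnames[!file.exists(expected)]
}
```

With the reference package loaded, something like `missing_mask_files(seqnames(BSgenome.Dmelanogaster.UCSC.dm6), "BSgenome.Dmelanogaster.UCSC.dm6.masked/inst/extdata")` would show whether the contigs are the sequences without mask files.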
sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BSgenome.Dmelanogaster.UCSC.dm6_1.4.1 BSgenome_1.38.0                       rtracklayer_1.30.4                    GenomicRanges_1.22.4                 
 [5] GenomeInfoDb_1.6.3                    Biostrings_2.38.4                     XVector_0.10.0                        Category_2.36.0                      
 [9] GO.db_3.2.2                           RSQLite_1.0.0                         DBI_0.5-1                             Matrix_1.2-7.1                       
[13] AnnotationDbi_1.32.3                  IRanges_2.4.8                         S4Vectors_0.8.11                      Biobase_2.30.0                       
[17] BiocGenerics_0.16.1                  

loaded via a namespace (and not attached):
 [1] splines_3.3.2              Formula_1.2-1              assertthat_0.1             latticeExtra_0.6-28        RBGL_1.46.0                Rsamtools_1.22.0          
 [7] lattice_0.20-34            chron_2.3-47               digest_0.6.10              RColorBrewer_1.1-2         colorspace_1.3-1           htmltools_0.3.5           
[13] plyr_1.8.4                 psych_1.6.9                GSEABase_1.32.0            DESeq2_1.10.1              XML_3.98-1.1               genefilter_1.52.1         
[19] R.4Cker_0.0.0.9000         zlibbioc_1.16.0            xtable_1.8-2               scales_0.4.1               BiocParallel_1.4.3         htmlTable_1.7             
[25] tibble_1.2                 annotate_1.48.0            ggplot2_2.2.0              SummarizedExperiment_1.0.2 nnet_7.3-12                lazyeval_0.2.0            
[31] mnormt_1.5-5               survival_2.40-1            magrittr_1.5               MASS_7.3-45                RcppArmadillo_0.7.500.0.0  foreign_0.8-67            
[37] truncnorm_1.0-7            graph_1.48.0               tools_3.3.2                data.table_1.9.6           stringr_1.1.0              munsell_0.4.3             
[43] depmixS4_1.3-3             locfit_1.5-9.1             cluster_2.0.5              lambda.r_1.1.9             futile.logger_1.4.3        grid_3.3.2                
[49] RCurl_1.95-4.8             miscTools_0.6-20           Rsolnp_1.16                bitops_1.0-6               gtable_0.2.0               GenomicAlignments_1.6.3   
[55] gridExtra_2.2.1            knitr_1.15.1               Hmisc_4.0-0                futile.options_1.0.0       stringi_1.1.2              Rcpp_0.12.8               
[61] geneplotter_1.48.0         rpart_4.1-10               acepack_1.4.1

I am out of ideas, so any help would be very welcome.

Many thanks,

Nadia
