Hi everyone,
so i am trying to extract the 1000bp upstream, for every gene in solanum pennellii, i know i would need to build the txdb object and the bsgenome for this, firstly i downloaded the fasta format genome and the gff file from ncbi genome database
http://www.ncbi.nlm.nih.gov/genome/?term=solanum+pennellii
after that , i used this perl script to split the genome of s.pennellii into chromosomes and build the bsgenome from these 12 chromosomes.
#!/usr/bin/perl $f = $ARGV[0]; #get the file name open (INFILE, "<$f") or die "Can't open: $f $!"; while (<INFILE>) { $line = $_; chomp $line; if ($line =~ /\>/) { #if has fasta > close OUTFILE; $new_file = substr($line,1); $new_file .= ".fa"; open (OUTFILE, ">$new_file") or die "Can't open: $new_file $!"; } print OUTFILE "$line\n"; } close OUTFILE;
__________________________________________
then i created the seed file and forge the bsgenome package as the instructions. there are actually no errors when checking the built package.
Package:BSgenome.Slycopersicum.ensembl.Heinz1706 Title:Full genome sequences for Solanum lycopersicum Description:Full genome sequences for Solanmum lycopersicum provided by ensembl Version:1.0 organism:Solanum lycopersicum common_name:Tomato provider:ensembl provider_version:Heinz1706 release_date:Jan.2016 release_name:S.lyco Genome Sequencing source_url:ftp://ftp.ensemblgenomes.org/pub/plants/release-30/fasta/solanum_lycopersicum/dna/ organism_biocview: Solanum_lycopersicum BSgenomeObjname: Slycopersicum seqnames: paste("chr", c(1:12), sep="") circ_seqs: NULL mseqnames: NULL SrcDataFiles:ftp://ftp.ensemblgenomes.org/pub/plants/release-30/fasta/solanum_lycopersicum/dna/ PkgExamples: genome$GL896898 # same as genome[["GL896898"]] seqs_srcdir: ~/Desktop/S.lyco
_________________________
Unfortunately, when i typed the code, the test return something weird , also the seqinfo returns an error too
library("BSgenome.Spennellii.ncbi.SPENNV200") test<-library("BSgenome.Spennellii.ncbi.SPENNV200") > test [1] "BSgenome.Athaliana.TAIR.TAIR9" [2] "TxDb.Athaliana.BioMart.plantsmart28" [3] "GenomicFeatures" [4] "AnnotationDbi" [5] "Biobase" [6] "BSgenome.Spennellii.ncbi.SPENNV200" [7] "BSgenome" [8] "rtracklayer" [9] "Biostrings" [10] "XVector" [11] "GenomicRanges" [12] "GenomeInfoDb" [13] "IRanges" [14] "S4Vectors" [15] "stats4" [16] "BiocGenerics" [17] "parallel" [18] "stats" [19] "graphics" [20] "grDevices" [21] "utils" [22] "datasets" [23] "methods" [24] "base" > seqinfo(test) Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘seqinfo’ for signature ‘"character"’
_______________________
for the txdb object , the code i used was as below and it looks wrong to me as well.
gtffile <- file.path("~/Desktop/s.pen/gff/GCF_001406875.1_SPENNV200_genomic.gff") makeTxDbFromGFF(gtffile) txdb <- makeTxDbFromGFF(gtffile) seqinfo(txdb)
Seqinfo object with 12 sequences from an unspecified genome; no seqlengths: seqnames seqlengths isCircular genome NC_028637.1 <NA> <NA> <NA> NC_028638.1 <NA> <NA> <NA> NC_028639.1 <NA> <NA> <NA> NC_028640.1 <NA> <NA> <NA> NC_028641.1 <NA> <NA> <NA> ... ... ... ... NC_028644.1 <NA> <NA> <NA> NC_028645.1 <NA> <NA> <NA> NC_028646.1 <NA> <NA> <NA> NC_028647.1 <NA> <NA> <NA> NC_028648.1 <NA> <NA> <NA>
can anybody tell me may be i have done something wrong?? thank you very much.
> sessionInfo()
R version 3.2.3 (2015-12-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.3 LTS locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BSgenome.Athaliana.TAIR.TAIR9_1.3.1000 [2] TxDb.Athaliana.BioMart.plantsmart28_3.2.2 [3] GenomicFeatures_1.22.12 [4] AnnotationDbi_1.32.3 [5] Biobase_2.30.0 [6] BSgenome.Spennellii.ncbi.SPENNV200_1.0 [7] BSgenome_1.38.0 [8] rtracklayer_1.30.1 [9] Biostrings_2.38.3 [10] XVector_0.10.0 [11] GenomicRanges_1.22.4 [12] GenomeInfoDb_1.6.3 [13] IRanges_2.4.6 [14] S4Vectors_0.8.11 [15] BiocGenerics_0.16.1 loaded via a namespace (and not attached): [1] zlibbioc_1.16.0 GenomicAlignments_1.6.3 [3] BiocParallel_1.4.3 tools_3.2.3 [5] SummarizedExperiment_1.0.2 DBI_0.3.1 [7] lambda.r_1.1.7 futile.logger_1.4.1 [9] futile.options_1.0.0 bitops_1.0-6 [11] biomaRt_2.26.1 RCurl_1.95-4.7 [13] RSQLite_1.0.0 Rsamtools_1.22.0 [15] XML_3.98-1.3
Instead of saying
test <- library("BSgenome.Spennellii.ncbi.SPENNV200")
, say