I want to analyse RNA-Seq data for a non-model organism using the DESeq package. The organism is Rhodnius prolixus (http://metazoa.ensembl.org/Rhodnius_prolixus/Info/Annotation/#genebuild). How should I import it?
Hi,
I see "BS genome" in your question, so maybe you want to know how to forge a BSgenome data package for the Rhodnius prolixus assembly? Please refer to the BSgenomeForge vignette in the BSgenome package for how to do this. You'll need access to the DNA sequences, which are here:
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta/rhodnius_prolixus/dna/
(I think the file Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz is the one you want.)
I don't know if the DESeq package supports this, but some Bioconductor packages also accept a FaFile object as a replacement for the BSgenome object. If DESeq supports this, then you don't need to go through the hassle of forging a BSgenome package; you can just use a FaFile object instead. To make a FaFile object, just do:
library(Rsamtools)
indexFa("Rhodnius_prolixus.RproC1.dna.toplevel.fa")
genome <- FaFile("Rhodnius_prolixus.RproC1.dna.toplevel.fa")
I strongly recommend that you uncompress Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz first (e.g. with the Unix command gunzip). IIRC we've seen problems on some platforms in the past with FaFile objects pointing to compressed FASTA files.
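If you'd rather stay inside R than call gunzip from a shell, here is a minimal base-R sketch that streams a .gz file into an uncompressed copy. The helper name decompress_gz and the 1 MB chunk size are illustrative choices of mine, not part of any package (the R.utils package also ships a gunzip() function if you prefer a ready-made one).

```r
# Hypothetical helper (not from any package): stream-decompress a .gz file
# using only base R, so indexFa()/FaFile() can point at a plain FASTA file.
decompress_gz <- function(src, dest = sub("\\.gz$", "", src),
                          chunk_size = 1e6) {
  if (identical(dest, src)) {
    stop("'src' must end in .gz, or 'dest' must be given explicitly")
  }
  inp <- gzfile(src, open = "rb")   # gzfile connections decompress on read
  on.exit(close(inp), add = TRUE)
  out <- file(dest, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    # Read the next chunk of decompressed bytes; stop at end of file.
    buf <- readBin(inp, what = "raw", n = chunk_size)
    if (length(buf) == 0L) break
    writeBin(buf, out)
  }
  invisible(dest)
}
```

For example, decompress_gz("Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz") would write Rhodnius_prolixus.RproC1.dna.toplevel.fa next to it and return that path. Reading in chunks keeps memory use flat even for a whole-genome FASTA file.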
Also, it would be great if packages like DESeq that take a genome as part of their input could also accept a genome represented as a TwoBitFile object. These objects are more efficient than FaFile objects: they use less space on disk and allow much faster random access to sequences. To create such an object, do something like:
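library(Biostrings)
dna <- readDNAStringSet("Rhodnius_prolixus.RproC1.dna.toplevel.fa")
# Ensembl FASTA headers carry a description after the sequence ID; keep only
# the first word (e.g. "GL563091") so region strings like "GL563092:125-200" match.
names(dna) <- sub(" .*", "", names(dna))
library(rtracklayer)
export(dna, "Rhodnius_prolixus.RproC1.dna.toplevel.2bit")
genome <- TwoBitFile("Rhodnius_prolixus.RproC1.dna.toplevel.2bit")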
Here is an example of random sequence access:
seqinfo(genome)
# Seqinfo object with 27870 sequences from an unspecified genome:
#   seqnames seqlengths isCircular genome
#   GL563091   12301132       <NA>   <NA>
#   GL563092    8971143       <NA>   <NA>
#   GL563191    5632155       <NA>   <NA>
#   GL563178    4752152       <NA>   <NA>
#   GL563176    4581433       <NA>   <NA>
#   ...             ...        ...    ...
#   GL567782        713       <NA>   <NA>
#   GL567975        703       <NA>   <NA>
#   GL566950        571       <NA>   <NA>
#   GL569160        551       <NA>   <NA>
#   GL569309        263       <NA>   <NA>
getSeq(genome, GRanges("GL563092:125-200"))
#   A DNAStringSet instance of length 1
#     width seq
# [1]    76 TGTACTTTACTACATATTGTT...TGTAATCCGACCGCTAAAGG
Note that in most BSgenome packages the sequences are also stored in the 2bit format.
Hope this helps,
H.
Hi
Thanks for this, but I'm having trouble passing this 2bit file to the DESeq2 package. Is there any way to do it?
This is a little vague. Please tell us what you did and what error you got. I'm no DESeq/DESeq2 expert, but I guess that if these packages don't accept a TwoBitFile object as input, then you probably need to use a BSgenome object instead. Please refer to the man page of the function you're trying to call (you don't say which one) for more information about what the input should be.
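A hedged aside for readers landing here: as far as I know, DESeq2 does not consume genome sequences at all. Its constructors, such as DESeqDataSetFromMatrix(countData, colData, design), expect a matrix of read counts (genes by samples) plus a table describing the samples. The genome (FaFile, TwoBitFile or BSgenome) and a gene annotation are only needed upstream, when reads are aligned and counted per gene. The sketch below uses made-up toy counts; the gene IDs, sample names and "ctrl"/"treat" condition labels are hypothetical, and the DESeq2 call only runs if the package is installed.

```r
# Toy count matrix: rows = genes, columns = samples.
# All gene IDs, sample names and values are invented for illustration.
counts <- matrix(
  c(10L, 12L, 30L, 35L,
     0L,  2L,  1L,  0L,
   100L, 90L, 40L, 45L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("ctrl_1", "ctrl_2", "treat_1", "treat_2"))
)

# Sample table: one row per column of 'counts', in the same order.
col_data <- data.frame(
  condition = factor(c("ctrl", "ctrl", "treat", "treat")),
  row.names = colnames(counts)
)

# Sanity checks: samples line up, counts are non-negative integers.
stopifnot(identical(rownames(col_data), colnames(counts)))
stopifnot(is.integer(counts), all(counts >= 0))

# Build the DESeq2 object only when DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = col_data,
                                        design = ~ condition)
}
```

With real data, the count matrix would come from whatever alignment and counting pipeline you run against the Rhodnius prolixus assembly, not from the 2bit file itself.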
H.
What do you want to import? You could use BioMart (or the biomaRt package) to retrieve data from Ensembl.