BS genome non-model organism
minie • 0
@minie-11306
Last seen 7.1 years ago

I want to analyse RNA-Seq data for a non-model organism using the DESeq package. The organism is Rhodnius prolixus:

http://metazoa.ensembl.org/Rhodnius_prolixus/Info/Annotation/#genebuild

How should I import this?

bsgenome • 1.7k views

What do you want to import? You could use BioMart (or the biomaRt package) to retrieve data from Ensembl.
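For illustration, here is a minimal sketch of what a biomaRt query against Ensembl Metazoa might look like. The helper name fetch_rprolixus_genes is hypothetical, and the dataset name "rprolixus_eg_gene" is an assumption based on the usual Ensembl Genomes naming pattern, so please check it with listDatasets() before relying on it. Nothing contacts Ensembl until the function is actually called.

```r
# Hypothetical helper: defining it does not touch the network.
fetch_rprolixus_genes <- function(dataset = "rprolixus_eg_gene") {
  # ASSUMPTION: the dataset name follows the "<species>_eg_gene" pattern.
  # Verify it with biomaRt::listDatasets() on the metazoa mart first.
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Please install the biomaRt package from Bioconductor first.")
  }
  # Connect to the Ensembl Genomes metazoa mart and select the dataset.
  mart <- biomaRt::useEnsemblGenomes(biomart = "metazoa_mart",
                                     dataset = dataset)
  # Retrieve basic gene coordinates as a data.frame.
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "chromosome_name",
                   "start_position", "end_position", "strand"),
    mart = mart
  )
}

# Usage (requires an internet connection):
# genes <- fetch_rprolixus_genes()
# head(genes)
```

This only retrieves gene annotation. Whether that is what you need depends on what you are trying to import.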

@herve-pages-1542
Last seen 1 hour ago
Seattle, WA, United States

Hi,

I see "BS genome" in your question so maybe you want to know how to forge a BSgenome data package for the Rhodnius prolixus assembly? Please refer to the BSgenomeForge vignette in the BSgenome package for how to do this. You'll need access to the DNA sequences which are here:

ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta/rhodnius_prolixus/dna/

(I think file Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz is what you want.)

I don't know if the DESeq package supports this but some Bioconductor packages also accept a FaFile object as a replacement for the BSgenome object. If DESeq supports this, then you don't need to go thru the hassle of forging a BSgenome package, you can just use a FaFile object instead. To make a FaFile object, just do:

library(Rsamtools)
indexFa("Rhodnius_prolixus.RproC1.dna.toplevel.fa")
genome <- FaFile("Rhodnius_prolixus.RproC1.dna.toplevel.fa")

I strongly recommend that you uncompress Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz first (e.g. with Unix command gunzip). IIRC we've seen some problems on some platforms in the past with FaFile objects pointing to compressed FASTA files.
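If gunzip is not available (e.g. on Windows), the decompression can also be done from R itself. Below is a minimal sketch using base R connections only. The helper name gunzip_file is made up for this example, and unlike the Unix gunzip command it leaves the original .gz file in place.

```r
# Hypothetical helper: stream-decompress a .gz file with base R only.
# By default the output path is the input path minus the ".gz" suffix.
gunzip_file <- function(gz_path,
                        out_path = sub("\\.gz$", "", gz_path),
                        chunk_size = 1024L * 1024L) {
  stopifnot(file.exists(gz_path), out_path != gz_path)
  src <- gzfile(gz_path, open = "rb")   # reading decompresses on the fly
  on.exit(close(src), add = TRUE)
  dest <- file(out_path, open = "wb")
  on.exit(close(dest), add = TRUE)
  repeat {
    # Copy the data in chunks so large genomes are never fully in memory.
    chunk <- readBin(src, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    writeBin(chunk, dest)
  }
  invisible(out_path)
}

# Usage, assuming the compressed FASTA is in the working directory:
# gunzip_file("Rhodnius_prolixus.RproC1.dna.toplevel.fa.gz")
```

The resulting .fa file can then be passed to indexFa() and FaFile() as shown above.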

Also, ideally, it would be great if packages like DESeq that take a genome as part of their input could also accept a genome represented as a TwoBitFile object. These objects are more efficient than FaFile objects e.g. they use less space on disk and allow much more efficient random sequence access. To create such an object, do something like:

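library(Biostrings)
library(rtracklayer)
# Load the uncompressed FASTA file into memory
dna <- readDNAStringSet("Rhodnius_prolixus.RproC1.dna.toplevel.fa")
# Write the sequences out in 2bit format
export(dna, "Rhodnius_prolixus.RproC1.dna.toplevel.2bit")
# Create a TwoBitFile object pointing at the new file
genome <- TwoBitFile("Rhodnius_prolixus.RproC1.dna.toplevel.2bit")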

Here is an example of random sequence access:

seqinfo(genome)
# Seqinfo object with 27870 sequences from an unspecified genome:
#   seqnames seqlengths isCircular genome
#   GL563091   12301132       <NA>   <NA>
#   GL563092    8971143       <NA>   <NA>
#   GL563191    5632155       <NA>   <NA>
#   GL563178    4752152       <NA>   <NA>
#   GL563176    4581433       <NA>   <NA>
#   ...             ...        ...    ...
#   GL567782        713       <NA>   <NA>
#   GL567975        703       <NA>   <NA>
#   GL566950        571       <NA>   <NA>
#   GL569160        551       <NA>   <NA>
#   GL569309        263       <NA>   <NA>

getSeq(genome, GRanges("GL563092:125-200"))
#   A DNAStringSet instance of length 1
#     width seq
# [1]    76 TGTACTTTACTACATATTGTT...TGTAATCCGACCGCTAAAGG

Note that in most BSgenome packages the sequences are also stored in the 2bit format.

Hope this helps,

H.

minie • 0
@minie-11306
Last seen 7.1 years ago

Hi

Thanks for this, but I am having trouble passing this TwoBitFile object to the DESeq2 package. Can you tell me if there is any way to do it?

This is a little bit vague. Please tell us what you've done and what error you got. I'm no DESeq/DESeq2 expert but I guess that if these packages don't accept a TwoBitFile object as input then that probably means that you need to use a BSgenome object instead. Please refer to the man page of the function you're trying to call (you're not saying which one it is) in order to get more information about what the input should be.
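As a point of reference when reading the man pages: DESeq2's documented constructor DESeqDataSetFromMatrix() takes a matrix of read counts, a sample table and a design formula, so it may turn out that no genome object is needed for that step at all. Here is a minimal sketch with made-up counts. The gene and sample names are hypothetical, and the DESeq2 call is skipped if the package is not installed.

```r
# Toy count matrix (made-up values): rows are genes, columns are samples.
counts <- matrix(
  c( 10L, 12L, 30L, 28L,
      0L,  1L,  0L,  2L,
    100L, 90L, 40L, 55L),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

# Sample table: one row per column of the count matrix, in the same order.
coldata <- data.frame(
  condition = factor(c("control", "control", "treated", "treated")),
  row.names = colnames(counts)
)
stopifnot(identical(rownames(coldata), colnames(counts)))

# Build the DESeqDataSet only if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ condition)
  print(dim(dds))
}
```

With real data the counts would come from a read-counting step run against the Rhodnius prolixus annotation, and the object would then go to DESeq().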

H.
