Hej Vicky!
I get it now. I?ve had a look at the file and that?s how it looks
like:
2L . miRNA_primary_transcript 243035 243141 .
- . ID=MI0005821;Alias=MI0005821;Name=dme-mir-965
2L . miRNA 243055 243076 . - .
ID=MIMAT0005480;Alias=MIMAT0005480;Name=dme-
miR-965-3p;Derives_from=MI0005821
2L . miRNA 243096 243118 . - .
ID=MIMAT0020861;Alias=MIMAT0020861;Name=dme-
miR-965-5p;Derives_from=MI0005821
What exactly are you interested in, the miRNA only or both miRNA and
pri-miRNA? If you want both, it?s going to be more tricky as you will
have multiple counting indeed - the pri-miRNA containing both miRNA.
The gff would anyway need reformatting. It should look like:
2L . exon 243055 243076 . - .
ID=MIMAT0005480:1;Alias=MIMAT0005480;Name=dme-
miR-965-3p;Parent=MI0005821
2L . exon 243096 243118 . - .
ID=MIMAT0020861:1;Alias=MIMAT0020861;Name=dme-
miR-965-5p;Parent=MI0005821
This can easily be done using the genomeIntervals package:
library(genomeIntervals)
gff <- readGff3(?dmel.gff3?)
## select only miRNA
gff <- gff[gff$type == ?miRNA? ]
## change the third column
gff$type=?exon?
## add a ?:exon.number? to the ID. exon.number is always 1 here as we
do not expect several exon per miRNA
annotation(gff)$gffAttributes <-
sub(?;Alias?,?:1;Alias?,annotation(gff)$gffAttributes)
## replace Derives_from with Parent
annotation(gff)$gffAttributes <-
sub(?Derives_from?,?Parent?,annotation(gff)$gffAttributes)
## write it out
writeGff3(gff,"dmel-miRNA-for-easyRNASeq.gff3?)
I haven?t tested that code, so there may be typos. But if there are it
should not be far off ;-)
Then you could call easyRNASeq with count=exons.
Before that, to test your modified gff file compatibility with
easyRNASeq you could do:
easyRNASeq:::.getGffRange(filename="dmel-miRNA-for-easyRNASeq.gff3?)
and that should return a RangedData object.
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Nathaniel Street Lab
Department of Plant Physiology
Ume? Plant Science Center
Tel: +46 90 786 7989
Email: nicolas.delhomme at plantphys.umu.se
SLU - Ume? universitet
Ume? S-901 87 Sweden
---------------------------------------------------------------
On 20 Nov 2013, at 01:00, Victoria Church <victoriaachurch at="" gmail.com=""> wrote:
> Hi,
>
> Thanks for your response. For the input, I have bam files of short
reads aligned to the drosophila melanogaster genome (dm3) as well as
bam files of short reads aligned to miRbase. Ideally, I would be able
to use easyRNASeq to make Count Data Sets for the bamfiles aligned to
miRbase with the miRbase annotation file for drosophila, (dme.gff3,
found here:
ftp://mirbase.smith.man.ac.uk/pub/mirbase/CURRENT/genomes). I was able
to use it fine for the files aligned to the genome, but I have not
quite figured out if it will be possible to reformat the gff file of
miRbase to be recognized by easyRNASeq.
>
> In the case of "transcripts" if you define mono-exonic transcript,
does that mean that you will only count each read once?
>
> Thank you,
> Vicky
>
>
> On Mon, Nov 18, 2013 at 6:49 AM, Nicolas Delhomme <delhomme at="" embl.de=""> wrote:
> Hej Vicky!
>
> The very short answer is that it depends on the content of your
annotation. I can?t really tell you more without knowing more about
the kind of data you want to provide as input to easyRNASeq, both for
the read alignments and as annotation. In my view of things using
?features? would be the more appropriate, but that?s just a name. If
you define mono-exonic transcript and use ?transcripts?, the results
would be the same.
>
> Now, with regards to miRNA, I assume that easyRNASeq could get you a
count table, but the seldom time I?ve worked with miRNA, I directly
got an occurrence count table as output, e.g. form the UEA workbench.
If you would detail ab it more your data processing, I could tell you
more accurately if there would be any caveats in using easyRNASeq for
that.
>
> Finally, I?ve never - so far - tried a differential analysis
approach with miRNA data, and I would think that DESeq, DESeq2, edgeR,
etc.. are valid tools for that but maybe others from the list can
chime in?
>
> Cheers,
>
> Nico
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Genome Biology Computational Support
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
>
>
>
>
> On 18 Nov 2013, at 05:12, Vicky Chu [guest] <guest at="" bioconductor.org=""> wrote:
>
> >
> > Hi,
> >
> > I am looking to use DESeq to analyze my 18-29 nt small RNA
libraries. Is it appropriate to prepare a countDataSet that counts
"transcripts" , or possibly "features" in this case? What do you
recommend?
> >
> > Thank you!
> >
> > -- output of sessionInfo():
> >
> > sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-apple-darwin10.8.0 (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets
methods base
> >
> > other attached packages:
> > [1] easyRNASeq_1.8.1 ShortRead_1.20.0 Rsamtools_1.14.1
GenomicRanges_1.14.3
> > [5] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1
Biostrings_2.30.1
> > [9] XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0
limma_3.18.3
> > [13] biomaRt_2.18.0 Biobase_2.22.0
genomeIntervals_1.18.0 BiocGenerics_0.8.0
> > [17] intervals_0.14.0
> >
> > loaded via a namespace (and not attached):
> > [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6
DBI_0.2-7
> > [5] genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2
hwriter_1.3
> > [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5
RCurl_1.95-4.1
> > [13] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
survival_2.37-4
> > [17] tools_3.0.2 XML_3.95-0.2 xtable_1.7-1
zlibbioc_1.8.0
> >>
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
>
>