Question

easyRNASeq for miRNA read counts

0

Entering edit mode

Guest User ★ 13k

@guest-user-4897

Last seen 10.6 years ago

Hi, I am looking to use DESeq to analyze my 18-29 nt small RNA libraries. Is it appropriate to prepare a countDataSet that counts "transcripts" , or possibly "features" in this case? What do you recommend? Thank you! -- output of sessionInfo(): sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] easyRNASeq_1.8.1 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3 [5] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biostrings_2.30.1 [9] XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3 [13] biomaRt_2.18.0 Biobase_2.22.0 genomeIntervals_1.18.0 BiocGenerics_0.8.0 [17] intervals_0.14.0 loaded via a namespace (and not attached): [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7 [5] genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 [13] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 [17] tools_3.0.2 XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0 > -- Sent via the guest posting facility at bioconductor.org.

DESeq DESeq • 2.0k views

ADD COMMENT • link updated 11.4 years ago by Nicolas Delhomme ▴ 320 • written 11.4 years ago by Guest User ★ 13k

score 0 · Answer 1 · 2013-11-18

Hej Vicky! The very short answer is that it depends on the content of your annotation. I can?t really tell you more without knowing more about the kind of data you want to provide as input to easyRNASeq, both for the read alignments and as annotation. In my view of things using ?features? would be the more appropriate, but that?s just a name. If you define mono-exonic transcript and use ?transcripts?, the results would be the same. Now, with regards to miRNA, I assume that easyRNASeq could get you a count table, but the seldom time I?ve worked with miRNA, I directly got an occurrence count table as output, e.g. form the UEA workbench. If you would detail ab it more your data processing, I could tell you more accurately if there would be any caveats in using easyRNASeq for that. Finally, I?ve never - so far - tried a differential analysis approach with miRNA data, and I would think that DESeq, DESeq2, edgeR, etc.. are valid tools for that but maybe others from the list can chime in? Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On 18 Nov 2013, at 05:12, Vicky Chu [guest] <guest at="" bioconductor.org=""> wrote: > > Hi, > > I am looking to use DESeq to analyze my 18-29 nt small RNA libraries. Is it appropriate to prepare a countDataSet that counts "transcripts" , or possibly "features" in this case? What do you recommend? > > Thank you! > > -- output of sessionInfo(): > > sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods base > > other attached packages: > [1] easyRNASeq_1.8.1 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3 > [5] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biostrings_2.30.1 > [9] XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3 > [13] biomaRt_2.18.0 Biobase_2.22.0 genomeIntervals_1.18.0 BiocGenerics_0.8.0 > [17] intervals_0.14.0 > > loaded via a namespace (and not attached): > [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7 > [5] genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 > [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 > [13] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > [17] tools_3.0.2 XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0 >> > > -- > Sent via the guest posting facility at bioconductor.org.

score 0 · Answer 2 · 2013-11-20

Hej Vicky! I get it now. I?ve had a look at the file and that?s how it looks like: 2L . miRNA_primary_transcript 243035 243141 . - . ID=MI0005821;Alias=MI0005821;Name=dme-mir-965 2L . miRNA 243055 243076 . - . ID=MIMAT0005480;Alias=MIMAT0005480;Name=dme- miR-965-3p;Derives_from=MI0005821 2L . miRNA 243096 243118 . - . ID=MIMAT0020861;Alias=MIMAT0020861;Name=dme- miR-965-5p;Derives_from=MI0005821 What exactly are you interested in, the miRNA only or both miRNA and pri-miRNA? If you want both, it?s going to be more tricky as you will have multiple counting indeed - the pri-miRNA containing both miRNA. The gff would anyway need reformatting. It should look like: 2L . exon 243055 243076 . - . ID=MIMAT0005480:1;Alias=MIMAT0005480;Name=dme- miR-965-3p;Parent=MI0005821 2L . exon 243096 243118 . - . ID=MIMAT0020861:1;Alias=MIMAT0020861;Name=dme- miR-965-5p;Parent=MI0005821 This can easily be done using the genomeIntervals package: library(genomeIntervals) gff <- readGff3(?dmel.gff3?) ## select only miRNA gff <- gff[gff$type == ?miRNA? ] ## change the third column gff$type=?exon? ## add a ?:exon.number? to the ID. exon.number is always 1 here as we do not expect several exon per miRNA annotation(gff)$gffAttributes <- sub(?;Alias?,?:1;Alias?,annotation(gff)$gffAttributes) ## replace Derives_from with Parent annotation(gff)$gffAttributes <- sub(?Derives_from?,?Parent?,annotation(gff)$gffAttributes) ## write it out writeGff3(gff,"dmel-miRNA-for-easyRNASeq.gff3?) I haven?t tested that code, so there may be typos. But if there are it should not be far off ;-) Then you could call easyRNASeq with count=exons. Before that, to test your modified gff file compatibility with easyRNASeq you could do: easyRNASeq:::.getGffRange(filename="dmel-miRNA-for-easyRNASeq.gff3?) and that should return a RangedData object. Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Nathaniel Street Lab Department of Plant Physiology Ume? Plant Science Center Tel: +46 90 786 7989 Email: nicolas.delhomme at plantphys.umu.se SLU - Ume? universitet Ume? S-901 87 Sweden --------------------------------------------------------------- On 20 Nov 2013, at 01:00, Victoria Church <victoriaachurch at="" gmail.com=""> wrote: > Hi, > > Thanks for your response. For the input, I have bam files of short reads aligned to the drosophila melanogaster genome (dm3) as well as bam files of short reads aligned to miRbase. Ideally, I would be able to use easyRNASeq to make Count Data Sets for the bamfiles aligned to miRbase with the miRbase annotation file for drosophila, (dme.gff3, found here: ftp://mirbase.smith.man.ac.uk/pub/mirbase/CURRENT/genomes). I was able to use it fine for the files aligned to the genome, but I have not quite figured out if it will be possible to reformat the gff file of miRbase to be recognized by easyRNASeq. > > In the case of "transcripts" if you define mono-exonic transcript, does that mean that you will only count each read once? > > Thank you, > Vicky > > > On Mon, Nov 18, 2013 at 6:49 AM, Nicolas Delhomme <delhomme at="" embl.de=""> wrote: > Hej Vicky! > > The very short answer is that it depends on the content of your annotation. I can?t really tell you more without knowing more about the kind of data you want to provide as input to easyRNASeq, both for the read alignments and as annotation. In my view of things using ?features? would be the more appropriate, but that?s just a name. If you define mono-exonic transcript and use ?transcripts?, the results would be the same. > > Now, with regards to miRNA, I assume that easyRNASeq could get you a count table, but the seldom time I?ve worked with miRNA, I directly got an occurrence count table as output, e.g. form the UEA workbench. If you would detail ab it more your data processing, I could tell you more accurately if there would be any caveats in using easyRNASeq for that. > > Finally, I?ve never - so far - tried a differential analysis approach with miRNA data, and I would think that DESeq, DESeq2, edgeR, etc.. are valid tools for that but maybe others from the list can chime in? > > Cheers, > > Nico > > --------------------------------------------------------------- > Nicolas Delhomme > > Genome Biology Computational Support > > European Molecular Biology Laboratory > > Tel: +49 6221 387 8310 > Email: nicolas.delhomme at embl.de > Meyerhofstrasse 1 - Postfach 10.2209 > 69102 Heidelberg, Germany > --------------------------------------------------------------- > > > > > > On 18 Nov 2013, at 05:12, Vicky Chu [guest] <guest at="" bioconductor.org=""> wrote: > > > > > Hi, > > > > I am looking to use DESeq to analyze my 18-29 nt small RNA libraries. Is it appropriate to prepare a countDataSet that counts "transcripts" , or possibly "features" in this case? What do you recommend? > > > > Thank you! > > > > -- output of sessionInfo(): > > > > sessionInfo() > > R version 3.0.2 (2013-09-25) > > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > > > locale: > > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] easyRNASeq_1.8.1 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3 > > [5] DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1 Biostrings_2.30.1 > > [9] XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3 > > [13] biomaRt_2.18.0 Biobase_2.22.0 genomeIntervals_1.18.0 BiocGenerics_0.8.0 > > [17] intervals_0.14.0 > > > > loaded via a namespace (and not attached): > > [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7 > > [5] genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3 > > [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 > > [13] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4 > > [17] tools_3.0.2 XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.8.0 > >> > > > > -- > > Sent via the guest posting facility at bioconductor.org. > >