Getting counts for previously undetected transcripts and genes with easyRNASeq: comparison to Cufflinks

0

Entering edit mode

Richard Friedman ★ 2.0k

@richard-friedman-513

Last seen 10.2 years ago

Dear Bioconductor List, I am working through the easyRNAseq use case to learn how to obtain counts in RNASeq experiments for further analysis. The example starts with BAM files and converts BAM files to Exons, transcripts and genes using geneModels. Does using geneModels only assemble previously annotated transcripts and genes OR can it find new ones if present? If it can find new ones how well does it do this in comparison to Cufflinks? If it cannot find new ones - is there a way to get counts (as distinct from fpkm values) for genes and transcripts from cufflinks and relate them to existing annotation where they correspond and present them as non-previosuly annotated ones when they don't correspond? Can easyRNASeq be used for this purpose? Can anyone recommend a tool that can be used for this purpose? My goal is to get a set of counts that can be input into CQN and then edgeR. I wish to use TopHat/Cufflinks to get the Exons, transcripts, and genes including novel spliced variants but I am persuaded CQN is a better way to normalize than FPKM and edgeR is a better way to analyze differential expression than Cuffdiff. I would appreciate any advice. Thanks and best wishes, Rich Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ Columbia Initiative in Systems Biology Room 824 Irving Cancer Research Center Columbia University 1130 St. Nicholas Ave New York, NY 10032 (212)851-4765 (voice) friedman@cancercenter.columbia.edu http://cancercenter.columbia.edu/~friedman/ In memoriam, Ray Bradbury [[alternative HTML version deleted]]

RNASeq Annotation Cancer edgeR cqn easyRNASeq RNASeq Annotation Cancer edgeR cqn • 2.5k views

ADD COMMENT • link updated 12.0 years ago by delhomme@embl.de ★ 1.2k • written 12.0 years ago by Richard Friedman ★ 2.0k

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 6 weeks ago

Icahn School of Medicine at Mount Sinai…

One trick I have used for incompletely-annotated genomes is to run my sequencing data through tophat & cufflinks to get a set of transcripts that covers all the mapped reads (while also providing Cufflinks with whatever annotation is already available. Then I use cuffmerge to both merge transcripts from multiple samples *and* associate Cufflinks transcript IDs (e.g. "XLOC_000001", etc.) with their corresponding IDs in the prior annotation. Then I can do my read counting on the Cufflinks assembly, but still get meaningful gene IDs for annotated genes. On Mon 03 Dec 2012 01:03:49 PM PST, Richard Friedman wrote: > Dear Bioconductor List, > > I am working through the easyRNAseq use case to learn how > to obtain counts in RNASeq experiments for further analysis. > The example starts with BAM files and converts BAM files to Exons, > transcripts and genes using geneModels. > > Does using geneModels only assemble previously annotated transcripts and > genes OR can it find new ones if present? > > If it can find new ones how well does it do this in comparison to Cufflinks? > > If it cannot find new ones - is there a way to get counts (as distinct from fpkm > values) for genes and transcripts from cufflinks and > relate them to existing annotation where they correspond and present them > as non-previosuly annotated ones when they don't correspond? > > Can easyRNASeq be used for this purpose? > Can anyone recommend a tool that can be used for this purpose? > > My goal is to get a set of counts that can be input into CQN and then edgeR. > I wish to use TopHat/Cufflinks to get the Exons, transcripts, and genes including > novel spliced variants but I am persuaded CQN is a better way to normalize than > FPKM and edgeR is a better way to analyze differential expression than > Cuffdiff. > > I would appreciate any advice. > > Thanks and best wishes, > Rich > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In memoriam, Ray Bradbury > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.0 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

delhomme@embl.de ★ 1.2k

@delhommeemblde-3232

Last seen 10.2 years ago

Hi Richard, On Dec 3, 2012, at 10:03 PM, Richard Friedman wrote: > Dear Bioconductor List, > > I am working through the easyRNAseq use case to learn how > to obtain counts in RNASeq experiments for further analysis. > The example starts with BAM files and converts BAM files to Exons, > transcripts and genes using geneModels. > > Does using geneModels only assemble previously annotated transcripts and > genes OR can it find new ones if present? It can only assemble gene models from the exons defined in the annotation file you provide. > > If it can find new ones how well does it do this in comparison to Cufflinks? > No it can't. There's a "now old" functionality - not exposed - that identify islands of coverage, i.e. in an approach similar to a very basic approach at finding large ChIP-Seq regions (e.g. from histone marks). But that would compare very poorly with cufflinks as it does not take into account any splice-junction nor paired-read information. > If it cannot find new ones - is there a way to get counts (as distinct from fpkm > values) for genes and transcripts from cufflinks and > relate them to existing annotation where they correspond and present them > as non-previosuly annotated ones when they don't correspond? > Yes there is - I'm doing something very similar - but not in R. More below. > Can easyRNASeq be used for this purpose? Not directly no. > Can anyone recommend a tool that can be used for this purpose? More below. > > My goal is to get a set of counts that can be input into CQN and then edgeR. > I wish to use TopHat/Cufflinks to get the Exons, transcripts, and genes including > novel spliced variants but I am persuaded CQN is a better way to normalize than > FPKM and edgeR is a better way to analyze differential expression than > Cuffdiff. > I agree with you there. What you might want to consider in addition is to use an aligner that is multi-mapping aware. RSEM is an example, tophat/cufflinks does this in a similar way by default (or so I believe, double-check that this option is really enabled by default). Here it's useful to use bowtie2 rather than bowtie. Finally, I like the idea of using CQN, I'll consider adding this to the easyRNASeq "pipeline". As I said, I have a very similar setup, but completely de-novo. I've been (still am) testing several approaches: 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the exon/gene GFF) and from that I go back to the original alignments by tophat and use these as input together with the GFF for easyRNASeq. I then get my DESeq/edgeR output and proceed in R. 2) In parallel, I'm doing the same using trinity (assembling the transcriptome de novo) and re-aligning the read libraries using RSEM. From the RSEM results, I get directly the count tables that I extract as a matrix and directly input into edgeR/DESeq (bypassing easyRNASeq). 3) Same as 2, but I create a GFF annotation by performing a re- alignments of the obtained contig to the genome assembly using GMAP. Then using the RSEM bam files, I use easyRNASeq to create the count table. I'm still evaluating the outcome of these different approaches. As it seems that you already have some annotation for your genome/transcriptome, that should ease some of the steps, e.g. instead of running cuffmerge, you could run cuffcompare to refine your GFF (see my approach #1). Since this discussion is not really related to Bioconductor, we could continue it offline if you like. HTH, Cheers, Nico > I would appreciate any advice. > > Thanks and best wishes, > Rich > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In memoriam, Ray Bradbury > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 12.0 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote: > > > As I said, I have a very similar setup, but completely de-novo. I've been (still am) testing several approaches: > > 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the exon/gene GFF) and from that I go back to the original alignments by tophat and use these as input together with the GFF for easyRNASeq. I then get my DESeq/edgeR output and proceed in R. Nico, Thanks for your definitive and quick reply. I have a question on your option 1. Cufflinks/Cuffmerge ouputs a GTF file. Can the GTF file be used instead of a GFF file in easyRNASeq? I know that the GTF format files are extensions of the GFF format files but can GTF be used in place of GFF in easyRNASeq? Thanks and best wishes, Rich Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ Columbia Initiative in Systems Biology Room 824 Irving Cancer Research Center Columbia University 1130 St. Nicholas Ave New York, NY 10032 (212)851-4765 (voice) friedman@cancercenter.columbia.edu http://cancercenter.columbia.edu/~friedman/ In memoriam, Ray Bradbury [[alternative HTML version deleted]]

ADD REPLY • link 12.0 years ago Richard Friedman ★ 2.0k

0

Entering edit mode

Hi Rich, Yes it can, just use the "gtf" as the annotationMethod argument. Note that I expect the gtf file to be formatted according to that description: http://genome.ucsc.edu/FAQ/FAQformat.html#format4. There's more details about this in the easyRNASeq vignette. I haven't had any problems using the cuffmerge/cuffcompare gtfs, but it's always worth a check. Cheers, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On Dec 4, 2012, at 4:54 PM, Richard Friedman wrote: > On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote: >> >> >> As I said, I have a very similar setup, but completely de-novo. I've been (still am) testing several approaches: >> >> 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the exon/gene GFF) and from that I go back to the original alignments by tophat and use these as input together with the GFF for easyRNASeq. I then get my DESeq/edgeR output and proceed in R. > > Nico, > > Thanks for your definitive and quick reply. I have a question on your option 1. > Cufflinks/Cuffmerge ouputs a GTF file. Can the GTF file be used instead of a GFF file in > easyRNASeq? I know that the GTF format files are extensions of the GFF format files but > can GTF be used in place of GFF in easyRNASeq? > > Thanks and best wishes, > Rich > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In memoriam, Ray Bradbury > >

ADD REPLY • link 12.0 years ago delhomme@embl.de ★ 1.2k

0

Entering edit mode

On Dec 4, 2012, at 10:54 AM, Richard Friedman wrote: > On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote: >> >> >> As I said, I have a very similar setup, but completely de-novo. I've been (still am) testing several approaches: >> >> 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the exon/gene GFF) and from that I go back to the original alignments by tophat and use these as input together with the GFF for easyRNASeq. I then get my DESeq/edgeR output and proceed in R. > Dear Nico and list, I tried using the cuffmerge gtf file as the gff file in TopHat as you suggest and successfully generated a bam file but easyRNASeq could not read it. (Although TopHat/Cufflinks are not bioconductor programs, it is desirable that their output should be readable by bioconductor programs so that I hope that including a TopHat command does not go beyond the scope of this list). To realign the reads using the gtf file generated by cufflinks I used tophat -p 6 -G /Documents/clients/phyllis/humanlearndata/merged_asm/merged.gtf -o SRR490225_it_thout genome SRR490225.fastq where merged,gtf is the gtf file generated by cufflinks. Thene here is a record of my easyRNASeq session; ###################################################################### ######################## R version 2.15.2 (2012-10-26) -- "Trick or Treat" Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-apple-darwin9.8.0/i386 (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.53 (6335) i386-apple-darwin9.8.0] [Workspace restored from /Users/friedman/.RData] [History restored from /Users/friedman/.Rapp.history] > library(easyRNASeq) Loading required package: parallel Loading required package: genomeIntervals Loading required package: intervals Loading required package: BiocGenerics Attaching package: BiocGenerics The following object(s) are masked from package:stats: xtabs The following object(s) are masked from package:base: anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'. Loading required package: biomaRt Loading required package: edgeR Loading required package: limma Loading required package: Biostrings Loading required package: IRanges Attaching package: IRanges The following object(s) are masked from package:intervals: expand, reduce Attaching package: Biostrings The following object(s) are masked from package:intervals: type Loading required package: BSgenome Loading required package: GenomicRanges Loading required package: DESeq Loading required package: locfit locfit 1.5-8 2012-04-25 Attaching package: locfit The following object(s) are masked from package:GenomicRanges: left, right Loading required package: lattice Attaching package: DESeq The following object(s) are masked from package:limma: plotMA Loading required package: Rsamtools Loading required package: ShortRead Loading required package: latticeExtra Loading required package: RColorBrewer Warning messages: 1: replacing previous import coerce when loading intervals 2: replacing previous import initialize when loading intervals > library(BSgenome.Hsapiens.UCSC.hg19) > library(Rsamtools) > > chr.sizes=seqlengths(Hsapiens) > chr.sizes chr1 chr2 chr3 chr4 249250621 243199373 198022430 191154276 .. > bamfiles=dir(getwd(),pattern="*\\.bam$") > bamfiles [1] "SRR490224it.bam" "SRR490225it.bam" > indexBam(SRR490224it.bam) Error in indexBam(SRR490224it.bam) : error in evaluating the argument 'files' in selecting a method for function 'indexBam': Error: object 'SRR490224it.bam' not found > sessionInfo() R version 2.15.2 (2012-10-26) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.4.2 [3] ShortRead_1.16.2 latticeExtra_0.6-19 [5] RColorBrewer_1.0-5 Rsamtools_1.10.2 [7] DESeq_1.10.1 lattice_0.20-10 [9] locfit_1.5-8 BSgenome_1.26.1 [11] GenomicRanges_1.10.5 Biostrings_2.26.2 [13] IRanges_1.16.4 edgeR_3.0.4 [15] limma_3.12.1 biomaRt_2.14.0 [17] Biobase_2.18.0 genomeIntervals_1.14.0 [19] BiocGenerics_0.4.0 intervals_0.13.3 loaded via a namespace (and not attached): [1] annotate_1.34.1 AnnotationDbi_1.20.2 bitops_1.0-4.1 DBI_0.2-5 [5] genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.2 hwriter_1.3 [9] RCurl_1.91-1 RSQLite_0.11.1 splines_2.15.2 stats4_2.15.2 [13] survival_2.36-14 XML_3.9-4 xtable_1.7-0 zlibbioc_1.2.0 > ls() [1] "bamfiles" "chr.sizes" ###################################################################### ######################## Any suggestions would be appreciated. Thanks and best wishes, Rich Richard A. Friedman, PhD Associate Research Scientist, Biomedical Informatics Shared Resource Herbert Irving Comprehensive Cancer Center (HICCC) Lecturer, Department of Biomedical Informatics (DBMI) Educational Coordinator, Center for Computational Biology and Bioinformatics (C2B2)/ National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ Columbia Initiative in Systems Biology Room 824 Irving Cancer Research Center Columbia University 1130 St. Nicholas Ave New York, NY 10032 (212)851-4765 (voice) friedman@cancercenter.columbia.edu http://cancercenter.columbia.edu/~friedman/ In memoriam, Ray Bradbury [[alternative HTML version deleted]]

ADD REPLY • link 12.0 years ago Richard Friedman ★ 2.0k

0

Entering edit mode

On Thu, Dec 13, 2012 at 10:55 AM, Richard Friedman < friedman@cancercenter.columbia.edu> wrote: > > On Dec 4, 2012, at 10:54 AM, Richard Friedman wrote: > > > On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote: > >> > >> > >> As I said, I have a very similar setup, but completely de-novo. I've > been (still am) testing several approaches: > >> > >> 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the exon/gene > GFF) and from that I go back to the original alignments by tophat and use > these as input together with the GFF for easyRNASeq. I then get my > DESeq/edgeR output and proceed in R. > > > > > > > Dear Nico and list, > > I tried using the cuffmerge gtf file as the gff file in TopHat as > you suggest and successfully > generated a bam file but easyRNASeq could not read it. (Although > TopHat/Cufflinks are not bioconductor > programs, it is desirable that their output should be readable by > bioconductor programs so that > I hope that including a TopHat command does not go beyond the scope of > this list). > > To realign the reads using the gtf file generated by cufflinks I used > > tophat -p 6 -G > /Documents/clients/phyllis/humanlearndata/merged_asm/merged.gtf -o > SRR490225_it_thout genome SRR490225.fastq > > where merged,gtf is the gtf file generated by cufflinks. > > Thene here is a record of my easyRNASeq session; > > > #################################################################### ########################## > > R version 2.15.2 (2012-10-26) -- "Trick or Treat" > Copyright (C) 2012 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > > Natural language support but running in an English locale > > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. > > [R.app GUI 1.53 (6335) i386-apple-darwin9.8.0] > > [Workspace restored from /Users/friedman/.RData] > [History restored from /Users/friedman/.Rapp.history] > > > library(easyRNASeq) > Loading required package: parallel > Loading required package: genomeIntervals > Loading required package: intervals > Loading required package: BiocGenerics > > Attaching package: âBiocGenericsâ > > The following object(s) are masked from âpackage:statsâ: > > xtabs > > The following object(s) are masked from âpackage:baseâ: > > anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, > intersect, > lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, > pmin.int, Position, > rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, > union, unique > > Loading required package: Biobase > Welcome to Bioconductor > > Vignettes contain introductory material; view with > 'browseVignettes()'. To cite > Bioconductor, see 'citation("Biobase")', and for packages > 'citation("pkgname")'. > > Loading required package: biomaRt > Loading required package: edgeR > Loading required package: limma > Loading required package: Biostrings > Loading required package: IRanges > > Attaching package: âIRangesâ > > The following object(s) are masked from âpackage:intervalsâ: > > expand, reduce > > > Attaching package: âBiostringsâ > > The following object(s) are masked from âpackage:intervalsâ: > > type > > Loading required package: BSgenome > Loading required package: GenomicRanges > Loading required package: DESeq > Loading required package: locfit > locfit 1.5-8 2012-04-25 > > Attaching package: âlocfitâ > > The following object(s) are masked from âpackage:GenomicRangesâ: > > left, right > > Loading required package: lattice > > Attaching package: âDESeqâ > > The following object(s) are masked from âpackage:limmaâ: > > plotMA > > Loading required package: Rsamtools > Loading required package: ShortRead > Loading required package: latticeExtra > Loading required package: RColorBrewer > Warning messages: > 1: replacing previous import âcoerceâ when loading âintervalsâ > 2: replacing previous import âinitializeâ when loading âintervalsâ > > library(BSgenome.Hsapiens.UCSC.hg19) > > library(Rsamtools) > > > > chr.sizes=seqlengths(Hsapiens) > > chr.sizes > > chr1 chr2 chr3 > chr4 > 249250621 243199373 198022430 > 191154276 > > â¦â¦.. > > > > bamfiles=dir(getwd(),pattern="*\\.bam$") > > bamfiles > [1] "SRR490224it.bam" "SRR490225it.bam" > > > indexBam(SRR490224it.bam) > Error in indexBam(SRR490224it.bam) : > error in evaluating the argument 'files' in selecting a method for > function 'indexBam': Error: object 'SRR490224it.bam' not found > > Put the BAM file name in quotes here. Sean > > sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.4.2 > [3] ShortRead_1.16.2 latticeExtra_0.6-19 > [5] RColorBrewer_1.0-5 Rsamtools_1.10.2 > [7] DESeq_1.10.1 lattice_0.20-10 > [9] locfit_1.5-8 BSgenome_1.26.1 > [11] GenomicRanges_1.10.5 Biostrings_2.26.2 > [13] IRanges_1.16.4 edgeR_3.0.4 > [15] limma_3.12.1 biomaRt_2.14.0 > [17] Biobase_2.18.0 genomeIntervals_1.14.0 > [19] BiocGenerics_0.4.0 intervals_0.13.3 > > loaded via a namespace (and not attached): > [1] annotate_1.34.1 AnnotationDbi_1.20.2 bitops_1.0-4.1 > DBI_0.2-5 > [5] genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.2 > hwriter_1.3 > [9] RCurl_1.91-1 RSQLite_0.11.1 splines_2.15.2 > stats4_2.15.2 > [13] survival_2.36-14 XML_3.9-4 xtable_1.7-0 > zlibbioc_1.2.0 > > > > ls() > [1] "bamfiles" "chr.sizes" > > > #################################################################### ########################## > > Any suggestions would be appreciated. > > Thanks and best wishes, > Rich > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman@cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In memoriam, Ray Bradbury > > > > > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 12.0 years ago Sean Davis 21k

0

Entering edit mode

On 13.12.2012 15:55, Richard Friedman wrote: > On Dec 4, 2012, at 10:54 AM, Richard Friedman wrote: > >> On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote: >>> >>> >>> As I said, I have a very similar setup, but completely de-novo. >>> I've been (still am) testing several approaches: >>> >>> 1) running TopHat/Cufflinks/Cuffmerge (cuffmerge to get the >>> exon/gene GFF) and from that I go back to the original alignments by >>> tophat and use these as input together with the GFF for easyRNASeq. I >>> then get my DESeq/edgeR output and proceed in R. >> > > > > > Dear Nico and list, > > I tried using the cuffmerge gtf file as the gff file in TopHat as > you suggest and successfully > generated a bam file but easyRNASeq could not read it. (Although > TopHat/Cufflinks are not bioconductor > programs, it is desirable that their output should be readable by > bioconductor programs so that > I hope that including a TopHat command does not go beyond the scope > of this list). You'll kick yourself, but it doesn't look like you put quotes around the file name? > > To realign the reads using the gtf file generated by cufflinks I used > > tophat -p 6 -G > /Documents/clients/phyllis/humanlearndata/merged_asm/merged.gtf -o > SRR490225_it_thout genome SRR490225.fastq > > where merged,gtf is the gtf file generated by cufflinks. > > Thene here is a record of my easyRNASeq session; > > > #################################################################### ########################## > > R version 2.15.2 (2012-10-26) -- "Trick or Treat" > Copyright (C) 2012 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > > Natural language support but running in an English locale > > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. > > [R.app GUI 1.53 (6335) i386-apple-darwin9.8.0] > > [Workspace restored from /Users/friedman/.RData] > [History restored from /Users/friedman/.Rapp.history] > >> library(easyRNASeq) > Loading required package: parallel > Loading required package: genomeIntervals > Loading required package: intervals > Loading required package: BiocGenerics > > Attaching package: ?BiocGenerics? > > The following object(s) are masked from ?package:stats?: > > xtabs > > The following object(s) are masked from ?package:base?: > > anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, > get, intersect, > lapply, Map, mapply, mget, order, paste, pmax, pmax.int, pmin, > pmin.int, Position, > rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, > union, unique > > Loading required package: Biobase > Welcome to Bioconductor > > Vignettes contain introductory material; view with > 'browseVignettes()'. To cite > Bioconductor, see 'citation("Biobase")', and for packages > 'citation("pkgname")'. > > Loading required package: biomaRt > Loading required package: edgeR > Loading required package: limma > Loading required package: Biostrings > Loading required package: IRanges > > Attaching package: ?IRanges? > > The following object(s) are masked from ?package:intervals?: > > expand, reduce > > > Attaching package: ?Biostrings? > > The following object(s) are masked from ?package:intervals?: > > type > > Loading required package: BSgenome > Loading required package: GenomicRanges > Loading required package: DESeq > Loading required package: locfit > locfit 1.5-8 2012-04-25 > > Attaching package: ?locfit? > > The following object(s) are masked from ?package:GenomicRanges?: > > left, right > > Loading required package: lattice > > Attaching package: ?DESeq? > > The following object(s) are masked from ?package:limma?: > > plotMA > > Loading required package: Rsamtools > Loading required package: ShortRead > Loading required package: latticeExtra > Loading required package: RColorBrewer > Warning messages: > 1: replacing previous import ?coerce? when loading ?intervals? > 2: replacing previous import ?initialize? when loading ?intervals? >> library(BSgenome.Hsapiens.UCSC.hg19) >> library(Rsamtools) >> >> chr.sizes=seqlengths(Hsapiens) >> chr.sizes > > chr1 chr2 chr3 > chr4 > 249250621 243199373 198022430 > 191154276 > > ??.. > > >> bamfiles=dir(getwd(),pattern="*\\.bam$") >> bamfiles > [1] "SRR490224it.bam" "SRR490225it.bam" > >> indexBam(SRR490224it.bam) > Error in indexBam(SRR490224it.bam) : > error in evaluating the argument 'files' in selecting a method for > function 'indexBam': Error: object 'SRR490224it.bam' not found > >> sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.4.2 > [3] ShortRead_1.16.2 latticeExtra_0.6-19 > [5] RColorBrewer_1.0-5 Rsamtools_1.10.2 > [7] DESeq_1.10.1 lattice_0.20-10 > [9] locfit_1.5-8 BSgenome_1.26.1 > [11] GenomicRanges_1.10.5 Biostrings_2.26.2 > [13] IRanges_1.16.4 edgeR_3.0.4 > [15] limma_3.12.1 biomaRt_2.14.0 > [17] Biobase_2.18.0 genomeIntervals_1.14.0 > [19] BiocGenerics_0.4.0 intervals_0.13.3 > > loaded via a namespace (and not attached): > [1] annotate_1.34.1 AnnotationDbi_1.20.2 bitops_1.0-4.1 > DBI_0.2-5 > [5] genefilter_1.38.0 geneplotter_1.34.0 grid_2.15.2 > hwriter_1.3 > [9] RCurl_1.91-1 RSQLite_0.11.1 splines_2.15.2 > stats4_2.15.2 > [13] survival_2.36-14 XML_3.9-4 xtable_1.7-0 > zlibbioc_1.2.0 > > >> ls() > [1] "bamfiles" "chr.sizes" > > > #################################################################### ########################## > > Any suggestions would be appreciated. > > Thanks and best wishes, > Rich > Richard A. Friedman, PhD > Associate Research Scientist, > Biomedical Informatics Shared Resource > Herbert Irving Comprehensive Cancer Center (HICCC) > Lecturer, > Department of Biomedical Informatics (DBMI) > Educational Coordinator, > Center for Computational Biology and Bioinformatics (C2B2)/ > National Center for Multiscale Analysis of Genomic Networks (MAGNet)/ > Columbia Initiative in Systems Biology > Room 824 > Irving Cancer Research Center > Columbia University > 1130 St. Nicholas Ave > New York, NY 10032 > (212)851-4765 (voice) > friedman at cancercenter.columbia.edu > http://cancercenter.columbia.edu/~friedman/ > > In memoriam, Ray Bradbury > > > > > > > > [[alternative HTML version deleted]] -- Alex Gutteridge

ADD REPLY • link 12.0 years ago Alex Gutteridge ▴ 650

Login before adding your answer.