delhomme@embl.de
Dear Darrell,
Vincent Zimmern and I are currently implementing this
functionality. By March 18th (the next Bioconductor release
development deadline), we should have it integrated into the new
easyRNASeq version (1.6.x). As part of that development, we are
writing unit tests that reproduce that functionality. Once we are
sufficiently confident that it works, I could give you that excerpt
of code, so that you don't need to wait for the 1.6.x release in
early April.
Please don't hesitate to remind me about it in the coming weeks.
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On Feb 28, 2013, at 5:53 PM, Bayles, Darrell wrote:
> Dear Nico,
>
> I've read a number of posts in different forums (including BioC)
> from people wanting to adapt annotations in order to deal with
> overlapping synthetic exons. You indicated in this forum (Wed Jan 9,
> 2013) that you were working on an example, in the easyRNASeq
> development version, of how to perform this type of adaptation of an
> annotation. Similarly, I would like to remove the overlaps from an
> annotation that I'm working with, and have been stymied in my efforts
> to perform that modification of the annotation. Has that
> functionality been committed to the development release of
> easyRNASeq, or can you provide an example of the R workflow needed to
> remove the overlaps in a gene model computed by easyRNASeq?
>
>> rnaSeq<-easyRNASeq(
> + organism="Btaurus",
> + annotationMethod="gtf",
> + annotationFile="ensembl.gtf",
> + gapped=TRUE, count="genes",
> + summarization="geneModels",
> + pattern="*_B.bam$",
> + filesDirectory=".",
> + outputFormat="RNAseq")
> Checking arguments...
> Fetching annotations...
> Read 478833 records
> Computing gene models...
> Summarizing counts...
> Processing 733_H_0_B.bam
> Updating the read length information.
> The alignments are gapped.
> Minimum length of 1 bp.
> Maximum length of 51 bp.
> Processing 736_H_0_B.bam
> Updating the read length information.
> The alignments are gapped.
> Minimum length of 1 bp.
> Maximum length of 51 bp.
> Preparing output
> Warning messages:
> 1: In easyRNASeq(organism = "Btaurus", annotationMethod = "gtf",
>    annotationFile = "genes.gtf", :
>   Your organism has no mapping defined to perform the validity check
>   for the UCSC compliance of the chromosome name.
>   Defined organism's mapping can be listed using the 'knownOrganisms'
>   function.
>   To benefit from the validity check, you can provide a 'chr.map' to
>   your 'easyRNASeq' function call.
>   As you did not do so, 'validity.check' is turned off
> 2: In .Method(..., deparse.level = deparse.level) :
>   number of columns of result is not a multiple of vector length (arg 35)
> 3: In easyRNASeq(organism = "Btaurus", annotationMethod = "gtf",
>    annotationFile = "genes.gtf", :
>   There are 410 synthetic exons as determined from your annotation
>   that overlap! This implies that some reads will be counted more than
>   once! Is that really what you want?
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] easyRNASeq_1.4.2       ShortRead_1.16.4       latticeExtra_0.6-24
>  [4] RColorBrewer_1.0-5     Rsamtools_1.10.2       DESeq_1.10.1
>  [7] lattice_0.20-13        locfit_1.5-8           BSgenome_1.26.1
> [10] GenomicRanges_1.10.6   Biostrings_2.26.3      IRanges_1.16.6
> [13] edgeR_3.0.8            limma_3.14.4           biomaRt_2.14.0
> [16] Biobase_2.18.0         genomeIntervals_1.14.0 BiocGenerics_0.4.0
> [19] intervals_0.13.3
>
> loaded via a namespace (and not attached):
> [1] annotate_1.36.0 AnnotationDbi_1.20.3 bitops_1.0-5
> [4] DBI_0.2-5 genefilter_1.40.0 geneplotter_1.36.0
> [7] grid_2.15.2 hwriter_1.3 RCurl_1.95-3
> [10] RSQLite_0.11.2 splines_2.15.2 stats4_2.15.2
> [13] survival_2.37-2 tools_2.15.2 XML_3.95-0.1
> [16] xtable_1.7-1 zlibbioc_1.4.0
>
> Any help is greatly appreciated.
>
> Darrell
>
> ==========================================
> Darrell O. Bayles, M.S., Ph.D.
> USDA, ARS, National Animal Disease Center
> Infectious Bacterial Diseases Research Unit
> 1920 Dayton Ave, Bldg 24
> P.O. Box 70
> Ames, IA 50010
> Tel: (515) 337-7165
> Fax: (515) 337-7002
> ==========================================