Entering edit mode
delhomme@embl.de
★
1.2k
@delhommeemblde-3232
Last seen 10.3 years ago
Hi Meritxell,
Sorry, I forgot to answer that email. I've put the Bioc mailing list
in Cc as it might be of interest to others.
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On Oct 16, 2012, at 5:01 PM, Meritxell Oliva wrote:
> Hi Nicolas,
>
> How does easyRNASeq handles the multiple mapping reads in the *.bam
files? I observe that at least a trimming treatment is applied to
mapped reads when needed (look at log below)
easyRNASeq does not handle multiple mapping reads per se. If you
provide a file that contains multiple mapping, it will just extract
the read alignment information, so de-facto taking advantage of
multiple mapping. It's something I'm working on as well at the moment
(i.e. using such tools to get more power out of my analyses), so take
the above sentence with a grain of salt as the way multiple alignments
are reported certainly depends on the aligner being used; so such
information might be overseen for some tools at the moment. If you
tell me which tool you are using and how multiple mapping reads are
specified in the bam output I can tell you how easyRNASeq is going to
behave.
There's no trimming occurring in easyRNASeq, that message is indeed
misleading. It simply means that your bam file contains reads of
variable size, either because they have been trimmed prior to the
alignment or the aligner software used returned gapped alignments,
i.e. reads were split in 2 (or more) fragments.
>
> > CEU.RNASeq.geneModel <- easyRNASeq(outputFormat="RNAseq",organism=
"HSapiens",count="genes",summarization="geneModels",pattern=".*\\.bam$
",filesDirectory="../Montgomery2010/bam_hg19/",annotationFile="Homo_sa
piens.GRCh37.68.gtf",annotationMethod="gtf");
> Checking arguments...
> Fetching annotations...
> Read 2156097 records
> Computing gene models...
> Summarizing counts...
> Processing E-MTAB-197.BAM.ERR009096.bam
> Updating the read length information.
> The reads have been trimmed.
> Minimum length of 36 bp.
> Maximum length of 38 bp.
>
> so, I wonder which are the additional filters + treatments that are
applied.
There's no filter/treatment.
>
> I do have some additional questions: what is the best way to assign
reads to transcripts uniquely?
That's a good question. It's complex because of the presence of
isoforms (without talking about paralogs / gene family). Using
multiple mapping read is a start. Then if you are interested in
differential isoform expression, I would rather get a set of gene
models (disjoint to get unique and shared exonic regions as discussed
below) and run something like DEXSeq to get differential exon usage.
Then based on this information, go backwards and try to identify which
isoform(s) that particular exon is a member of. In my opinion trying
to summarize per isoforms is a much more complex approach and the
likelyhood to get false positive much higher. But again that's just my
personal opinion and a description of the approach I would use
(provided I've the correct kind and amount of data).
> I've already assigned reads to genes uniquely
[OBJECT.GENE=easyRNASeq(counts=gene,summarization=geneModels,etc...]
Is it possible to use the RNASeq object containing the synthetic exons
obtained by geneModel(OBJECT.GENE) for that purpose?
Yes, indeed.
> If so, which are the functions to be run?
geneModel() to get the geneModel out of the RNAseq object. Then using
GenomicRanges/IRanges function to modify that object. I've been
planning for a while now to add this as a use case in the vignette,
but hardly find the time to do so.
> I've had a look at the manual:
>
> create transcript specific synthetic exons, i.e. features that would
be unique to a transcript. To offer this possibility, transcripts
count can be summarized from "features", in which case the annotation
object need to have both the "feature" and "transcript" fields
defined.
>
> , but it's not clear in my mind how to do it. Could you provide a
further explanation?
As said, I've been wanting to implement a use case for that. If you're
willing to share with me your RNAseq object, we could do that
together. Just let me know.
>
> Last but not least: the syntetic exon dataset seems not to be
exactly what I am looking for: reads are assigned uniquely to exonic
regions, which are created by merging overlapping exons. I think it
would be interesting to disjoin the exons and have the reads assigned
to unique exonic regions and shared exonic regions.
That's what happen, overlapping exons are disjointed into non-
overlapping and overlapping synthetic exons. I'm currently working on
that with the Bioconductor developer to come up with a single
solution, based on the GenomicRanges summarizeOverlap function.
> Is there a way/mode i to achieve that using asn easyRNASeq
implemented model? I think it would be equivalent to what u achieve by
using disjoin() function provided by IRange.
Yes, it is. And that's what we are trying to come to, a unique set of
function for this.
Hope this helps shed some light on what's happening in easyRNASeq.
Cheers,
Nico
>
> Thanks
>
>
> On Oct 12, 2012, at 9:36 AM, Nicolas Delhomme wrote:
>
>> Hi Meritxell,
>>
>> I've Cced the Bioc Mailing list in case this is of interest to
others.
>>
>> On Oct 11, 2012, at 6:15 PM, Meritxell Oliva wrote:
>>
>>> Hi Nicolas,
>>>
>>> I am an easyRNASeq "newbie" user.
>>>
>>> First of all, congratulations for the development of the pipeline:
so far it's one of the best R libraries I have found to deal with
RNASeq data, as it tries to tackle problematic issues such as unique
read-exon count assignment and also wraps the normalization packages
(DESeq, edgeR), so you get all you need in one go. Thanks!
>>>
>>
>> Thanks, that's nice to hear. Let me know as well whenever your
encounter problems of think of new features!
>>
>>> I do have a question: I would like to create a non-redundant,
synthetic exon dataset, using the Ensembl68 gene models. From what I
understand from the manual, when using easyRNASeq() if you summarize
your counts by counts=gene,summarization=geneModels, this synthetic
exon dataset is generated in order to create unique read-exon
correspondances. This is what I do, and I store the object as RNASeq
object, to preserve the genomic annotation. However, the annotation
that I get if I apply the function genomicAnnotation() to this object,
is the original one from Ensembl, with redundant exons shared between
transcripts. I would like to get the synthetic exon dataset, to select
unique coding regions for each gene transcript.
>>>
>>> How can I get this dataset? My ultimate goal is to perform gene
expression differential analysis at gene, transcript and exon level.
First one is solved, and I want to find the best way to do perform the
latter ones.
>>>
>>
>> At the moment it's still a dual step process, but I plan on making
that easier. You first need to run
easyRNASeq(counts=gene,summarization=geneModels,etc...) and asking to
get an "RNAseq" outputFormat: rnaSeq <- easyRNASeq(counts=gene,summari
zation=geneModels,outputFormat="RNAseq",etc...). This will give you an
object of the class RNAseq that contains the geneModel annotation
accessible through geneModel(rnaSeq). That's a RangedData object
containing the synthetic exon, although it is still redundant for
genes located on opposite strands. So if you're not using stranded
RNAseq data, you need to do some more filtering.
>>
>>> Can you help me?
>>>
>>
>> Hope this did, let me know if not,
>>
>> Nico
>>
>>> Thanks
>>>
>>>
>>> Meritxell Oliva
>>> PhD student
>>> IBB (Biotechnology and Biomedicine Institute)
>>> Comparative and Functional Genomics group
>>> Campus Universitari - 08193 Bellaterra Cerdanyola del Vall?s -
Barcelona
>>>
>>>
>>>
>>>
>>
>
> Meritxell Oliva
> PhD student
> IBB (Biotechnology and Biomedicine Institute)
> Comparative and Functional Genomics group
> Campus Universitari - 08193 Bellaterra Cerdanyola del Vall?s -
Barcelona
>
>
>
>