Entering edit mode
Hi Ugo,
I can see one problem in the auxiliaries.txt file: You are referring
the
same sequence file (seq.fa) twice, so all auxiliary alignments are
done
twice.
The speed of QuasR is primarly defined by the speed of
bowtie/SpliceMap.
The fact that the alignments against chr1 are fast may be due to only
few reads producing hits on chr1, or at last much fewer hits than
against the auxiliaries.
A way to speed up the analysis would be to do unspliced alignments
(spliceAlignment=FALSE), which may not be necessary if your reporter
genes in seq.fa are spliced cDNAs. Also, it would help to use more
than
just 4 CPU cores, if you have additional hardware available to you.
Michael
On 26.04.2013 14:58, Ugo Borello wrote:
> Thank you Michael,
> The reporter genes are not part of the genome, so I am aligning my
short
> reads to the genome (actually chr1only in the following test) and to
a fasta
> file (seq.fa) containing the sequences of the two reporter genes.
> But, while alignment to chr1 was relatively fast, the alignment to
the fasta
> file it is taking forever.
> Here the code:
>
> library(QuasR)
> library(parallel)
>
> cl <- makeCluster(4)
> sampleFile <- "test.txt"
> genomeName <- "chr1.fa"
> auxFile <- "auxiliaries.txt"
>
> proj1 <- qAlign(sampleFile, genome= genomeName, auxiliaryFile=
auxFile,
> splicedAlignment=TRUE, clObj=cl)
>
>
> text.txt is formatted this way:
> FileName SampleName
> 31omo.fastq Sample1
>
>
> auxiliaries.txt is formatted this way:
> FileName AuxName
> seq.fa egfp
> seq.fa cre
>
>
> Am I missing something here? Is there a faster way to complete my
analysis?
>
> Thank you in advance for your help
>
> Ugo
>
>
>> From: Michael Stadler <michael.stadler at="" fmi.ch="">
>> Date: Fri, 26 Apr 2013 08:40:50 +0200
>> To: <bioconductor at="" r-project.org="">
>> Cc: <ugo.borello at="" inserm.fr="">
>> Subject: Re: [BioC] QuasR special case alignment
>>
>> Dear Ugo,
>>
>> One way to accomplish would be to put your two reporter genes into
a
>> fasta file, and use this as your "genome" (see ?qAlign or the
vignette
>> for examples of genomes in fasta format).
>>
>> However, I think it would be better to align against the full
genome, as
>> this will avoid certain problems, such as reads being suboptimally
>> aligned to your reporter genes which would align perfectly to some
other
>> genomic locus.
>>
>> Assuming that the reporter genes are part of the genome, you could
then
>> count alignments using qCount() with a GRanges query containing the
>> ranges of your reporter genes.
>>
>> If the reporter genes are not part of the genome, you can provide
your
>> reporter gene sequences (as a fasta file) to qAlign, and all reads
not
>> aligning to the genome will be further aligned to them. The format
of
>> the required auxiliary file is very simple and described in section
5.2
>> of the vignette.
>>
>> You can open the QuasR vignette using:
>> library(QuasR)
>> vignette("QuasR-Overview")
>>
>> Best wishes,
>> Michael
>>
>> On 25.04.2013 16:52, Ugo Borello wrote:
>>> Good morning,
>>>
>>> I am doing RNA-Seq analysis and I would like to perform an
exploratory
>>> analysis to first verify that 2 reporter genes are expressed in my
samples.
>>>
>>> Could the qAlign function perform an alignment of my short reads
with two
>>> reporter gene sequences, excluding the genome.
>>> It seems, from my first attempt, that the 'genome' parameter is
essential;
>>> is this true?
>>> Anyway how do I have to feed those sequences to qAlign and how do
I have to
>>> format my auxiliary file/files to see in the alignment statistics
the number
>>> of mapped reads for each reporter gene sequence separately?
>>>
>>> Thank you in advance for your help.
>>>
>>> Ugo
>>>
>>> P.S. Let me say that QuasR is a fantastic package!
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>
>
--
--------------------------------------------
Michael Stadler, PhD
Head of Computational Biology
Friedrich Miescher Institute
Basel (Switzerland)
Phone : +41 61 697 6492
Fax : +41 61 697 3976
Mail : michael.stadler at fmi.ch