Hi Michael,
Indeed the shortcut you suggest decreases the memory usage and
computation time significantly. I guess the things could be improved
even more, had the two shortreads for the same sequence be in adjacent
positions in addition to being sorted by the starting position. In
that case eventual use.names in readGappedAlignments could be avoided
as well as the unlist-list-unlist steps which all are time consuming.
Thanks a lot for the quick responses
Adi Laurentiu TARCA, Ph.D.
Assistant Professor (Research),
Department of Computer Science & Center for Molecular Medicine and
Genetics, Wayne State University,
Director, Bioinformatics and Computational Biology Unit, Perinatology
Research Branch (NICHD),
3990 John R., Office 4809,
Detroit, Michigan 48201
Tel: 1-313-5775305
From: Tarca, Adi
Sent: Tuesday, January 17, 2012 7:43 PM
To: 'Michael Lawrence'
Cc: bioconductor@stat.math.ethz.ch
Subject: RE: [BioC] about reduce function in GenomicRanges
Hi Michael,
Thanks for the advice. Briefly, this is the code that I have so far
for dealing with paired end Illumina data. It runs ok on a chunk of
the alignment object, but on an entire thing it is taking too much
memory. A 32GB Ram 64 bit machine can't do it:
aln <- readGappedAlignments(filePath,use.names=TRUE)
strand(aln) <- "*"
aln=as(aln,"GRangesList")
nms=names(aln)
names(aln)<-NULL
aln=unlist(aln)
aln=split(aln,nms)
aln=range(aln)
The last line is the limiting step.
Any suggestion how to decrease the memory consumption here?
Thanks,
Adi
From: Michael Lawrence [mailto:lawrence.michael@gene.com]
Sent: Friday, January 13, 2012 4:23 PM
To: Tarca, Adi
Cc: Michael Lawrence; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] about reduce function in GenomicRanges
Just split(gr, rep(names(grl), elementLengths(grl))).
On Fri, Jan 13, 2012 at 1:10 PM, Tarca, Adi
<atarca@med.wayne.edu<mailto:atarca@med.wayne.edu>> wrote:
Thank you Michael,
I almost got it except that it is not obvious to me how once I have
the Granges object how to relist it and split on the names.
Thanks,
Adi
From: Michael Lawrence
[mailto:lawrence.michael@gene.com<mailto:lawrence.michael@gene.com>]
Sent: Friday, January 13, 2012 1:58 PM
To: Tarca, Adi
Cc:
bioconductor@stat.math.ethz.ch<mailto:bioconductor@stat.math.ethz.ch>
Subject: Re: [BioC] about reduce function in GenomicRanges
The general idiom is to coerce the GappedAlignments to a GRangesList,
then flatten it to a GRanges, and relist it, splitting on the names.
The resulting GRangesList will yield the appropriate counts.
A little bit convoluted, but unfortunately support for paired end data
remains a gap in the infrastructure.
Michael
On Fri, Jan 13, 2012 at 10:50 AM, Tarca, Adi
<atarca@med.wayne.edu<mailto:atarca@med.wayne.edu>> wrote:
Dear list,
I was wondering if is there is a way to apply the reduce function on a
GappedAlignments object according to the levels of a factor (such as
the read ID from paired end data). I am basically trying to deal with
the instance when the GappedAlignments object was obtained by reading
a bam file obtained from a bowtie2 alignment on Illumina paired end
data. Since both ends of the read get an entry in the bam file, I do
not want to count twice the read when using later the function
countOverlaps.
Thank you,
Adi Tarca
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor@r-project.org<mailto:bioconductor@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
[[alternative HTML version deleted]]