Milica Krunic
Hello!
I am working with cat RNA-Seq data, and after mapping I wanted to generate the count tables. I tried countOverlaps and summarizeOverlaps in R, and HTSeq in Python. In R, a single sorted .bam file (~20*10^6 reads) takes ~20 hours with either of those methods. In Python, it takes ~15 minutes.
For the R methods I used a GRangesList object downloaded directly into R from BioMart. For HTSeq I used the GTF file provided on the Ensembl homepage. The average cat gene width in the GRangesList is about 44,000 bp.
Does anyone know why getting count tables in R takes so long compared to HTSeq?
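To make clear what "count table" means here, the following is a toy Python sketch of any-overlap counting on a single chromosome. It is only an illustration under simplifying assumptions: the function name `count_overlaps` and the example coordinates are hypothetical, strand and ambiguous reads are ignored, and it is not how countOverlaps, summarizeOverlaps, or HTSeq are implemented.

```python
from bisect import bisect_left, bisect_right


def count_overlaps(genes, reads):
    """Count reads overlapping each gene (closed intervals, one chromosome).

    genes: dict mapping a gene name to a (start, end) tuple
    reads: list of (start, end) tuples
    """
    # Sort read starts and read ends independently so that each gene
    # can be answered with two binary searches.
    starts = sorted(start for start, _ in reads)
    ends = sorted(end for _, end in reads)

    counts = {}
    for name, (g_start, g_end) in genes.items():
        # Reads that begin at or before the gene end ...
        n_started = bisect_right(starts, g_end)
        # ... minus reads that already ended before the gene start.
        n_finished = bisect_left(ends, g_start)
        counts[name] = n_started - n_finished
    return counts


# Hypothetical example data, not taken from the cat annotation.
genes = {"geneA": (100, 200), "geneB": (150, 400)}
reads = [(90, 110), (120, 130), (190, 210), (300, 320), (500, 520)]
print(count_overlaps(genes, reads))
```

With sorted inputs, each gene costs two binary searches, so the total work grows roughly with the number of reads times log(reads) plus the number of genes times log(reads).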
Thank you!
Best,
Milica