Hello,
I'm using the QuasR package to process some RNA-seq data and count reads in the exonic and intronic regions of all genes in the human genome. To this end I have aligned the reads and am using qCount to do the counting, with a GRangesList as the query. The list has an entry for each gene, and each entry contains either the exonic or the intronic ranges. My code snippet is below:
clusters = makeForkCluster(nnodes = 8)
eCount = qCount(proj,exons,clObj = clusters)
stopCluster(cl = clusters)
However, this is taking abnormally long to run, which I think is because qCount uses a for loop over all elements of the list and removes redundancies using setdiff(). Is there a way to speed up this redundancy-removal step? I have ~20000 genes (elements in the list), and the step has not completed even after ~40 hours.
I'll be grateful for any pointers.
Thanking You,
Vakul
Hi Vakul
You should probably use a GRanges query instead of a GRangesList. The GRangesList query is meant for a special analysis (see ?qCount) which partitions the genome into domains.
If you want one count per exon, use a GRanges with exons without names or with unique names per exon. If you want one count per gene (creating a union of all exons), use a GRanges with exons, named by genes (all exons from the same gene have identical names). This should be much faster.
Michael
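For readers who want to try the suggestion above, here is a minimal sketch of flattening a per-gene GRangesList into a single GRanges whose ranges are named by gene. The toy `exons` object and the gene names are made up for illustration; `proj` and `clusters` are the objects from the question and are not created here, so the qCount call is left as a comment.

```r
library(GenomicRanges)

# Toy stand-in for the per-gene exon list from the question (hypothetical data).
exons <- GRangesList(
  geneA = GRanges("chr1", IRanges(start = c(100, 300), end = c(200, 400)), strand = "+"),
  geneB = GRanges("chr1", IRanges(start = 1000, end = 1200), strand = "-")
)

# Flatten the list into one GRanges, dropping any existing names.
exonsFlat <- unlist(exons, use.names = FALSE)

# Name every exon after the gene it came from, so that all exons of a gene
# share an identical name.
names(exonsFlat) <- rep(names(exons), elementNROWS(exons))

# With a real qProject object `proj` and a cluster `clusters` as in the question:
# eCount <- qCount(proj, exonsFlat, clObj = clusters)
```

The same conversion would apply to the intronic ranges. Explicitly setting the names, rather than relying on the names that unlist() generates, keeps the gene labels predictable even if the inner ranges already carry names.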