The authors of proActiv have redirected my enquiry from GitHub to this forum, so I am copying it here:
While investigating the `proActiv` results for my sample dataset, I came across the fact that many genes do not have a "Major" promoter, yet they tend to have multiple "Minor" promoters.
This was a bit unexpected, as the example workflow states the following (emphasis mine):
Promoters are also categorized into three classes. Promoters with activity < 0.25 are classified as inactive, while the most active promoters of each gene are classified as major promoters. Promoters active at lower levels are classified as minor promoters.
I now realise that this is related to another statement, in the limitations section:
proActiv will not provide promoter activity estimates for promoters which are not uniquely identifiable from splice junctions (single exon transcripts, promoters which overlap with internal exons).
Which makes sense. Looking at the source code, I believe this limitation is implemented as the `internalPromoter` column in the output of `proActiv`.
However, in the actual implementation (specifically these lines), the "Major/Minor" classification is assigned before the internal promoters are filtered out.
In cases where an `internalPromoter` has higher _activity_ than any non-internal promoter, this would result in this _internal_ promoter being assigned the `Major` tag in the code. This assignment would be overwritten with `NA` immediately, but no other promoter would be selected as `Major`, leaving only `Minor` promoters and `NA`s.
- Is this expected?
- Shouldn't one of the otherwise `Minor` promoters that are not `internalPromoter` be assigned the `Major` label in these cases?