Hello everyone
I am using edgeR to analyse allele-specific expression in a mouse RNA-Seq dataset. I use featureCounts to count the reads that map to the maternal or paternal allele, as well as those that map equally well to both (i.e. reads that don't overlap a SNP). I then use edgeR to test for differential expression (maternal over paternal allele) using these counts. However, I found that these counts are mostly low, since I count only allele-specific reads and discard reads with no SNP information.
To work around this problem, someone suggested that I add a proportion of "background reads" (i.e. reads with no allelic information) to the allele-specific read counts for both alleles. This increased the number of differentially expressed genes detected. In fact, adding 50% of the background reads also makes the expression status of my "control genes" (mouse imprinted genes) comparable to a previously published dataset from the same cell line (which was sequenced at twice our depth).
However, I am unsure whether this strategy is correct. How is the testing in edgeR affected if you are comparing, for example, 14 vs 12 reads instead of 4 vs 2 reads? What is the best strategy for computing differential expression in this situation?
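To make the 4 vs 2 and 14 vs 12 comparison concrete, here is a minimal sketch in plain Python. It is not what edgeR does internally (edgeR fits a negative binomial model across replicates); it only shows the arithmetic for a single gene in a single sample, assuming 10 background reads are added to each allele. The function names and the simple binomial test against a 50:50 ratio are my own assumptions for illustration.

```python
from math import comb, log2


def log2_fold_change(maternal, paternal):
    """log2 ratio of maternal over paternal read counts."""
    return log2(maternal / paternal)


def binomial_two_sided_p(maternal, paternal):
    """Two-sided exact binomial p-value against a 50:50 allelic ratio.

    Illustrative only: computed as twice the upper-tail probability of the
    larger count, capped at 1. This is not the test edgeR performs.
    """
    n = maternal + paternal
    k = max(maternal, paternal)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Raw allele-specific counts, then the same gene with 10 background reads
# added to each allele (hypothetical numbers taken from my question).
for maternal, paternal in [(4, 2), (14, 12)]:
    lfc = log2_fold_change(maternal, paternal)
    p = binomial_two_sided_p(maternal, paternal)
    print(f"{maternal} vs {paternal}: log2FC = {lfc:.3f}, binomial p = {p:.3f}")
```

In this toy calculation the absolute difference stays at 2 reads, but the fold change shrinks from 2-fold towards 1 once the same number of reads is added to both alleles.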
Thanks Aaron and Dr. Smith for your answers. Indeed, most genes have low counts when I count only allele-specific reads, and I also suspect that the reduction in variability from adding background reads is the reason for the improved differential expression results. If this is the case, then not adding these counts and instead filtering out low-count genes (as Aaron suggested) should improve the differential expression results. I can still hope to see the high fold-change genes at the top of the list in both cases.
Can you suggest how I should decide on the count cut-off for filtering these genes?
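For context, this is the kind of filter I have in mind, sketched in plain Python rather than edgeR. The function name, the toy count matrix, and the values of min_count and min_samples are all placeholders I made up for illustration; choosing those values is exactly what I am asking about.

```python
def keep_genes(counts, min_count, min_samples):
    """Return one boolean per gene: True if at least `min_samples`
    libraries have `min_count` or more reads for that gene.

    `counts` is a list of rows, one row per gene and one column per library.
    Both thresholds are placeholders, not recommended values.
    """
    return [
        sum(c >= min_count for c in row) >= min_samples
        for row in counts
    ]


# Toy allele-specific counts (made-up numbers): 3 genes x 4 libraries.
toy_counts = [
    [4, 2, 5, 3],      # low but consistently detected
    [0, 1, 0, 0],      # essentially undetected
    [14, 12, 20, 18],  # well covered
]

keep = keep_genes(toy_counts, min_count=2, min_samples=2)
filtered = [row for row, k in zip(toy_counts, keep) if k]
print(keep)
```

With these placeholder thresholds, only the second toy gene is dropped.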