Jonathan
Hello,
I'm interested in calculating differential expression from some paired RNA-seq samples.
I've used htseq-count after mapping; quite happy with how easy that was.
My question is whether or not to trim the last five rows from the htseq-count output.
Those rows look like this:
no_feature 152030
ambiguous 4876
too_low_aQual 0
not_aligned 0
alignment_not_unique 0
I can think of reasons supporting either side of this question. The numbers of unmapped and ambiguously mapping reads do contribute to the total library size. However, I'm also interested in quantifying the difference between what's human in both samples, so intuition tells me to remove those reads.
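To make the two options concrete, here is a minimal Python sketch of what "trimming" changes. It is only an illustration, not part of htseq-count or edgeR: the function name `split_counts` and the two gene rows (`GENE_A`, `GENE_B`) are made up, while the five summary rows are the ones shown above. It assumes the summary row names match this output exactly (newer htseq-count releases prefix them with `__`, which the sketch also tolerates).

```python
import io

# The five summary rows as they appear in the output shown above.
SPECIAL_ROWS = {
    "no_feature",
    "ambiguous",
    "too_low_aQual",
    "not_aligned",
    "alignment_not_unique",
}


def split_counts(handle):
    """Split htseq-count style lines into (gene_counts, summary_counts)."""
    genes, summary = {}, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Each row is "<name><whitespace><count>".
        name, count = line.rsplit(None, 1)
        key = name.lstrip("_")  # tolerate the "__" prefix of newer releases
        if key in SPECIAL_ROWS:
            summary[key] = int(count)
        else:
            genes[name] = int(count)
    return genes, summary


# Hypothetical gene rows followed by the summary rows from the post.
example = io.StringIO(
    "GENE_A\t100\n"
    "GENE_B\t250\n"
    "no_feature\t152030\n"
    "ambiguous\t4876\n"
    "too_low_aQual\t0\n"
    "not_aligned\t0\n"
    "alignment_not_unique\t0\n"
)

genes, summary = split_counts(example)

# Option 1: library size from reads assigned to genes only.
lib_size_genes_only = sum(genes.values())
# Option 2: library size that also counts the summary rows.
lib_size_with_summary = lib_size_genes_only + sum(summary.values())

print(lib_size_genes_only, lib_size_with_summary)
```

With rows this large relative to the gene counts, the two library sizes differ substantially, which is the crux of the question.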
Because the counts are big, this matters a great deal. I'm using edgeR (again, very happy with that software), and the manual cites htseq-count as a viable methodology, but doesn't comment on the authors' preferred treatment of the unmapped reads.
My first (somewhat careless) use of edgeR gave us results that appeared to make sense, but upon digging a little deeper, I noticed that this question affects the p-values quite a lot because the unmapped counts are so big.
I would appreciate any comments/opinions!
Thanks,
Jonathan