Julie Leonard
1) Is RNA-Seq data even appropriate for "standard" cluster analysis due to its discrete nature? What normalization should be done beforehand? We tend to perform length and TMM normalization of our data.
2) If we perform some sort of clustering of RNA-Seq data, obtain a gene list from a cluster (e.g. all genes in a cluster), and then want to perform gene set enrichment analysis on this gene list, is Fisher's Exact Test by itself OK, or do we need to account for gene length (e.g. use GOSeq)? I know that RNA-Seq data has the bias that longer genes tend to be called differentially expressed more often due to an increase in statistical power. As I understand it, the issue is that longer genes --> more reads --> lower variance --> higher power to detect differences. I am wondering whether this difference in variance levels between long and short genes would also have an effect on the results of clustering?
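To make the "Fisher's Exact Test by itself" option concrete, here is a minimal sketch of the one-sided test for over-representation of a gene set in a cluster, written as a hypergeometric upper tail. The function name, argument names, and the counts in the usage line are all hypothetical, chosen only for illustration. Note that it treats every gene as equally likely to land in the cluster, so it does nothing about gene length; that is exactly the assumption a length-aware method such as GOSeq is meant to relax.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_in_cluster, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(overlap >= n_overlap) when n_in_cluster genes are drawn
    without replacement from n_universe genes, n_in_set of which belong
    to the gene set. Every gene is assumed equally likely to be drawn,
    i.e. no correction for gene length is applied.
    """
    max_overlap = min(n_in_set, n_in_cluster)
    # Number of ways to pick the cluster from the whole universe.
    total = comb(n_universe, n_in_cluster)
    # Number of ways to pick a cluster with k set members, for k >= n_overlap.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_in_cluster - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes measured, 200 in the gene set,
# 300 in the cluster, 12 in both.
p = enrichment_p_value(20000, 200, 300, 12)
print(f"one-sided p-value: {p:.3g}")
```

The four arguments are the margins and one cell of the usual 2x2 table (in set / not in set versus in cluster / not in cluster), so the same numbers could be passed to any standard Fisher's exact test implementation with a "greater" alternative.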
Thanks,
-Julie
Did you ever get an answer?