Ni Feng
Dear all,
I have a general question about whether TMM normalization is appropriate for my data. I apologize for this long-winded email. I am not a trained bioinformatician and have therefore been struggling with some of the data analysis.
A colleague and I did an RNA-seq experiment with 6 samples (each with RNA pooled from 6 individuals) and no biological replicates. The 6 samples comprised 2 tissue types collected at 3 different time points. I know that this is not an ideal experimental set-up; we did this experiment 3 years ago.
We used the Trinity package for most of the transcriptome assembly and downstream analyses, including its use of edgeR for differential expression. Naively, I went ahead with all downstream analyses without verifying whether my data violated the underlying assumptions of TMM normalization.
For example, we found that ~30% of our transcripts showed differential expression in any given pairwise comparison between 2 samples. Does this violate the TMM assumption that most genes are NOT differentially expressed?
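To make the assumption in question concrete, here is a minimal sketch of a TMM-style scaling factor. It is a simplification, not edgeR's implementation: it is unweighted (edgeR's `calcNormFactors` additionally weights genes by precision), and the function name `tmm_factor` is hypothetical. The trimming fractions (30% of M-values and 5% of A-values from each end) are assumed to match edgeR's defaults. The factor is only meaningful if the genes that survive trimming are mostly not differentially expressed.

```python
import math

def tmm_factor(obs, ref, trim_m=0.30, trim_a=0.05):
    """Simplified, unweighted TMM-style factor for sample `obs` relative to `ref`.

    `obs` and `ref` are raw counts for the same genes in the same order.
    Hypothetical helper for illustration only.
    """
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals = [], []
    for o, r in zip(obs, ref):
        if o > 0 and r > 0:  # log ratios are undefined for zero counts
            po, pr = o / n_obs, r / n_ref
            m_vals.append(math.log2(po / pr))                        # log fold change (M)
            a_vals.append(0.5 * (math.log2(po) + math.log2(pr)))     # mean log abundance (A)
    n = len(m_vals)
    if n == 0:
        raise ValueError("no genes with non-zero counts in both samples")
    # Rank-based trimming: drop the most extreme genes by M and by A.
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    by_m = sorted(range(n), key=m_vals.__getitem__)
    by_a = sorted(range(n), key=a_vals.__getitem__)
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("no genes left after trimming")
    # The factor is 2 to the power of the mean M over the retained genes.
    return 2 ** (sum(m_vals[i] for i in keep) / len(keep))
```

In this sketch, up to 30% of genes can be trimmed from each tail of the M distribution, so differential expression mainly becomes a problem when it is both widespread and strongly asymmetric (most changes in one direction).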
Furthermore, we noticed that a tissue bias remains after normalization. Attached is a scatterplot of TMM-normalized values for each tissue (summed across the 3 sample groups for each tissue). Plotted in black on top of all transcripts is the expression of the CEGs (Core Eukaryotic Genes), which we believe should be good candidates for "housekeeping" genes. Both the CEGs and the full set of genes show a skew towards one tissue ("VMN") at higher expression levels, and a skew towards the other tissue ("H") at intermediate expression levels.
I have also attached a density plot of the M values and an MA plot to visualize the skew. These plots were generated from 1 pair of tissue comparisons ("SMH" vs "SMV").
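For readers unfamiliar with these plots, the quantities behind them can be sketched as follows. This is an illustrative assumption about how such values are typically computed, not the code used to produce the attached figures; the function names `ma_values` and `median_m` and the `prior` offset are hypothetical. M is the log2 ratio between two samples, A is the mean log2 expression, and the median M within a range of A summarizes the skew at that expression level.

```python
import math
from statistics import median

def ma_values(x, y, prior=0.5):
    """Return a list of (M, A) pairs for two normalized expression vectors.

    `prior` is a small offset (an assumption of this sketch) that avoids log2(0).
    """
    pairs = []
    for xi, yi in zip(x, y):
        lx, ly = math.log2(xi + prior), math.log2(yi + prior)
        pairs.append((lx - ly, 0.5 * (lx + ly)))  # M = log ratio, A = mean log level
    return pairs

def median_m(pairs, a_lo=float("-inf"), a_hi=float("inf")):
    """Median M over genes whose A lies in [a_lo, a_hi]."""
    selected = [m for m, a in pairs if a_lo <= a <= a_hi]
    if not selected:
        raise ValueError("no genes in the requested A range")
    return median(selected)
```

Applied to a reference gene set such as the CEGs, a median M far from 0 in one range of A and of opposite sign in another would correspond to the expression-dependent skew described above.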
These plots reflect the fact that one tissue is more heterogeneous than the other. Although TMM normalization is designed to deal with this problem, our data seem to need further normalization. Our within-tissue comparisons are great and do not show this kind of skew. My questions are:
1) Do our data violate the assumptions of TMM normalization?
2) Is there another normalization method you would suggest for our data?
3) Should we just forget about between-tissue comparisons?
I have also experimented with the suggestions in the edgeR user guide for estimating a dispersion value when there are no replicates. I can discuss this further if it would help.
Thank you for your time and patience. Any advice is much appreciated.
--
Ni (Jenny) Ye Feng
Ph.D. Candidate
Bass Laboratory
Cornell University
Dept of Neurobiology and Behavior
Ithaca, NY 14853
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CEG_FPKM_over_all_090814.png
Type: image/png
Size: 86336 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140908/89d8411d/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SMV_SMH_density_log2(M).pdf
Type: application/pdf
Size: 4716 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140908/89d8411d/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SMH_SMV_MA_plot_0903.png
Type: image/png
Size: 51246 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140908/89d8411d/attachment-0001.png>