Dear Sir / Madam,
Greetings!
I am a PhD student at an academic institute in India, and I have the following queries:
1) The edgeR package in R can use expected counts to calculate log fold change and log counts per million. In this package the library size is calculated as the sum of the expected counts over all contigs. Kindly help me understand how the normalization factors are calculated.
2) Multiplying the normalization factor by the library size gives the effective library size, which is then used to calculate the normalized expected count. Kindly help me understand how the normalized expected count is calculated.
3) Kindly also explain how TMM_normalized FPKM is calculated.
4) Below is some example data. Could you kindly use it to calculate the normalization factor for the effective library size, the normalized expected count, log fold change, log counts per million and TMM-normalized FPKM? I feel I would grasp the concepts more easily from a worked calculation on this data.
Sorry, I have to post the example data here because I am unable to attach an Excel sheet.
Example data:
Matrix of expected counts:

| Contig | Sample A | Sample B |
|---|---|---|
| c989_g1_i1 | 457 | 134 |
| c1001_g1_i1 | 482 | 117 |
| c997_g1_i1 | 3 | 16 |

Matrix information after analysis:

| group | lib.size | norm.factors | eff.lib.size |
|---|---|---|---|
| Sample A | 942 | 1.016076654 | 957.1442 |
| Sample B | 267 | 0.984177716 | 262.7755 |

I want to understand how these norm.factors are calculated.
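A minimal pure-Python sketch of the TMM (trimmed mean of M-values) method, as I understand edgeR's default `calcNormFactors()` to apply it: trim 30% of the log ratios (M-values) and 5% of the average log abundances (A-values) from each end, take a precision-weighted mean of the remaining log ratios, then rescale all factors to a geometric mean of 1. This is not edgeR's code. The reference sample is passed in by the caller (edgeR picks it from the upper quartiles), and the helper names are hypothetical. With only three contigs no rows are trimmed, and for two samples the choice of reference does not change the rescaled factors.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties averaged, like R's rank()."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Unscaled TMM factor of sample `obs` relative to sample `ref`."""
    n_obs, n_ref = sum(obs), sum(ref)  # library sizes = column sums
    m_vals, a_vals, variances = [], [], []
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:
            continue  # log ratio would be infinite; such rows are skipped
        m_vals.append(math.log2((o / n_obs) / (r / n_ref)))               # M-value
        a_vals.append((math.log2(o / n_obs) + math.log2(r / n_ref)) / 2)  # A-value
        # approximate variance of M, used as an inverse weight
        variances.append((n_obs - o) / n_obs / o + (n_ref - r) / n_ref / r)
    n = len(m_vals)
    lo_m = math.floor(n * logratio_trim) + 1
    hi_m = n + 1 - lo_m
    lo_a = math.floor(n * sum_trim) + 1
    hi_a = n + 1 - lo_a
    rank_m, rank_a = _average_ranks(m_vals), _average_ranks(a_vals)
    num = den = 0.0
    for m, v, rm, ra in zip(m_vals, variances, rank_m, rank_a):
        if lo_m <= rm <= hi_m and lo_a <= ra <= hi_a:  # keep untrimmed rows
            num += m / v
            den += 1 / v
    return 2 ** (num / den) if den > 0 else 1.0


def calc_norm_factors(samples, ref_index=0):
    """TMM factors for a list of per-sample count lists, geometric mean 1."""
    ref = samples[ref_index]
    raw = [tmm_factor(s, ref) for s in samples]
    geo_mean = math.exp(sum(math.log(f) for f in raw) / len(raw))
    return [f / geo_mean for f in raw]


counts = {"Sample A": [457, 482, 3], "Sample B": [134, 117, 16]}
norm_factors = calc_norm_factors(list(counts.values()))
lib_sizes = [sum(c) for c in counts.values()]
eff_lib_sizes = [n * f for n, f in zip(lib_sizes, norm_factors)]
print(norm_factors)   # close to the posted 1.016076654 and 0.984177716
print(eff_lib_sizes)  # close to the posted 957.1442 and 262.7755
```

The weighting by `1 / variance` gives contigs with more reads more influence, which is why the low-count contig c997_g1_i1, despite its large log ratio, barely moves the factors.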
Results of edgeR from the matrix:

| Contig | logFC | logCPM | PValue | FDR |
|---|---|---|---|---|
| c997_g1_i1 | 4.193418634 | 14.56202158 | 0.000305695 | 0.000917085 |
| c1001_g1_i1 | -0.177541924 | 18.86089555 | 0.799344988 | 0.888664958 |
| c989_g1_i1 | 0.094905219 | 18.91100806 | 0.888664958 | 0.888664958 |

Kindly help me understand how the normalized expected count is calculated.
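A small sketch of how the effective library sizes turn expected counts into normalized values, using the posted counts and eff.lib.size. Counts per million on the effective library size is what edgeR's `cpm()` reports by default for a DGEList with normalization factors; rescaling counts to the mean effective library size is shown only as one hypothetical convention for a "normalized count". The log fold change here is a naive log2 ratio of CPMs (Sample B over Sample A) without the small prior count edgeR adds, so it only approximates the posted logFC, and the gap is largest for the low-count contig c997_g1_i1.

```python
import math

# Posted values: expected counts and eff.lib.size = lib.size * norm.factors.
counts = {
    "c989_g1_i1": (457, 134),
    "c1001_g1_i1": (482, 117),
    "c997_g1_i1": (3, 16),
}
eff_lib = (957.1442, 262.7755)  # Sample A, Sample B

# Counts per million computed on the effective library size.
cpm = {g: tuple(c / e * 1e6 for c, e in zip(pair, eff_lib)) for g, pair in counts.items()}

# Hypothetical convention: counts rescaled to the mean effective library size.
mean_eff = sum(eff_lib) / len(eff_lib)
scaled = {g: tuple(c * mean_eff / e for c, e in zip(pair, eff_lib)) for g, pair in counts.items()}

# Naive log2 fold change (Sample B over Sample A), no prior count added.
log_fc = {g: math.log2(b / a) for g, (a, b) in cpm.items()}

for g in counts:
    print(g, [round(x, 1) for x in cpm[g]], [round(x, 1) for x in scaled[g]], round(log_fc[g], 4))
```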
Information for calculation of TMM_normalized FPKM:

| group | lib.size | norm.factors | eff.lib.size |
|---|---|---|---|
| Sample A | 3505040 | 1.041236155 | 3649574 |
| Sample B | 3399608 | 0.960396924 | 3264973 |

For the calculation of TMM-normalized FPKM, how are these normalization factors calculated?
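Both posted sets of factors can be checked against two properties of edgeR's output: eff.lib.size equals lib.size times norm.factors, and TMM factors are rescaled so that their geometric mean is 1 (for two samples, their product is 1). This quick check only verifies the posted numbers; it cannot reproduce how the second set was computed, because the counts behind lib.size 3505040 and 3399608 are not in the post.

```python
import math

# (lib.size, norm.factors, posted eff.lib.size) for Sample A and Sample B.
tables = {
    "example matrix": ([942, 267], [1.016076654, 0.984177716], [957.1442, 262.7755]),
    "FPKM table": ([3505040, 3399608], [1.041236155, 0.960396924], [3649574, 3264973]),
}

checks = {}
for name, (lib_sizes, factors, posted_eff) in tables.items():
    eff = [n * f for n, f in zip(lib_sizes, factors)]  # effective library sizes
    geo_mean = math.exp(sum(math.log(f) for f in factors) / len(factors))
    checks[name] = (eff, geo_mean)
    print(name, [round(e, 4) for e in eff], posted_eff, round(geo_mean, 6))
```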
After that, we get the following TMM-normalized FPKM values:
TMM_normalized_FPKM:

| Contig | Sample A | Sample B |
|---|---|---|
| c989_g1_i1 | 412316.07 | 440363.64 |
| c1001_g1_i1 | 1171119.49 | 1035458.31 |
| c997_g1_i1 | 5958.79 | 115757.58 |

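The post does not give the contig lengths, so this sketch works backwards. Assuming TMM-normalized FPKM is ordinary FPKM with the effective library size in the denominator, FPKM = count × 10^9 / (length × lib.size × norm.factors), it solves for the length each posted value implies. Using the first table's eff.lib.size (957.1442 and 262.7755), both samples imply the same length for each contig (about 1158, 430 and 526 bp), which suggests the posted values were computed that way with one length per contig; the eff.lib.size values from the 3505040/3399608 table give lengths that disagree between the two samples. The helper names are hypothetical.

```python
# Posted TMM-normalized FPKM and expected counts (Sample A, Sample B).
fpkm = {
    "c989_g1_i1": (412316.07, 440363.64),
    "c1001_g1_i1": (1171119.49, 1035458.31),
    "c997_g1_i1": (5958.79, 115757.58),
}
counts = {
    "c989_g1_i1": (457, 134),
    "c1001_g1_i1": (482, 117),
    "c997_g1_i1": (3, 16),
}
# Assumption: eff.lib.size from the first table, not the second one.
eff_lib = (957.1442, 262.7755)


def tmm_fpkm(count, length_bp, eff_lib_size):
    """Hypothetical helper: FPKM with the TMM effective library size."""
    return count * 1e9 / (length_bp * eff_lib_size)


def implied_length(count, fpkm_value, eff_lib_size):
    """Hypothetical helper: solve tmm_fpkm() for the length in bp."""
    return count * 1e9 / (fpkm_value * eff_lib_size)


lengths = {}
for g in fpkm:
    per_sample = [implied_length(c, f, e) for c, f, e in zip(counts[g], fpkm[g], eff_lib)]
    lengths[g] = per_sample
    print(g, [round(x, 2) for x in per_sample])
```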
So that I can process the transcriptome data from my doctoral study, I kindly request you to clarify the doubts above. I shall be highly grateful for your help.
I look forward to your response.
With regards
Ashish Kumar Pathak
Thank you, sir, for your kind help.
The research article you suggested was very helpful for understanding log fold change, log CPM and the normalization factor. These are now clear to me.
However, I need a little more clarification about the calculation of TMM-normalized FPKM. I understand the concept of FPKM, but I am confused about the calculations edgeR performs to obtain TMM-normalized FPKM. Suppose we have two samples: for the FPKM calculation we supply the effective lengths of only one sample, which it treats as the test sample, with the other as the reference sample, and FPKM is then calculated. In RSEM, by contrast, TPM is calculated first, and FPKM is then derived from it using the average transcript length of the library.
I am trying to understand the exact formula edgeR uses to calculate FPKM when only the test sample's lengths are supplied, and how those lengths are applied to the reference sample. If we use effective lengths, they will vary between samples.
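The RSEM relationship between TPM and FPKM follows from the definitions: with TPM_i proportional to count_i / length_i, FPKM_i = TPM_i × 10^3 / L̄, where L̄ is the TPM-weighted mean length. This small sketch checks the identity numerically using Sample A's counts and hypothetical lengths (the post does not give any). RSEM itself works with estimated counts and per-sample effective lengths, so treat this as the idealized relationship only.

```python
def fpkm_direct(counts, lengths):
    """FPKM_i = count_i * 1e9 / (total count * length_i)."""
    total = sum(counts)
    return [c * 1e9 / (total * l) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


def fpkm_from_tpm(tpms, lengths):
    """FPKM_i = TPM_i * 1e3 / (TPM-weighted mean length)."""
    mean_length = sum(t / 1e6 * l for t, l in zip(tpms, lengths))
    return [t * 1e3 / mean_length for t in tpms]


counts = [457, 482, 3]            # Sample A expected counts from the post
lengths = [1000.0, 500.0, 800.0]  # hypothetical lengths in bp
direct = fpkm_direct(counts, lengths)
via_tpm = fpkm_from_tpm(tpm(counts, lengths), lengths)
print([round(x, 2) for x in direct])
print([round(x, 2) for x in via_tpm])  # matches the line above
```

Swapping the raw total for lib.size × norm.factors in `fpkm_direct` gives the TMM-scaled variant, with one length per gene shared by all samples.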
Sorry, sir; as I am a new user of R, I am unable to decode every step it follows.
Your kind suggestions would be very helpful for my understanding of the subject. If possible, please suggest a research article I should follow to understand the concept.
With regards.
The length of a gene should be constant between libraries, as you should be counting reads across the same features in each library. It doesn't make sense to test for DE between libraries if you're comparing different gene models; they will obviously be different, so the null hypothesis won't hold in the first place.