I am testing different conditions with the following design. For example, my contrasts will be (mant vs mesenchymal_stem_cells, late_treated vs mesenchymal_stem_cells):
sample_id | condition |
mantrep1 | mant |
mantrep2 | mant |
t6rep1 | late_treated |
t6rep2 | late_treated |
t6_2015 | late_treated |
HMSC_tot | mesenchymal_stem_cells |
I have a different number of samples per group (I don't know whether this could have an influence): 2 samples for mant, 3 for late_treated, 1 for mesenchymal_stem_cells.
The size factors are very different (look at HMSC_tot and t6rep2).
Size factors:
sample_id | size_factor |
mantrep1 | 1.2341148 |
mantrep2 | 0.8325101 |
t6rep1 | 1.0863274 |
t6rep2 | 0.9813990 |
t6_2015 | 2.0297889 |
HMSC_tot | 0.4530482 |
In the end, I retrieved 10,000 differentially expressed genes (|FC| > 1.5 and p-value < 0.05) when I compare mant vs mesenchymal_stem_cells or late_treated vs mesenchymal_stem_cells.
Isn't that too many?
Moreover, I found a lot of genes with |FC| > 2000, so something must be going wrong here!
Can we compare samples with very different library sizes? Then again, the DESeq2 model corrects for library size internally...
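To make clear what I mean by the internal correction: as far as I understand, the size factors come from a median-of-ratios calculation. Below is a minimal sketch of that idea in plain Python on made-up counts. The function name `size_factors` and the toy samples `a` and `b` are hypothetical, and this is not DESeq2's actual code, only an illustration of the principle.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample,
    # so that the logarithm is defined.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Per-gene log geometric mean across samples (the pseudo-reference).
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Size factor = median over genes of (count / reference), done in log space.
    return {s: math.exp(median(math.log(counts[s][g]) - log_ref[g]
                               for g in usable))
            for s in samples}

# Toy data (made up): sample "b" is sequenced exactly twice as deep as "a".
toy = {"a": [10, 20, 30, 40, 0], "b": [20, 40, 60, 80, 5]}
sf = size_factors(toy)
print(sf)
```

In this toy case the size factor of `b` comes out twice that of `a`, so dividing the counts by the size factors puts both samples on the same scale. The gene with a zero count in `a` is ignored when the factors are estimated.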
Note that HMSC_tot is from a different run, from 2010 (unstranded). The others come from the same 2015 run (stranded), with a different library preparation from the 2010 one.