Hello! I am performing a differential expression analysis of RNA-Seq data with DESeq2.
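I have the following design with three conditions (FO, Control and LU). Some animals were sequenced more than once (LUFO181, LUFO238 and LUFO179), so the table contains technical replicates: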
ID_seq          ID_animal  Condition
C8DRYANXX_4_22  LUFO209    FO
C8DRYANXX_4_25  LUFO219    FO
C8DRYANXX_5_14  LUFO169    FO
C8DRYANXX_6_18  LUFO177    FO
C8DRYANXX_4_23  LUFO218    Control
C8DRYANXX_5_15  LUFO171    Control
C8DRYANXX_6_20  LUFO181    Control
C8DRYANXX_6_21  LUFO197    Control
C8F23ACXX_7_20  LUFO181    Control
C8DRYANXX_4_27  LUFO238    LU
C8DRYANXX_5_16  LUFO173    LU
C8DRYANXX_6_19  LUFO179    LU
C8EB2ANXX_5_27  LUFO238    LU
C8F23ACXX_7_19  LUFO179    LU
C8F23ACXX_8_13  LUFO163    LU
HHGFTBBXX_6_11  LUFO215    LU
HHGFTBBXX_6_12  LUFO234    LU
And the PCA of my data: [figure: PCA plot of the samples]
When I perform the analysis like this:
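> library(DESeq2)
> dds <- DESeqDataSetFromMatrix(DE_genesCondition, colData, design = ~ Condition)
> # sum the counts of technical replicates (same animal sequenced more than once)
> DESeq.dsCollapsed <- collapseReplicates(dds, groupby = dds$ID_animal)
> DESeq.dsCollapsed <- DESeq(DESeq.dsCollapsed)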
With this approach, I obtain the following numbers of differentially expressed genes (DEGs):
FOvsControl: 37 DEGs
LUvsControl: 2515 DEGs
LUvsFO: 817 DEGs
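For reference, a minimal sketch of how the three contrasts can be pulled from a single fitted model with `results()` and its `contrast` argument. It runs on simulated data from `makeExampleDESeqDataSet()` relabelled into three groups (an assumption standing in for the real counts; the simulation has no true group differences, so it only illustrates the mechanics). The 0.05 cut-off on `padj` and the names `pairs_to_test` and `n_deg` are hypothetical, since the post does not state the threshold used.

```r
library(DESeq2)
set.seed(1)

# Simulated stand-in for the real counts (assumption): 1000 genes x 12 samples,
# relabelled into three groups of four samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("Control", "FO", "LU"), each = 4))
design(dds) <- ~ Condition
dds <- DESeq(dds, quiet = TRUE)

# One fit, three pairwise contrasts: c(variable, numerator, denominator)
pairs_to_test <- list(FOvsControl = c("Condition", "FO", "Control"),
                      LUvsControl = c("Condition", "LU", "Control"),
                      LUvsFO      = c("Condition", "LU", "FO"))

# Count genes whose adjusted p-value is below the (assumed) 0.05 threshold
n_deg <- sapply(pairs_to_test, function(ct) {
  res <- results(dds, contrast = ct, alpha = 0.05)
  sum(res$padj < 0.05, na.rm = TRUE)
})
print(n_deg)
```

All three contrasts share the size factors and the per-gene dispersion estimates fitted on all twelve samples.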
However, when I perform the analyses separately, that is, keeping in the colData data frame only the samples of the two conditions being compared (for example, only the Control and LU samples) and running DESeq three times, once per contrast, I obtain these results:
FOvsControl: 237 DEGs
LUvsControl: 1992 DEGs
LUvsFO: 672 DEGs
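A sketch of the per-contrast approach for LU vs Control: subset the samples, drop the now-unused FO level (otherwise the model matrix would contain an all-zero column), collapse technical replicates, then call `DESeq()`. The data are again simulated (assumption), and the animal IDs `a1` to `a10`, with `a3` and `a9` sequenced twice, are hypothetical stand-ins for the repeated animals in the table.

```r
library(DESeq2)
set.seed(2)

# Simulated stand-in (assumption): 12 sequencing runs from 10 animals,
# with animals a3 and a9 sequenced twice
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("Control", "FO", "LU"), each = 4))
dds$ID_animal <- c("a1", "a2", "a3", "a3", "a4", "a5",
                   "a6", "a7", "a8", "a9", "a9", "a10")
design(dds) <- ~ Condition

# Keep only the two conditions being compared and drop the empty FO level
dds_sub <- dds[, dds$Condition %in% c("LU", "Control")]
dds_sub$Condition <- droplevels(dds_sub$Condition)

# Sum the counts of technical replicates, then fit the two-group model
dds_sub <- collapseReplicates(dds_sub, groupby = dds_sub$ID_animal)
dds_sub <- DESeq(dds_sub, quiet = TRUE)

res_lu <- results(dds_sub, contrast = c("Condition", "LU", "Control"), alpha = 0.05)
n_deg_lu <- sum(res_lu$padj < 0.05, na.rm = TRUE)
print(n_deg_lu)
```

Here size factors and dispersions are estimated from the six collapsed LU and Control samples only, which is why this approach can give different numbers from the joint fit.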
As can be seen, the results differ between the two approaches. What first draws my attention is the large increase in the number of DEGs for FOvsControl (from 37 to 237) when each contrast is run separately. This could be because, in the first approach, the LU samples inflate the dispersion estimates (DESeq2 estimates one dispersion per gene from all samples in the model), so removing them reduces the dispersion. However, I do not know which of these two approaches is more appropriate for analysing my experiment. Could anyone help me?
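To check the dispersion explanation, a sketch that fits FO vs Control twice on the same simulated counts, once with all samples and once with FO and Control only, and compares the median fitted dispersion and the number of DEGs. The simulation is an assumption made to illustrate the mechanism: LU samples get a 10-fold larger negative binomial dispersion than the other groups, and 200 genes are 2-fold up in FO. Because a single dispersion per gene is fitted across all samples in the model, the joint fit should show a larger median dispersion and typically fewer FO vs Control DEGs. The helper `fit_fo_vs_control()` and the 0.05 threshold are hypothetical choices, not part of the original analysis.

```r
library(DESeq2)
set.seed(3)

# Simulated counts (assumption): 2000 genes, 4 samples per condition,
# first 200 genes 2-fold up in FO, LU dispersion 0.5 vs 0.05 elsewhere
n_genes <- 2000
cond <- factor(rep(c("Control", "FO", "LU"), each = 4))
base_mu <- 2^runif(n_genes, 4, 10)
fold_fo <- ifelse(seq_len(n_genes) <= 200, 2, 1)
disp <- ifelse(cond == "LU", 0.5, 0.05)
cts <- sapply(seq_along(cond), function(j) {
  mu_j <- if (cond[j] == "FO") base_mu * fold_fo else base_mu
  rnbinom(n_genes, mu = mu_j, size = 1 / disp[j])
})
colnames(cts) <- paste0("s", seq_along(cond))
coldata <- data.frame(Condition = cond, row.names = colnames(cts))

# Hypothetical helper: fit ~ Condition on the selected samples and return
# the median dispersion and the FO vs Control DEG count
fit_fo_vs_control <- function(keep) {
  cd <- droplevels(coldata[keep, , drop = FALSE])
  dds <- DESeqDataSetFromMatrix(cts[, keep], cd, design = ~ Condition)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds, contrast = c("Condition", "FO", "Control"), alpha = 0.05)
  list(disp = median(dispersions(dds), na.rm = TRUE),
       n_deg = sum(res$padj < 0.05, na.rm = TRUE))
}

joint <- fit_fo_vs_control(rep(TRUE, length(cond)))
pair  <- fit_fo_vs_control(cond %in% c("FO", "Control"))
print(c(joint_disp = joint$disp, pair_disp = pair$disp))
print(c(joint_deg = joint$n_deg, pair_deg = pair$n_deg))
```

On real data, a similar comparison, or `plotDispEsts()` on each fit, can show whether one group is much more variable than the others.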