Hello,
I am new to mRNAseq data analysis and I would like to know how to run deseq2 for my data. (I googled it and I saw some posts like DESEq2 Paired samples Before and after treatment , but I didn't understand it. I am sorry to ask a basic question and take your time...)
Here is my data. (6 patient data and they are paired. They have the same disease and took the same drug. I would like to know different gene expression between before and after condition.)
> coldata
samples condition
1_3_12_clamp_0_buffy 1_3_12 before
1_3_12_clamp_200_buffy 1_3_12 after
9_clamp_0_buffy 9 before
9_clamp_200_buffy 9 after
010_0_buffy 10 before
010_200_buffy 10 after
11_15_006_0m_buffy 11_15_006 before
11_15_006_200m_buffy 11_15_006 after
12_27_11_clamp_buffy_0 12_27_11 before
12_27_11_clamp_buffy_200 12_27_11 after
013_clamp_0m_buffy 013 before
013_clamp_200m_buffy 013 after
> head(txi.rsem$counts) # I removed some of columns for a better look.
1_3_12_clamp_0_buffy | 1_3_12_clamp_200_buffy | 9_clamp_0_buffy | 9_clamp_200_buffy | 010_0_buffy | 010_200_buffy | |
ENSG00000000003 | 1 | 1 | 1 | 3 | 0 | 0 |
ENSG00000000005 | 0 | 0 | 0 | 0 | 1 | 0 |
ENSG00000000419 | 470 | 528 | 388 | 563 | 36 | 180 |
ENSG00000000457 | 306.02 | 243.22 | 254.25 | 273.97 | 248 | 186 |
ENSG00000000460 | 398.98 | 484.78 | 262.75 | 307.03 | 62 | 96 |
ENSG00000000938 | 3221 | 2756 | 1750 | 1920 | 17516 | 7642 |
I used the rsem output for tximport and ran deseq2 and I got this result.
>ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = coldata, design = ~samples + condition)
>dds <- ddsTxi[ rowSums(counts(ddsTxi)) > 1, ]
>dds <- DESeq(dds)
>res <- results(dds, pAdjustMethod = "fdr")
> res
log2 fold change (MLE): condition before vs after
Wald test p-value: condition before vs after
DataFrame with 26959 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 1.5229213 0.29551456 3.0266631 0.09763709 0.9222205 NA
ENSG00000000005 0.9538215 1.13808315 3.0745585 0.37016149 0.7112622 NA
ENSG00000000419 265.9276159 0.10175892 0.6761213 0.15050394 0.8803670 0.9987058
ENSG00000000457 230.5449617 -0.09454739 0.3944541 -0.23969175 0.8105692 0.9987058
So, I would like to know what I did is a right way of analysis my data?
Sorry for my English and thank you in advance.
JK
Dear Michael,
Thank you for your comment! I am glad what I did is not wrong. And I have some questions to ask if you don't mind. (Sorry again for wasting your time, but hopefully it might help for some new users.)
1. So as the reults shows this is comparison between before and after samples, which means 6 biological replicates for before samples and after samples. Am I right?
2. About factor levels. I saw the deseq2 vignette and if I want to know the difference between "9" and "1_13_2", my code should be ```res <- results(dds, contrast=c("samples", "9", "1_3_12"))``` which means two biological replicates for "9" samples and "1_13_12" samples. Am I right?
Thank you again in advance.
JK
I just meant that it’s more standard to have after vs before. You have before vs after. But it’s up to you how to present the comparisons.