Dear Team,
I want to perform differential expression analysis on counts generated by featureCounts. Each family has a different disease. Can we perform differential expression across all samples? Or, since the disease differs from family to family, should we analyze each family separately?
Example sample information is shown below:
SampleID | condition | Family |
Sample1 | Normal | Fam1 |
Sample2 | Diseased | Fam1 |
Sample3 | Normal | Fam2 |
Sample4 | Diseased | Fam2 |
Sample5 | Normal | Fam3 |
Sample6 | Diseased | Fam3 |
Sample7 | Normal | Fam4 |
Sample8 | Diseased | Fam4 |
Sample9 | Normal | Fam5 |
Sample10 | Diseased | Fam5 |
I followed the link below to group condition and family and get multiple comparisons. Is it OK to proceed like this?
DESEq2 comparison with mulitple cell types under 2 conditions
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata , design = ~ Family + condition)
dds$group <- factor(paste0(dds$Family, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
resultsNames(dds)
Could you please let me know your suggestions.
Thanks In Advance
Fazulur Rehaman
Dear Michael,
Thanks a lot for your quick response.
Please find below the details
What kind of genes are you looking for? Genes commonly DE in diseased relative to normal?
We are looking for diabetes- and obesity-related genes that are commonly DE in diseased relative to normal.
You cannot perform DE within family here, because there are no "replicates" (I don't know if these are human donors, or what exactly a sample refers to...).
Yes, these are human donors with diabetes or obesity. Each family has one control and one diseased sample (there are more than 6 families). The disease may be either diabetes or obesity.
Please suggest how I can proceed with DE.
Thanks In Advance
Fazulur Rehaman
You can use ~family + condition, which will control for family baseline, while finding genes where the diseased samples show DE relative to normal.
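To illustrate why the paired design is estimable while a per-family grouping is not, here is a minimal base R sketch. It assumes the ten-sample layout from the example table in the question (the real study has more families) and only inspects the design matrices; no counts are needed.

```r
# Sample sheet copied from the example table in the question
# (5 families, one Normal/Diseased pair each). This layout is an assumption.
coldata <- data.frame(
  SampleID  = paste0("Sample", 1:10),
  condition = factor(rep(c("Normal", "Diseased"), times = 5)),
  Family    = factor(rep(paste0("Fam", 1:5), each = 2))
)

# Paired design: one baseline per family plus a single disease effect.
X_paired <- model.matrix(~ Family + condition, data = coldata)
colnames(X_paired)

# Residual degrees of freedom = number of samples - number of coefficients.
df_paired <- nrow(X_paired) - qr(X_paired)$rank
df_paired  # 4

# Family-by-condition grouping: every sample becomes its own group.
coldata$group <- factor(paste0(coldata$Family, coldata$condition))
X_group <- model.matrix(~ group, data = coldata)
df_group <- nrow(X_group) - qr(X_group)$rank
df_group   # 0, so nothing is left to estimate variability from
```

With `~ group` there are as many coefficients as samples, which is the "no replicates" problem mentioned earlier in the thread. With `~ Family + condition` the families act as the replicates for the disease effect.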
Dear Michael,
Thanks a lot for your suggestions.
At first, I used the same model ~family + condition
Here are the details:
which will control for family baseline, while finding genes where the diseased samples show DE relative to normal.
Does this mean there is only one comparison, which finds genes DE in diseased vs. normal irrespective of which disease?
> resultsNames(dds)
[1] "Intercept" "family_Fam139_vs_Fam10"
[3] "family_Fam193_vs_Fam10" "family_Fam43_vs_Fam10"
[5] "family_Fam52_vs_Fam10" "family_Fam53_vs_Fam10"
[7] "family_Fam55_vs_Fam10" "family_Fam8_vs_Fam10"
[9] "condition_Normal_vs_Diseased"
Building Results table for diseased vs Normal condition.
resultsNames() lists the coefficient as "condition_Normal_vs_Diseased". Since we want genes DE in diseased relative to normal, in the results() function I have given "Diseased" followed by "Normal". Could you please confirm that this is OK?
Also, resultsNames() lists other coefficients such as "family_Fam139_vs_Fam10", "family_Fam43_vs_Fam10", etc. How can I use them?
Please let me know your suggestions.
Thanks In Advance
Fazulur Rehaman
Take a look at the DESeq2 vignette note on factor levels, where this is discussed.
But in short, what you have above is fine; you can always specify the LFC you want using the contrast argument. The object res1 is your results table of interest.
You do not use the other coefficients listed in resultsNames. They are nuisance coefficients for your purposes: they control for the family baselines.
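The call that produced res1 is not preserved in the thread. The following is a hedged sketch of what such a call may look like. It uses simulated counts from makeExampleDESeqDataSet() in place of the featureCounts matrix and a made-up five-family layout (both are assumptions), and the object name res_dis_vs_norm is hypothetical.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real featureCounts matrix (assumption).
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)

# Hypothetical sample sheet: 5 families, one Normal/Diseased pair each.
dds$Family    <- factor(rep(paste0("Fam", 1:5), each = 2))
dds$condition <- factor(rep(c("Normal", "Diseased"), times = 5))
design(dds)   <- ~ Family + condition

dds <- DESeq(dds)

# Levels are alphabetical, so "Diseased" is the reference level and the
# coefficient is reported as condition_Normal_vs_Diseased.
resultsNames(dds)

# Default direction: Normal relative to Diseased.
res_default <- results(dds, name = "condition_Normal_vs_Diseased")

# Requested direction: Diseased relative to Normal (numerator first).
res_dis_vs_norm <- results(dds, contrast = c("condition", "Diseased", "Normal"))

# The two tables describe the same comparison with opposite sign.
head(cbind(default = res_default$log2FoldChange,
           flipped = res_dis_vs_norm$log2FoldChange))
```

An alternative is to make Normal the reference level before running DESeq(), for example with dds$condition <- relevel(dds$condition, ref = "Normal"), in which case the coefficient is reported as condition_Diseased_vs_Normal directly. The family_* coefficients are not extracted at any point.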
Dear Michael,
Thanks a lot for your confirmation and suggestions on factor levels.
Thanks & Regards
Fazulur Rehaman
Dear Michael,
I have one more question. Since we have only one comparison, diseased relative to normal, is there any way to know in which families a given gene is upregulated or downregulated?
Please let me know.
Thanks & Regards
Fazulur Rehaman
You can look at plotCounts()
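One way this could be done is sketched below under the same assumptions as before: simulated counts, a hypothetical five-family layout, and "gene1" as a placeholder for a gene taken from the results table. With returnData = TRUE, plotCounts() returns the normalized counts (plus a pseudocount of 0.5 by default) instead of drawing the plot, so the direction of change can be inspected family by family.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real featureCounts matrix (assumption).
dds <- makeExampleDESeqDataSet(n = 1000, m = 10)
dds$Family    <- factor(rep(paste0("Fam", 1:5), each = 2))
dds$condition <- factor(rep(c("Normal", "Diseased"), times = 5))
design(dds)   <- ~ Family + condition
dds <- estimateSizeFactors(dds)

# Placeholder gene; in practice, pick one from the results table.
gene <- "gene1"

# Normalized counts for this gene together with the sample annotations.
d <- plotCounts(dds, gene = gene, intgroup = c("Family", "condition"),
                returnData = TRUE)

# Within each family, log2 ratio of the Diseased to the Normal sample.
per_family_lfc <- sapply(split(d, d$Family), function(x) {
  log2(x$count[x$condition == "Diseased"] / x$count[x$condition == "Normal"])
})

per_family_lfc        # positive: higher in Diseased for that family
sign(per_family_lfc)  # +1 up, -1 down, 0 no change
```

These per-family ratios come from a single pair of samples each, so they are descriptive only. The statistical test remains the one diseased vs. normal comparison across all families.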