I have samples from two groups: males with disease (n = 114) and females with disease (n = 65), with sex (male or female) coded as `condition`. I want to assess differential gene expression between them. My results show that a male-specific gene, SRY, has a negative log2 fold change when females are set as the reference. If I understand correctly, this means SRY is downregulated in males compared to females. But SRY should only be expressed in males. Looking at my gene count matrix, all the females have zero counts for SRY, but so do many of the males (possibly related to the disease state).

I have read the vignette and many posts here, and I understand that the difference in group size affects the power of the test. However, could the difference in group size really produce such a striking result? The script I used is below. I am relatively new to R and have even less experience with DESeq2, so I appreciate any thoughts!
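The per-group pattern described above can be summarized in a few lines of base R. This is a minimal sketch with made-up counts. The vector `sry` and its sample names are hypothetical and are not the poster's data. The naive ratio is only a direction check, not DESeq2's estimator.

```r
# Hypothetical toy counts for one gene: all females are zero, and some males are zero too
sry <- c(M1 = 0, M2 = 15, M3 = 0, M4 = 40, F1 = 0, F2 = 0, F3 = 0)
condition <- factor(c("male", "male", "male", "male", "female", "female", "female"),
                    levels = c("female", "male"))

# Fraction of samples with a zero count, per group
frac_zero <- tapply(sry == 0, condition, mean)
# Mean count per group
group_mean <- tapply(sry, condition, mean)

# Naive female-over-male log2 ratio of group means, with a pseudocount of 1
# to avoid log2(0). NOT DESeq2's model-based estimate; it only shows the sign.
naive_lfc <- log2((group_mean["female"] + 1) / (group_mean["male"] + 1))

print(frac_zero)
print(group_mean)
print(naive_lfc)
```

Because every female count is zero, any female-over-male ratio is below 1, so its log2 is negative. On the real data, the same summary could be run on `counts(dds_gene, normalized = TRUE)` for the SRY row, grouped by `dds_gene$condition`. This assumes the rows are named by gene symbol; they may instead be Ensembl or other IDs.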
```r
library(DESeq2)

# Gene-by-sample count matrix; the first column holds gene IDs
gene_counts <- read.csv("gene_count_matrix.csv", header = TRUE, row.names = 1)
# Sample table (read once); condition must be a factor for relevel()
colData <- read.csv("col_data.csv", header = TRUE, row.names = 1)
colData$condition <- factor(colData$condition)
# Count columns and sample-table rows must contain the same samples, in the same order
stopifnot(all(colnames(gene_counts) %in% rownames(colData)))
stopifnot(all(colnames(gene_counts) == rownames(colData)))

dds_study_gene <- DESeqDataSetFromMatrix(countData = gene_counts, colData = colData, design = ~ condition)
dds_study_gene$condition <- relevel(dds_study_gene$condition, ref = "female")
dds_gene <- DESeq(dds_study_gene)

# contrast = c(factor, numerator, denominator): log2FoldChange = log2(female / male)
results_DESeq2_female_vs_male_gene <- results(dds_gene, contrast = c("condition", "female", "male"))
# Copy gene IDs into a column and move it to the front
results_DESeq2_female_vs_male_gene$gene_id <- rownames(results_DESeq2_female_vs_male_gene)
results_DESeq2_female_vs_male_gene <- results_DESeq2_female_vs_male_gene[, c("gene_id", setdiff(colnames(results_DESeq2_female_vs_male_gene), "gene_id"))]
```
Thank you for your response! Does setting "female" as the reference level in the script change what you said about the contrast? I thought a negative fold change meant "SRY is downregulated in males relative to females". Do you mean that it actually means "SRY is downregulated in females relative to males"?
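A contrast is an explicit definition of the comparison, so the reference level has no influence here. With `contrast = c("condition", "female", "male")`, the second element is the numerator and the third is the denominator. The log2 fold change is therefore log2(female / male), and a negative value means SRY is lower in females than in males. The reference level set with `relevel()` only determines which coefficients appear in `resultsNames()` and what `results()` returns when no contrast is given.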
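The point above can be checked empirically on simulated data. The following is a sketch that assumes DESeq2 is installed. It uses DESeq2's simulation helper `makeExampleDESeqDataSet()`, whose `condition` factor has levels `A` and `B`. Here those levels stand in for female and male purely for illustration. The script fits the same data twice, once with each level as the reference, and then compares the contrast results.

```r
# Assumes the Bioconductor package DESeq2 is installed
suppressPackageStartupMessages(library(DESeq2))

set.seed(1)
# Simulated counts: 500 genes, 12 samples, condition levels "A" (reference) and "B"
dds <- makeExampleDESeqDataSet(n = 500, m = 12, betaSD = 1)

# Fit with "A" as the reference level ...
dds_refA <- DESeq(dds, quiet = TRUE)
# ... and again after making "B" the reference level
dds_B <- dds
dds_B$condition <- relevel(dds_B$condition, ref = "B")
dds_refB <- DESeq(dds_B, quiet = TRUE)

# contrast = c(factor, numerator, denominator) fixes the direction:
# log2FoldChange = log2(B / A) in both fits, regardless of the reference level
res_BvsA_refA <- results(dds_refA, contrast = c("condition", "B", "A"))
res_BvsA_refB <- results(dds_refB, contrast = c("condition", "B", "A"))

# Swapping numerator and denominator flips the sign
res_AvsB_refA <- results(dds_refA, contrast = c("condition", "A", "B"))

# Same contrast, different reference level: equal up to fitting tolerance
same_direction <- all.equal(res_BvsA_refA$log2FoldChange,
                            res_BvsA_refB$log2FoldChange, tolerance = 1e-3)
# Reversed contrast: exactly the negated values
flipped <- all.equal(res_BvsA_refA$log2FoldChange,
                     -res_AvsB_refA$log2FoldChange)
print(same_direction)
print(flipped)
```

Only the order of the second and third elements of `contrast` changes the sign. Releveling changes how the coefficients are parameterized internally but not the fold change that a given contrast reports.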
Thank you, much appreciated!