I am trying to study how the presence of a mutation in a gene affects RNA expression across several cancer diagnoses.
The sample info file looks like:
sample  diagnosis        geneX_mutation_status
1       cancer_subtype1  wildtype
2       cancer_subtype2  wildtype
3       cancer_subtype3  wildtype
4       cancer_subtype1  mutant
5       cancer_subtype2  mutant
6       cancer_subtype4  wildtype
A few things of note:
1. Total cancer subtypes: 6 (levels).
2. Each subtype considered has both mutated and wild-type samples.
3. Within each subtype there are more wild-type samples than mutated ones; for example, cancer_subtype1 can have 3 wild-type samples and only 1 mutated sample.
4. Number of levels of geneX_mutation_status: 2 (wildtype and mutant).
My design is as follows:
library(DESeq2)

# Full model, including the interaction term:
dds <- DESeqDataSetFromMatrix(countData = countfile,
                              colData = coldata,
                              design = ~ geneX_mutation_status + diagnosis + geneX_mutation_status:diagnosis)
# Likelihood ratio test against the reduced model without the interaction term:
dds <- DESeq(dds, test = "LRT", reduced = ~ geneX_mutation_status + diagnosis)
But when I check the results using resultsNames(dds), I can only see interaction terms for the wildtype genotype. The output is:
[1] "Intercept"
[2] "geneX_mutation_wildtype_vs_mutant"
[3] "cancer_subtype2_vs_cancer_subtype1"
[4] "cancer_subtype3_vs_cancer_subtype1"
[5] "cancer_subtype4_vs_cancer_subtype1"
[6] "cancer_subtype5_vs_cancer_subtype1"
[7] "cancer_subtype6_vs_cancer_subtype1"
[8] "geneX_mutationwildtype.cancer_subtype2"
[9] "geneX_mutationwildtype.cancer_subtype3"
[10] "geneX_mutationwildtype.cancer_subtype4"
[11] "geneX_mutationwildtype.cancer_subtype5"
[12] "geneX_mutationwildtype.cancer_subtype6"
So, first, I am losing the geneX_mutationmutant category comparison. Second, I am seeing only some of the comparisons between the disease categories in diagnosis (each subtype against cancer_subtype1 only).
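To make the naming pattern above concrete, here is a minimal sketch in base R only. It assumes a made-up, balanced sample table (the toy `coldata` below is hypothetical, not my real data) and R's default treatment coding. It shows how the same design formula expands into coefficient columns: the first level of each factor acts as the reference and gets no column of its own, although those samples still enter the fit.

```r
# Hypothetical toy sample table: 6 subtypes x 2 genotypes, 2 samples per cell.
coldata <- data.frame(
  geneX_mutation_status = factor(rep(c("mutant", "wildtype"), times = 12)),
  diagnosis = factor(rep(paste0("cancer_subtype", 1:6), each = 4))
)

# Same design formula as in the question; model.matrix() is base R (stats).
full <- model.matrix(
  ~ geneX_mutation_status + diagnosis + geneX_mutation_status:diagnosis,
  data = coldata
)

# factor() sorts levels alphabetically, so the first level is the reference.
levels(coldata$geneX_mutation_status)[1]
levels(coldata$diagnosis)[1]

# One column per non-reference level or combination of levels.
colnames(full)
ncol(full)
```

If a different reference is wanted, `relevel()` on the factor before building the design changes which level is absorbed into the intercept; the number of columns stays the same.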
From what I understand, the LRT first fits the full model, which includes the term being tested (here the interaction), and then the reduced model without it. It then compares the likelihoods of the two fits to check whether the observed differences are indeed explained by the term being tested.
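The full-versus-reduced comparison described here can be sketched in base R for a single made-up gene. Note the assumptions: this uses a Poisson `glm()` on simulated counts as a stand-in, not the negative binomial model that DESeq2 fits, and the toy `coldata` is hypothetical. It only illustrates the general idea that one likelihood ratio test covers all interaction coefficients jointly.

```r
set.seed(1)

# Hypothetical toy design: 6 subtypes x 2 genotypes, 2 samples per cell.
coldata <- data.frame(
  geneX_mutation_status = factor(rep(c("mutant", "wildtype"), times = 12)),
  diagnosis = factor(rep(paste0("cancer_subtype", 1:6), each = 4))
)

# Simulated counts for one made-up gene (no real effect built in).
coldata$count <- rpois(nrow(coldata), lambda = 50)

# Full model with the interaction, reduced model without it.
full <- glm(count ~ geneX_mutation_status * diagnosis,
            family = poisson, data = coldata)
reduced <- glm(count ~ geneX_mutation_status + diagnosis,
               family = poisson, data = coldata)

# Likelihood ratio test: a single p-value for all interaction terms together.
lrt <- anova(reduced, full, test = "Chisq")
lrt$Df[2]  # number of interaction coefficients tested jointly
```

Both fits use every sample, mutant and wildtype alike; the two models differ only in whether the interaction coefficients are included.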
Here I am trying to test the interaction between mutation status and diagnosis, so I would expect the mutated samples to be considered as well, which does not seem to be the case.
Is there something I am missing in the model, or is the number of mutated vs wild-type samples in each diagnosis affecting this?
Kindly help!