I have no code to post. This is a question about the differing results I get when I change a comparison variable in the design formula from character to factor.
I am comparing differential expression across age groups.
The data has a variable 'age' with these values: 20, 25, 30, 35, 40, 45, 50, with 20 as the base comparison level.
When I run results for 'age' as a factor, I get:
Gene baseMean log2FoldChange lfcSE stat pvalue padj
GeneZ 2.0324404 -0.0230828518 0.17758857 -0.12997938 0.8965827428 0.96129754
But when I run it with 'age' as a character, I get:
Gene baseMean log2FoldChange lfcSE stat pvalue padj
GeneZ 2.0324404 -0.013965354 0.17827642 -0.07833539 9.375613e-01 9.842875e-01
Is R treating the factor data as numerical ordinal?
So, which should I use?
(A single gene is shown as an example; note the padj.)
Many thanks.
Put a different way, it's likely that you are somehow computing a different contrast, rather than this having anything to do with how R handles numeric-looking characters.
The only thing I could imagine is that, when using a factor, 20 is not the base level, while when using character the internal conversion makes 20 the base level. There are posts here that show factor level order can make a slight difference in how DESeq2 estimates model parameters.
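A quick way to check this, as a minimal sketch in base R. The `age_chr` vector and the reversed level order are made up for illustration and are not the OP's data; the commented DESeq2 lines assume an existing DESeqDataSet called `dds`.

```r
# Hypothetical age column stored as character (not the OP's actual data)
age_chr <- c("30", "20", "45", "25", "50", "35", "40", "20")

# Converting character to factor sorts the unique values,
# so the first (base) level here is "20"
age_default <- factor(age_chr)
print(levels(age_default))

# A factor built with an explicit, different level order
# ends up with a different base level
age_custom <- factor(age_chr,
                     levels = c("50", "45", "40", "35", "30", "25", "20"))
print(levels(age_custom)[1])

# relevel() moves the chosen level to the front, making it the reference
age_fixed <- relevel(age_custom, ref = "20")
print(levels(age_fixed)[1])

# With DESeq2 (assumption: `dds` already exists and has an `age` column),
# the comparison can be requested explicitly instead of relying on defaults:
# dds$age <- relevel(factor(dds$age), ref = "20")
# dds <- DESeq(dds)
# resultsNames(dds)
# res <- results(dds, contrast = c("age", "25", "20"))
```

If the two runs are then called with the same explicit `contrast`, the output should at least refer to the same comparison. Note that `results()` without a `contrast` or `name` argument reports the last level of the last design variable against the reference level, so the ordering of the remaining levels matters too, not just which one is the base.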
That's possible as well, although OP says 20 is the baseline.
That would seem a logical and reasonable explanation of what I'm seeing.