Hi,
I have a sample table that looks like this:
> head(sampletable)
                  SampleId SampleName Location Tissue
1 FR10_MCRP_374D_S108_L005     FR10MC       FR     MC
2  FR10_MCRP_374D_S56_L001     FR10MC       FR     MC
3  FR10_MCRP_374D_S56_L002     FR10MC       FR     MC
4  FR10_MCRP_374D_S56_L003     FR10MC       FR     MC
5  FR10_MCRP_374D_S56_L004     FR10MC       FR     MC
6  FR10_MCRP_374D_S56_L005     FR10MC       FR     MC
The data comes from two locations (FR and YR) and four tissues (MC, PR, SC and TH).
> sampletable$Location
  [1] FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR
 [59] FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR FR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR
[117] YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR YR
Levels: FR YR
> sampletable$Tissue
  [1] MC MC MC MC MC MC MC PR PR PR PR PR PR PR SC SC SC SC SC SC SC TH TH TH TH TH TH TH MC MC MC MC MC MC MC PR PR PR PR PR PR PR SC SC
 [45] SC SC SC SC SC TH TH TH TH TH TH TH MC MC MC MC MC MC MC PR PR PR PR PR PR PR SC TH TH TH TH TH TH TH MC MC MC MC MC MC MC PR PR PR
 [89] PR PR PR PR SC TH TH TH TH TH TH TH MC MC MC MC MC MC MC PR PR PR PR PR PR PR SC SC SC SC SC SC SC TH TH TH TH TH TH TH MC MC MC MC
[133] MC MC MC PR PR PR PR PR PR PR SC SC SC SC SC SC SC TH TH TH TH TH TH TH
Levels: MC PR SC TH
I want to test for differential expression between the two locations within each tissue, so I combined Location and Tissue into a single group factor and ran DESeq2.
> dds$group
  [1] FRMC FRMC FRMC FRMC FRMC FRMC FRMC FRPR FRPR FRPR FRPR FRPR FRPR FRPR FRSC FRSC FRSC FRSC FRSC FRSC FRSC FRTH FRTH FRTH FRTH FRTH FRTH FRTH FRMC
 [30] FRMC FRMC FRMC FRMC FRMC FRMC FRPR FRPR FRPR FRPR FRPR FRPR FRPR FRSC FRSC FRSC FRSC FRSC FRSC FRSC FRTH FRTH FRTH FRTH FRTH FRTH FRTH FRMC FRMC
 [59] FRMC FRMC FRMC FRMC FRMC FRPR FRPR FRPR FRPR FRPR FRPR FRPR FRSC FRTH FRTH FRTH FRTH FRTH FRTH FRTH YRMC YRMC YRMC YRMC YRMC YRMC YRMC YRPR YRPR
 [88] YRPR YRPR YRPR YRPR YRPR YRSC YRTH YRTH YRTH YRTH YRTH YRTH YRTH YRMC YRMC YRMC YRMC YRMC YRMC YRMC YRPR YRPR YRPR YRPR YRPR YRPR YRPR YRSC YRSC
[117] YRSC YRSC YRSC YRSC YRSC YRTH YRTH YRTH YRTH YRTH YRTH YRTH YRMC YRMC YRMC YRMC YRMC YRMC YRMC YRPR YRPR YRPR YRPR YRPR YRPR YRPR YRSC YRSC YRSC
[146] YRSC YRSC YRSC YRSC YRTH YRTH YRTH YRTH YRTH YRTH YRTH
Levels: FRMC FRPR FRSC FRTH YRMC YRPR YRSC YRTH
> resultsNames(dds)
[1] "Intercept" "group_FRPR_vs_FRMC." "group_FRSC_vs_FRMC." "group_FRTH_vs_FRMC." "group_YRMC._vs_FRMC." "group_YRPR_vs_FRMC." "group_YRSC_vs_FRMC."
[8] "group_YRTH_vs_FRMC."
Do you know why there is a "." after FRMC (and after YRMC) in these names, and why this design does not work?
I get the following error:
> MC_YR-FR<-results(dds, contrast = c("group", "YRMC", "FRMC"))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, :
  YRMC and FRMC should be levels of group such that group_YRMC_vs_FRMC. and group_FRMC_vs_FRMC. are contained in 'resultsNames(object)'
Thanks.
Thanks Peter!!
Glad to be of help. Although this is not your (or DESeq2's) fault at all, it may be helpful to add some sort of check inside your pipeline that factor levels are valid names. This type of problem (factor levels that are not valid names) is probably quite common, but it can be difficult for non-R-programmers to figure out what's going on and how to fix it. Been there, done that...
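The stray "." in the resultsNames(dds) output above is consistent with a level that carries an invalid character, such as a trailing space in "MC ", which make.names() turns into ".". That cause is an assumption here, since printed factors do not show whitespace. A minimal base-R sketch with a made-up sampletable, showing how such a level could be spotted and stripped before the group factor is built:

```r
# Hypothetical sample table: "MC " carries a trailing space (assumed cause).
sampletable <- data.frame(
  Location = c("FR", "FR", "YR", "YR"),
  Tissue   = c("MC ", "PR", "MC ", "PR"),
  stringsAsFactors = FALSE
)

# Combining the columns as-is carries the space into the group levels.
group_raw <- factor(paste0(sampletable$Location, sampletable$Tissue))
# make.names() replaces each invalid character (here the space) with ".".
make.names(levels(group_raw))

# Strip surrounding whitespace first, then rebuild the group factor.
sampletable$Tissue <- trimws(sampletable$Tissue)
group_clean <- factor(paste0(sampletable$Location, sampletable$Tissue))
levels(group_clean)
```

With clean levels, a contrast such as c("group", "YRMC", "FRMC") refers to names that match the levels exactly.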
Currently the only check is that the levels are still unique after conversion by make.names(), which is required for other R code to run.
But I could just as well put a check that levels are the same before and after make.names(), so people don't get confused with how things are renamed by R. Can't think of a reason why not to add that check. Thanks for the suggestion.
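A before/after comparison of this kind could look roughly like the sketch below. The helper name renamed_levels and the example levels are made up for illustration; this is not DESeq2's own code.

```r
# Hypothetical helper (not part of DESeq2): list the levels that
# make.names() would rename, next to what they would become.
renamed_levels <- function(f) {
  lv  <- levels(f)
  new <- make.names(lv)
  out <- data.frame(original = lv, renamed = new, stringsAsFactors = FALSE)
  out[lv != new, , drop = FALSE]
}

# Made-up condition labels: two of the three are not valid R names.
x <- factor(c("ctrl", "knock-down", "wild type"))
renamed_levels(x)
```

An empty result means the levels survive make.names() unchanged; any row returned is a level whose name would silently differ downstream.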
Fixed. This could potentially throw some errors for existing code out there (it did in the vignette, where I had to change single-read and paired-end to single and paired), but I'd prefer to help users avoid confusion like the above.
Numbers, while not being valid R names, are perfectly safe as DESeq2 factor levels, so I instead insist that the levels are just alphanumeric plus '_' or '.'
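The difference between the two rules can be sketched in base R. The helper levels_are_safe and its regular expression are assumptions made for illustration, not the check DESeq2 actually runs.

```r
# Hypothetical helper: TRUE when every level consists only of
# letters, digits, '_' and '.'.
levels_are_safe <- function(f) {
  all(grepl("^[A-Za-z0-9_.]+$", levels(f)))
}

numeric_like <- factor(c("1", "2", "10"))
with_space   <- factor(c("A", "A ", "B", "B"))

levels_are_safe(numeric_like)   # digits are allowed by this rule
levels_are_safe(with_space)     # the trailing space in "A " is not

# A strict make.names() comparison would also reject purely numeric
# levels, because make.names() prefixes them with "X".
make.names(levels(numeric_like))
```

This is why a character-class rule is more permissive than requiring levels to be unchanged by make.names(): it flags spaces and hyphens but lets numeric levels through.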
New behavior is:
> counts <- matrix(1:16,ncol=4)
> coldata <- data.frame(x=factor(c("A","A ","B","B")))
> dds <- DESeqDataSetFromMatrix(counts, coldata, ~x)
Error in DESeqDataSet(se, design = design, ignoreRank) :
  levels of factors in the design contain characters other than
  letters, numbers, '_' and '.'
I switched this to a message rather than a warning, because it's not necessary to break code in order to tell users that they have, e.g., spaces in their factor levels.
https://github.com/Bioconductor-mirror/DESeq2/commit/b76d4f83fe835ad91d9fad87478ca0f0bf3a4212