Question

limma and Syntactically Invalid Factor Values

0

Dario Strbenac ★ 1.5k

@dario-strbenac-5916

Last seen 3 days ago

Australia

Some limma functions allow syntactically invalid factor values (e.g. lmFit) but others do not (e.g. makeContrasts). Could they be consistently permitted or consistently banned?

Error in makeContrasts(paste(aType, "-", paste("(", paste(setdiff(allTypes, aType), collapse = '+':
  Non-valid names: sampleIDOSCC_12-P,sampleIDOSCC_12-R1,sampleIDOSCC_16-P,sampleIDOSCC_16-M etc.

limma • 782 views

ADD COMMENT • link updated 6 months ago by Gordon Smyth 52k • written 6 months ago by Dario Strbenac ★ 1.5k

0

This doesn't seem to be a user question so much as a question about the development of the package... maybe use make.names to create syntactically valid R names?

make.names(c("sampleIDOSCC_12-P","sampleIDOSCC_12-R1","sampleIDOSCC_16-P","sampleIDOSCC_16-M"))
## [1] "sampleIDOSCC_12.P"  "sampleIDOSCC_12.R1" "sampleIDOSCC_16.P" 
## [4] "sampleIDOSCC_16.M"

Or, if this is to name the reference in the contrast, I tend to use "X_vs_Y", which is valid and readable (but longer).

ADD REPLY • link 6 months ago Lluís Revilla Sancho ▴ 760

score 0 · Answer 1 · 2024-07-22

0

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

Can you give an example of using a syntactically invalid name in lmFit, or in any other limma function?

ADD COMMENT • link 6 months ago James W. MacDonald 68k

0

OP's question (which is very brief and omits any background) is somewhat misleading because the issue that it is concerned with actually exists with or without factors. What he means is simply that limma does not impose restrictions on coefficient names, and allows coefficient names that would cause an error if input to makeContrasts(). For example, the following would be a perfectly acceptable design matrix for limma:

> design <- cbind(Intercept=1,"A-B"=c(0,0,1,1))
> design
     Intercept A-B
[1,]         1   0
[2,]         1   0
[3,]         1   1
[4,]         1   1

even though "A-B" would not be acceptable as a variable name in R and will cause an error if input to makeContrasts(). After running lmFit, the column names of design would then become the column names of the fitted model object. limma objects are analogous to matrices in R, and limma allows exactly the same range of column and row names as would be acceptable for any matrix in base R. I have no plans to change that.

ADD REPLY • link 6 months ago Gordon Smyth 52k

0

The restriction is implemented inside limma, in the makeContrasts function in the file modelmatrix.R.

#   Construct matrix of custom contrasts
#   Gordon Smyth
#   30 June 2003.  Last modified 2 April 2010.
        ...              ...
notvalid <- (levels != make.names(levels))
if(any(notvalid)) stop("The levels must by syntactically valid names in R, see help(make.names).
                        Non-valid names: ", paste(levels[notvalid], collapse = ","))

The levels of a factor end up in the coefficient names, so values in the targets frame trigger an error late, in makeContrasts, rather than early, in lmFit. Section 4.3 of the vignette, "The Targets Frame", does not mention this case. Consider a factor column in a targets frame named sortedCellType with values c("epithelial", "CD4+ T-cell", "B-cell").
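
To make that concrete, here is a minimal base-R sketch of how such levels propagate into design column names. It applies the same kind of test as the makeContrasts snippet above. The targets data frame and the variable coefNames are made up for illustration and are not part of limma.

```r
# Hypothetical targets frame holding the cell types mentioned above.
targets <- data.frame(
  sortedCellType = factor(c("epithelial", "CD4+ T-cell", "B-cell"))
)

# model.matrix pastes each factor level onto the variable name.
design <- model.matrix(~ 0 + sortedCellType, data = targets)
coefNames <- colnames(design)
print(coefNames)

# A name is syntactically valid only if make.names() leaves it unchanged.
notvalid <- coefNames != make.names(coefNames)
print(coefNames[notvalid])
```

Only the epithelial column passes the check; the two names containing "+", "-" or a space are flagged.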

ADD REPLY • link 6 months ago Dario Strbenac ★ 1.5k

1

It is (by definition) impossible to evaluate a mathematical expression in R with syntactically invalid variable names. That is actually what "syntactically valid" means, a variable name that can be used in a mathematical expression. Since the purpose of the makeContrasts function is to evaluate contrasts from mathematical expressions, it is essential to work with syntactically valid names.
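
As a small base-R illustration of that point (a sketch with made-up values, not limma code): when the text "A-B" is parsed as an expression, R reads it as a subtraction, and only backticks make it refer to a single variable of that name.

```r
# Hypothetical values; one variable is literally named "A-B".
vals <- list("A-B" = 5, A = 10, B = 3)

# Unquoted, the text is parsed as the subtraction A - B.
asExpression <- eval(parse(text = "A-B"), vals)

# Backticks are needed to refer to the single variable named "A-B".
asName <- eval(parse(text = "`A-B`"), vals)

print(c(asExpression = asExpression, asName = asName))
```

The two results differ (7 versus 5), which is why a contrast expression written with such a name is ambiguous.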

The code you mention does not impose any restriction other than what R imposes itself. The limma code simply checks for the problem and gives a user-friendly error message instead of allowing R itself to give you a more cryptic error message later on.

There are lots of work-arounds. You can easily use all the functionality of limma without running makeContrasts. You can easily run makeContrasts with syntactically valid names if you choose -- there is no requirement to use the coefficient names without modification. Or you can easily choose syntactically valid coefficient names in the first place if you so choose. The function make.names() is provided for that purpose.
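
One of these work-arounds, sketched in base R under stated assumptions: a contrast matrix is an ordinary numeric matrix with one row per coefficient, so it can be built by hand without makeContrasts. The design below reuses the "A-B" example from earlier in the thread; the contrast name "AminusB" is an arbitrary illustrative choice.

```r
# Design matrix with a syntactically invalid column name.
design <- cbind(Intercept = 1, "A-B" = c(0, 0, 1, 1))

# Build the contrast matrix by hand: one row per coefficient, one column per contrast.
contrastMatrix <- matrix(
  0,
  nrow = ncol(design), ncol = 1,
  dimnames = list(Levels = colnames(design), Contrasts = "AminusB")
)

# Select the coefficient by its (invalid) name; matrix indexing accepts any string.
contrastMatrix["A-B", "AminusB"] <- 1
print(contrastMatrix)
```

A matrix of this shape should be usable with contrasts.fit() in place of the output of makeContrasts, since its rows match the coefficient names of the fit.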

ADD REPLY • link 6 months ago Gordon Smyth 52k

0

The scenario is writing a wrapper around lmFit and being robust to unpredictable user input.

library(limma)
RNA <- matrix(rnorm(40), ncol = 4)
rownames(RNA) <- paste0("gene", 1:nrow(RNA))
clinical <- data.frame(sample = rep(LETTERS[1:2], each = 2),
                       types = rep(c("epithelial", "CD4+ T-cell"), 2))
design <- model.matrix(~ 0 + types + sample, clinical)
fit <- lmFit(RNA, design = design) 
colnames(fit$coefficients) # Syntactically invalid names generated.
makeContrasts("typesCD4+ T-cell - typesepithelial", levels = design) # Error about coefficient names.

Applying make.names to all column names, and then reverting to the original values after topTable, seems like the way to do it.
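
A minimal sketch of that round trip in base R, assuming the column names from the example above. The variables originalNames, safeNames and lookup are illustrative; the limma calls themselves are left out so that the snippet runs without the package.

```r
# Column names as produced by model.matrix in the example above.
originalNames <- c("typesCD4+ T-cell", "typesepithelial", "sampleB")

# unique = TRUE guards against two different names collapsing to the same safe name.
safeNames <- make.names(originalNames, unique = TRUE)

# Lookup table from safe name back to original name.
lookup <- setNames(originalNames, safeNames)

# The safe names can be assigned to colnames(design) and used in contrast text.
contrastText <- paste(safeNames[1], "-", safeNames[2])
print(contrastText)

# After the analysis, map the safe names back for reporting.
restored <- unname(lookup[safeNames])
print(restored)
```

Keeping the lookup table, rather than trying to reverse make.names, matters because the substitution is lossy: "+", "-" and a space all become ".".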

In the tidyverse, by comparison, non-syntactic names are handled like this:

library(ggplot2)
colnames(mtcars)[1] <- "Miles per-Gallon"
xVariable <- sym("Miles per-Gallon")
ggplot(mtcars, aes(x = !!xVariable, y = hp)) + geom_point() # No error regarding spaces, hyphens etc.

ADD REPLY • link 6 months ago Dario Strbenac ★ 1.5k