Question

How to check whether experimental covariates are confounded with our group of interest?

1

english.server ▴ 20

@englishserver-15152

Iran

Hi

In the context of microarray analysis, how can I check whether experimental covariates (age/gender) are confounded with the grouping of interest (i.e. diseased vs normal)?

goi <- sample(letters[1:2], 20, replace = TRUE)  # group of interest

cov <- list()  # 3 covariates
cov$c1 <- sample(letters[1:4], 20, replace = TRUE)
cov$c2 <- sample(letters[1:5], 20, replace = TRUE)
cov$c3 <- sample(letters[1:2], 20, replace = TRUE)

numeric_covar <- sample(25:60, 20, replace = TRUE)

Approach 1: chisq.test

sapply(names(cov), function(x)
  chisq.test(cov[[x]], goi)$p.value)  # not for numeric_covar

Approach 2: ANOVA/t-test

sapply(names(cov), function(x)  # suitable for *numeric_covar* as well
  anova(lm(as.numeric(as.factor(cov[[x]])) ~ as.numeric(as.factor(goi))))$`Pr(>F)`[1])

I think numeric_covar can only be handled by the second approach.

covariates microarray • 1.3k views

5.5 years ago • english.server ▴ 20

score 2 · Answer 1 · 2019-09-29

2

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Well, I tried to help you previously:

https://support.bioconductor.org/p/124670/

and I think I might even have been the one who introduced you to the term "confounded" for experimental designs. I advised you against trying to use tests like the ones you give here. The tests you propose are simply testing for correlation rather than confounding and neither seems informative to me.

I wonder what real problem you are trying to solve. I wonder whether you are perhaps trying to come up with a series of tests so that you can analyse microarray datasets automatically without having to look at plots or think about the variables. That would be an unrealistic idea. If you are trying to decide which variables to include in a microarray analysis, it is better to look at your data and think about the meaning of the variables, perhaps making a table or a plot along the way to help you.
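As a rough illustration of the "table or plot" suggestion, here is a minimal sketch on simulated data. The variable names `sex` and `age` and all values are assumptions made up for this example; they are not taken from any real dataset in this thread.

```r
# Simulated data in the spirit of the question (illustrative values only)
set.seed(1)
goi <- sample(rep(c("diseased", "normal"), each = 10))  # group of interest
sex <- sample(c("F", "M"), 20, replace = TRUE)           # a factor (hypothetical)
age <- sample(25:60, 20, replace = TRUE)                 # a numeric covariate (hypothetical)

# Cross-tabulate the factor against the group. Empty cells, or a table in
# which each level of the factor occurs in only one group, are warning signs.
tab <- table(group = goi, sex = sex)
print(tab)

# Summarise the numeric covariate within each group.
age_by_group <- tapply(age, goi, summary)
print(age_by_group)

# The same information as a plot.
boxplot(age ~ factor(goi), xlab = "group", ylab = "age")
```

The point of looking at the table and the plot, rather than at a single p-value, is that they show how the samples are distributed across the combinations of variables, which is what matters for the design.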

There are mathematical methods for examining collinearity in linear models using an eigenvalue decomposition of the design matrix, but this is strictly for mathematicians and I do not think it would be helpful anyway for any real microarray dataset.
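For readers curious what such a check looks like, the sketch below builds two hypothetical design matrices with base R's `model.matrix()`, then compares the matrix rank and the eigenvalues of the cross-product of the design matrix. The factors `batch` and `sex` are invented for illustration, and `batch` is deliberately made identical to the group so that the two are completely confounded. This is only one possible way to look at collinearity, not a recommended screening procedure.

```r
# Hypothetical design: 'batch' coincides exactly with 'group',
# so the two factors are completely confounded.
group <- factor(rep(c("diseased", "normal"), each = 10))
batch <- factor(rep(c("b1", "b2"), each = 10))
sex   <- factor(rep(c("F", "M"), times = 10))  # balanced within each group

design_bad  <- model.matrix(~ group + batch)
design_good <- model.matrix(~ group + sex)

# A rank below the number of columns means that some coefficients
# cannot be estimated separately.
rank_bad  <- qr(design_bad)$rank
rank_good <- qr(design_good)$rank
print(c(columns = ncol(design_bad),  rank = rank_bad))
print(c(columns = ncol(design_good), rank = rank_good))

# Eigenvalues of X'X: a value at (or numerically near) zero indicates
# an exact linear dependence among the columns of the design matrix.
ev_bad  <- eigen(crossprod(design_bad),  symmetric = TRUE)$values
ev_good <- eigen(crossprod(design_good), symmetric = TRUE)$values
print(ev_bad)
print(ev_good)
```

In a small design like this, the same conclusion is visible directly from `table(group, batch)`, which is in line with the advice above to look at the data first.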

BTW, the term "covariate" always refers to a numeric variable. Categorical variables are instead called factors.

5.5 years ago • Gordon Smyth 52k

0

Thank you, Gordon Smyth. Regarding my previous post, I thought the concepts asked about there were different from what is asked here. Thanks for the clarification. You are absolutely right about me wanting to "analyse microarray datasets automatically without having to look at plots or think about the variables", and that is partly because I think my knowledge of statistics is far too shallow and I'm trying to simplify things, i.e. to look for a number (a threshold) to decide about samples. I was happy to find a tutorial on GitHub using something like what I wrote above:

sapply(names(cov), function(x)  # suitable for *numeric_covar* as well
  anova(lm(as.numeric(as.factor(cov[[x]])) ~ as.numeric(as.factor(goi))))$`Pr(>F)`[1])

but it seems that I have been overgeneralizing. The link to the github tutorial: https://github.com/icnn/Microarray-Tutorials/wiki/Affymetrix#7

5.5 years ago • english.server ▴ 20

0

If goi has two groups, then the code you've written is a very complicated way of doing a two-sample t-test. If goi has more than two groups, then the code will give nonsense results.
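Both statements can be checked with a small sketch. The first part uses simulated data like that in the question; the second part uses a made-up three-group toy vector `y` (an assumption for illustration only). With two groups, the `lm`/`anova` p-value matches a pooled-variance two-sample t-test. With three groups, coercing the factor to numbers imposes an arbitrary ordering, so the p-value changes when the levels are merely relabelled, whereas an ANOVA on the factor itself does not depend on the order.

```r
# Two groups: the lm/anova construction reproduces a pooled-variance t-test.
set.seed(1)
goi <- sample(rep(letters[1:2], each = 10))
numeric_covar <- sample(25:60, 20, replace = TRUE)

p_lm <- anova(lm(numeric_covar ~ as.numeric(as.factor(goi))))$`Pr(>F)`[1]
p_t  <- t.test(numeric_covar ~ goi, var.equal = TRUE)$p.value
print(all.equal(p_lm, p_t))

# Three groups (toy data): numeric coding treats the groups as equally
# spaced points on a line, so the result depends on the level order.
y <- c(1, 2, 3, 7, 8, 9, 4, 5, 6)
g <- rep(c("a", "b", "c"), each = 3)

code_abc <- as.numeric(factor(g, levels = c("a", "b", "c")))
code_acb <- as.numeric(factor(g, levels = c("a", "c", "b")))
p_abc <- anova(lm(y ~ code_abc))$`Pr(>F)`[1]
p_acb <- anova(lm(y ~ code_acb))$`Pr(>F)`[1]
print(c(p_abc = p_abc, p_acb = p_acb))

# Treating g as a factor gives a one-way ANOVA that ignores level order.
p_factor_abc <- anova(lm(y ~ factor(g, levels = c("a", "b", "c"))))$`Pr(>F)`[1]
p_factor_acb <- anova(lm(y ~ factor(g, levels = c("a", "c", "b"))))$`Pr(>F)`[1]
print(all.equal(p_factor_abc, p_factor_acb))
```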

5.5 years ago • Gordon Smyth 52k

0

Thank you for your response. I think I have to study a little more to digest the concept.

5.5 years ago • english.server ▴ 20