Hi
In the context of microarray analysis, how can I check whether experimental covariates (e.g. age or gender) are confounded with the grouping of interest (i.e. diseased vs. normal)?
goi <- sample(letters[1:2], 20, TRUE)  # group of interest
cov <- list()  # 3 categorical covariates
cov$c1 <- sample(letters[1:4], 20, TRUE)
cov$c2 <- sample(letters[1:5], 20, TRUE)
cov$c3 <- sample(letters[1:2], 20, TRUE)
numeric_covar <- sample(25:60, 20, TRUE)  # a numeric covariate, e.g. age
Approach 1: chisq.test
sapply(names(cov), function(x)
  chisq.test(cov[[x]], goi)$p.value)  # not for numeric_covar
Approach 2: ANOVA / t-test
sapply(names(cov), function(x)  # suitable for numeric_covar as well
  anova(lm(as.numeric(as.factor(cov[[x]])) ~ as.numeric(as.factor(goi))))$'Pr(>F)'[1])
I think numeric_covar can only be handled by the second approach.
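For later readers, here is a minimal sketch of one way to run a type-appropriate check per covariate while keeping the group as a factor rather than coding it as numbers. The helper name confounding_p is hypothetical, and using Fisher's exact test for the categorical covariates is my own assumption (chosen because n = 20 gives small cell counts), not something taken from the tutorial. The p-values are a rough screen only; a table or boxplot of each covariate against goi is still worth looking at.

```r
set.seed(1)  # assumption: seed added only so the simulated data are reproducible

goi <- sample(letters[1:2], 20, TRUE)            # group of interest
cov <- list(c1 = sample(letters[1:4], 20, TRUE),
            c2 = sample(letters[1:5], 20, TRUE),
            c3 = sample(letters[1:2], 20, TRUE))
numeric_covar <- sample(25:60, 20, TRUE)

# Hypothetical helper: picks a test according to the covariate's type.
confounding_p <- function(covar, group) {
  group <- factor(group)                         # keep the grouping categorical
  if (is.numeric(covar)) {
    # numeric covariate: one-way ANOVA of the covariate on the group factor
    anova(lm(covar ~ group))[["Pr(>F)"]][1]
  } else {
    # categorical covariate: exact test on the covariate-by-group table
    fisher.test(table(factor(covar), group))$p.value
  }
}

all_covars <- c(cov, list(numeric_covar = numeric_covar))
p_values <- sapply(all_covars, confounding_p, group = goi)
p_values
```

Because the group stays a factor, the same call works unchanged if goi has more than two levels.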
Thank you, Gordon Smyth. Regarding my previous post, I thought the concepts asked about there were different from what is asked here. Thanks for the clarification. You are absolutely right about me wanting to "analyse microarray datasets automatically without having to look at plots or think about the variables", and that's partly because I think my knowledge of statistics is too shallow and I'm trying to simplify things, i.e. to look for a number (a threshold) to decide about samples. I was happy to find a tutorial on GitHub using something like what I wrote above, but it seems that I have been overgeneralizing. The link to the GitHub tutorial: https://github.com/icnn/Microarray-Tutorials/wiki/Affymetrix#7
If goi has two groups, then the code you've written is a very complicated way of doing a two-sample t-test. If goi has more than two groups, then the code will give nonsense results.
Thank you for your response. I think I have to study a little more to digest the concept.
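In case it helps to see the point about two groups versus more than two groups, here is a small sketch on toy data (all values below are made up for illustration). With two groups, the lm() on a numerically coded group gives the same p-value as an equal-variance two-sample t-test. With three groups, the numeric coding makes the p-value depend on the arbitrary order of the labels, whereas treating the group as a factor does not.

```r
# Two groups: numeric coding in lm() vs. an equal-variance two-sample t-test
set.seed(1)                                   # assumption: seed only for reproducibility
goi <- rep(c("a", "b"), times = c(9, 11))     # toy two-level grouping
numeric_covar <- sample(25:60, 20, TRUE)

p_lm <- anova(lm(numeric_covar ~ as.numeric(as.factor(goi))))[["Pr(>F)"]][1]
p_t  <- t.test(numeric_covar ~ goi, var.equal = TRUE)$p.value
same_two_group <- isTRUE(all.equal(p_lm, p_t))  # the two p-values agree

# Three groups: toy data in which group "y" has clearly higher values
y  <- c(1, 2, 3, 10, 11, 12, 4, 5, 6)
g3 <- rep(c("x", "y", "z"), each = 3)

# Numeric coding: the answer changes when the labels are merely reordered
p_xyz <- anova(lm(y ~ as.numeric(factor(g3, levels = c("x", "y", "z")))))[["Pr(>F)"]][1]
p_yxz <- anova(lm(y ~ as.numeric(factor(g3, levels = c("y", "x", "z")))))[["Pr(>F)"]][1]

# Factor coding: a proper one-way ANOVA, independent of the label order
p_factor <- anova(lm(y ~ factor(g3)))[["Pr(>F)"]][1]

c(p_xyz = p_xyz, p_yxz = p_yxz, p_factor = p_factor)
```

The numeric coding fits a straight line through the group codes 1, 2, 3, which is only meaningful if the groups really are ordered and equally spaced.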