Question

RE: Design matrix with multiple genotypes + quantified variables (+cor/regression)


Matthew Hannah ▴ 940

@matthew-hannah-621


Again, sorry for initially posting without too much investigation, but I have a lot on (haven't we all) and I was hoping someone's experience could save me a lot of time. So here's an update. There are 2 basic questions.

1. Are the design and contrast matrices below correct? Is there a better way to design it? My hypothesis is that treatment N - treatment A will be similar between genotypes, but the genotypes will be different from each other. I'm looking for the global treatment contrast, but don't want the genotype differences getting in the way. Is this already taken care of in the design below, or does the design need to be different? I.e., is the lm contrast comparing (ConA, MutA, Mut2A) vs. (ConN, MutN, Mut2N), or averaging (ConA-ConN, MutA-MutN, Mut2A-Mut2N)?

2. How is it best to compare a variable against expression to find genes that correlate with it? I've done a fair bit on this now but still need some pointers.

The obvious thing to do was a genewise Pearson correlation. However, in 'Intro stats with R' there is the statement: "The reader should be warned that there are many incorrect uses of correlation coefficients, particularly when they are used in regression-type settings". Well, I'm duly warned, but not sure what a regression-type setting is. Also, it seems that regression and Pearson give the same result.

For the correlation I used cor, and the book then suggests testing whether the correlation is significantly different from zero using cor.test. From comparing these, it seems that there is a strict relationship between the p-value and the Pearson coefficient that only varies with sample number (# of arrays). The p-value just gives an indication of which Pearson coefficient is significant, but surely you don't need to get it for all genes, as it just seems to rely on sample number?

So I then proceeded with regression analysis using lm(). The output values that appear to be useful are the p-value and R-squared. The former is the same as from cor.test, and the latter is the squared Pearson coefficient, which I've just discussed.
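The relationship described above can be checked on one gene. This is only a minimal sketch with simulated data: the names `growth` and `expr`, the sample size and the simulated values are all assumptions, not taken from the real experiment.

```r
set.seed(1)
n      <- 16                        # number of arrays (assumed)
growth <- rnorm(n)                  # hypothetical quantified variable
expr   <- 0.5 * growth + rnorm(n)   # hypothetical expression of one gene

r   <- cor(expr, growth)            # Pearson coefficient
ct  <- cor.test(expr, growth)       # test of r against zero
fit <- summary(lm(expr ~ growth))   # simple linear regression

# R-squared of the regression versus the squared Pearson coefficient
r2_lm <- fit$r.squared

# p-value of the slope versus the cor.test p-value
p_lm  <- fit$coefficients["growth", "Pr(>|t|)"]
p_cor <- ct$p.value

# The same p-value computed from r and n alone:
# t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom
t_stat   <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r2_lm = r2_lm, r_squared = r^2))
print(c(p_lm = p_lm, p_cor = p_cor, p_manual = p_manual))
```

For a single predictor, the three p-values agree and R-squared equals r^2, which is why the p-value for a given r depends only on the number of arrays.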
Am I missing something, or is there a better way?

Finally, as Limma uses lm functions, can I do the regression using it, to provide access to the other tools such as eBayes, classifyTests or toptable? Or are they fundamentally different?

Thanks for your time,
Matt

-----Original Message-----
From: Matthew Hannah
Sent: Thursday, 19 August 2004 14:56
To: 'bioconductor@stat.math.ethz.ch'
Subject: Design matrix with multiple genotypes + quantified variables

Hi,

After I asked before, this design and contrast matrix was suggested, and it worked well. But now it gets complicated.

2 genotypes - Con, Mut
2 treatments - A, N
4 replicates

    treatments <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4))
    design <- model.matrix(~ 0+treatments)
    colnames(design) <- c("ConA","ConN","MutA","MutN")
    fit <- lmFit(esetgcrma, design)
    cont.matrix <- makeContrasts(ConA-MutA, ConN-MutN, Gen=(ConN+ConA-MutN-MutA)/2, ConA-ConN, MutA-MutN, treatment=(ConA+MutA-ConN-MutN)/2, levels=design)
    con.fit <- contrasts.fit(fit, cont.matrix)

So what if I add a third genotype, Mut2? Is it the obvious: add

    treatments <- .....5,5,5,5,6,6,6,6))

and then for the contrasts

    treatment=(ConA+MutA+Mut2A-ConN-MutN-Mut2N)/3)

Or am I misunderstanding how to design contrasts? Is there an easier way of writing this when you have more genotypes?

Also, logically the lm is treating all samples as independent when they are not. Does this matter? Is it possible to fit the original lm using a design that takes genotype and treatment into account? Would this be a better approach, especially if you have more genotypes (e.g. 5-10)? What would the design matrix then look like?

Finally, what if you have a quantified variable for each genotype, like a measure of growth before and after the treatment? Can you specify this in any way (in the design matrix?) so that it is taken into account during the fit? I thought this was possible using lm or rlm, or am I confusing something?
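For the three-genotype case, the arithmetic behind the averaged contrast can be sketched in base R on a single simulated gene. Everything here is an assumption for illustration: the simulated values, the balanced layout of 4 replicates per group, and the `genotype` and `treatment` factors used for the alternative design.

```r
set.seed(2)
# Six groups, 4 replicates each, in the same order as the design columns
groups <- c("ConA", "ConN", "MutA", "MutN", "Mut2A", "Mut2N")
group  <- factor(rep(groups, each = 4), levels = groups)
design <- model.matrix(~ 0 + group)
colnames(design) <- groups

# Simulated log-expression for one gene (hypothetical values)
y <- rnorm(length(group), mean = rep(c(8, 6, 9, 7.5, 5, 3), each = 4))

# In this cell-means design the coefficients are the six group means
coefs <- lm.fit(design, y)$coefficients

# Averaged treatment contrast, written as in the post with Mut2 added
w <- c(ConA = 1, ConN = -1, MutA = 1, MutN = -1, Mut2A = 1, Mut2N = -1) / 3
treatment_contrast <- sum(w[groups] * coefs[groups])

# Mean of the three within-genotype A - N differences
within <- c(Con  = coefs[["ConA"]]  - coefs[["ConN"]],
            Mut  = coefs[["MutA"]]  - coefs[["MutN"]],
            Mut2 = coefs[["Mut2A"]] - coefs[["Mut2N"]])
mean_within <- mean(within)

# Alternative: a design built from separate genotype and treatment factors
genotype  <- factor(rep(c("Con", "Mut", "Mut2"), each = 8),
                    levels = c("Con", "Mut", "Mut2"))
treatment <- factor(rep(rep(c("A", "N"), each = 4), times = 3),
                    levels = c("N", "A"))
design2     <- model.matrix(~ genotype + treatment)
additive_fx <- lm.fit(design2, y)$coefficients[["treatmentA"]]

print(c(contrast = treatment_contrast, mean_within = mean_within,
        additive = additive_fx))
```

In this sketch the averaged contrast is the mean of the per-genotype A - N differences, so genotype-level offsets cancel out of it. With a balanced layout the `treatmentA` coefficient of the additive design gives the same number; with unequal replicate numbers the two would generally differ.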
Alternatively, does anyone have a different approach, such as an efficient way of doing a gene-by-gene regression or correlation analysis against the growth measure, and extracting the genes that correlate best with it?

Perhaps there is a good (biologist-simple?) book that would cover design and contrasts for lms. Does anyone know of one?

Thanks again,
Matt
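One way the gene-by-gene correlation could be done without a loop is sketched below in base R. The expression matrix, the `growth` vector and the gene names are simulated stand-ins, not the real data, and `gene1` is deliberately constructed to track growth so the ranking has something to find.

```r
set.seed(3)
n_genes  <- 200                     # assumed sizes, for illustration only
n_arrays <- 16
growth   <- rnorm(n_arrays)         # hypothetical growth measure per array

# Hypothetical expression matrix: genes in rows, arrays in columns
expr <- matrix(rnorm(n_genes * n_arrays), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expr["gene1", ] <- 2 * growth + rnorm(n_arrays, sd = 0.1)

# cor() correlates columns, so transpose to put genes in columns;
# the result is one Pearson coefficient per gene
r <- cor(t(expr), growth)[, 1]

# p-values follow from r and the number of arrays alone
t_stat <- r * sqrt((n_arrays - 2) / (1 - r^2))
p      <- 2 * pt(-abs(t_stat), df = n_arrays - 2)

# Genes ranked by absolute correlation with the growth measure
top    <- head(order(abs(r), decreasing = TRUE), 10)
result <- data.frame(gene = rownames(expr)[top], r = r[top], p = p[top],
                     row.names = NULL)
print(result)
```

With a real expression matrix in place of `expr`, the same two vectorised steps give the coefficient and p-value for every gene at once. A regression through limma would instead put the growth measure in as a numeric column of the design matrix passed to lmFit, which is not shown here.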

Regression limma • 1.1k views

20.7 years ago Matthew Hannah ▴ 940