Hi,
I have a block-design for my study where the metadata looks like this:
sample_ID location sex polyp_type age status files 1 INTP_103 proximal 1 1 31 ctrl INTP_1031 2 INTP_103 distal 1 1 31 ctrl INTP_1032 3 INTP_103 rectum 1 1 31 ctrl INTP_1033 4 INTP_105 proximal 2 3 66 disease INTP_1051 5 INTP_105 distal 2 3 66 disease INTP_1052 6 INTP_105 rectum 2 3 66 disease INTP_1053 7 INTP_106 proximal 1 2 64 disease INTP_1061 8 INTP_106 distal 1 2 64 disease INTP_1062 9 INTP_106 rectum 1 2 64 disease INTP_1063 10 INTP_111 proximal 2 1 49 ctrl INTP_1111
For each sample there are 3 different locations and also there are two different status (ctrl, and disease). I would like to test the difference between locations, between status and interactions
Here is the code I am using
y <- DGEList(txi_longScaled$counts) # the counts come from a kallisto experiment y <- calcNormFactors(y) Treat <- factor(paste(metadata$status, metadata$location, sep=".")) design <- model.matrix(~0+Treat) colnames(design) <- levels(Treat) v <- voom(y, design) #Then we estimate the correlation between measurements made on the same subject corfit <- duplicateCorrelation(v, design, block=metadata$sample_ID) corfit$consensus v <- voom(y, design, block = metadata$sample_ID, correlation = corfit$consensus) fit <- lmFit(v, design, block = metadata$sample_ID, correlation = corfit$consensus) cm <- makeContrasts( CtrlvsCasesProximal = ctrl.proximal - disease.proximal, CtrlvsCasesDistal = ctrl.distal - disease.distal, CtrlvsCasesRectum = ctrl.rectum - disease.rectum, Ctrl.proximalvsCtrl.rectum = ctrl.proximal - ctrl.rectum, Ctrl.distalvsCtrl.proximal = ctrl.distal - ctrl.proximal, Ctrl.distalvsCtrl.rectum = ctrl.distal - ctrl.rectum, Dis.proximalvsDis.distal = disease.proximal - disease.distal, Dis.proximalvsDis.rectum = disease.proximal - disease.rectum, Dis.distalvsDis.rectum = disease.distal - disease.rectum, levels=design) fit2 <- contrasts.fit(fit, cm) fit2 <- eBayes(fit2) CtrlvsCasesProximal <- topTable(fit2, coef="CtrlvsCasesProximal"), number = nrow(y$counts) #will find those genes that are differentially expressed between the normal and diseased subjects in the location proximal
I would like to correct also for age and sex, my two questions are: a) is my approach correct (I am following this post using duplicateCorrelation with limma+voom for RNA-seq data)? and b) if I want to correct for sex and age is enough to do it when I first declare design? i.e
design <- model.matrix(~0+Treat+age+as.factor(metadata$status)
Thanks
Thanks a lot @Aaron, I have corrected the code and also look to the post you mentioned. I was a bit worried because the comparisons Ctrl.distalvsCtrl.proximal,Ctrl.distalvsCtrl.rectum, and Ctrl.proximalvsCtrl.rectum give me a lot of DE genes with really low adjusted pvalue (1x10-e32) and strong B (100.34) so or the differences are true (same patient different locations) or I am doing something wrong.