edgeR - coefficients in 3-factor experiment, complex contrasts and decideTestsDGE
0
0
Entering edit mode
@emanuel-heitlinger-4912
Last seen 10.3 years ago
Hi all, I have some questions regarding multi-factor-glms in edgeR. I am working on a RNA-seq experiment: I have 24 samples from 3 "treatments" each having two levels. This means 3 biological replicates per treatment combination. I want to model the full set of possible interactions (sex.conds*eel.conds*pop.conds), as expecially two-fold interactions could be very interetsing for my research-question. I want to categorize genes (contigs form a 454-transcriptome assembly, genome is unsequenced) I mapped my reads against into categoris: a) only different for sex b) only different for eel c) only different for pop d1)d2)d3) different for sex:eel, sex:pop, eel:pop In glmLRT giving simple coefficients would compare the complete model to a model removing one coefficient at a time. From application of glms in ecology I remember that an interaction effect should not be left in the model if the main effect is removed. Does this apply here? Should I compare the full model against e.g. the model minus pop, sex:pop, eel:pop and sex:eel:pop, when I want to remove condition "pop" from the model? Hope this code demonstrates what I mean: ## CODE start library(edgeR) ## generate a df of neg. bionm. counts y <- as.data.frame(matrix(rnbinom(6000,mu=10,size=10),ncol=24)) names(y) <- c("AA_R11M", "AA_R16M", "AA_R18F", "AA_R28F", "AA_R2M", "AA_R8F", "AA_T12F", "AA_T20F", "AA_T24M", "AA_T3M", "AA_T42M", "AA_T45F", "AJ_R1F", "AJ_R1M", "AJ_R3F", "AJ_R3M", "AJ_R5F", "AJ_R5M", "AJ_T19M", "AJ_T20M", "AJ_T25M", "AJ_T26F", "AJ_T5F", "AJ_T8F") sex.conds <- factor(ifelse(grepl("M$", names(y)), "male", "female" )) eel.conds <- factor(ifelse(grepl("^AA", names(y)), "Aa", "Aj" )) pop.conds <- factor(ifelse(grepl("\\w\\w_R.*", names(y)), "EU", "TW" )) design <- model.matrix(~sex.conds*eel.conds*pop.conds) ## Counts frame to full DGEList d <- DGEList(y, lib.size=colSums(y)) d <- calcNormFactors(d) d <- estimateGLMCommonDisp(d, design=design) d <- estimateGLMTrendedDisp(d, design=design) d <- estimateGLMTagwiseDisp(d, design=design) glmfit <- glmFit(d, design, dispersion=d$tagwise.dispersion) glm.wrapper <- function(de.obj, fit.obj, conds.regex){ glm.list <- list() coe <- names(as.data.frame(fit.obj$design)) coe.l <- lapply(conds.regex, function (x) grep(x, coe)) for (i in 1:length(conds.regex)){ glm.list[[conds.regex[[i]]]] <- glmLRT(de.obj, fit.obj, coef=coe.l[[i]]) } return(glm.list) } ## selects all coefficients being contained in each other hierachically combi.conds <- gsub(":", ".*", names(as.data.frame(glmfit$design))[2:8]) glm.l <- glm.wrapper(d, glmfit, combi.conds) ## show what is compared lapply(glm.l, function (x) x$comparison) ## topTags works (same as using p.adjust directly) topTags.l <- lapply(glm.l, function (x){ tt <- topTags(x, n=40000) ## set as high as the length tt[tt$table$adj.P.Val<0.05] ## get only below adj.P }) ## Code End Then I would look through the topTags list to categorize the contigs as stated above. E.g. from topTags.l[[1]] I would take only those not in topTags.l[[c(4, 5, 7]] to get category a) stated above, from topTags.l[[4]] only those not in topTagl.l[[7]] to get d1. This seems all a bit complicated to me, is this a correct way of doing this? I am alos a bit worried that decideTestsDGE seems to not work on DGELRT-objects with complicated coefficients. Eg. ## Code Start ## decideTestsDGE does not work decideTestsDGE.l <- lapply(glm.l, function (x){ subset(x$table, (decideTestsDGE(x, p.value = .05))!=0)}) ## Code End I saw that for simple coefficients the results of decideTestsDGE differ from topTags. Is this expected, what is the difference between the two, thought both do p-value adjustment the same way (I could provide code if these differenced would not be the expected behaviour)? These are my questions for now. I would be very greatful for help! All the Best, Emanuel sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_2.2.6 loaded via a namespace (and not attached): [1] limma_3.8.3 tools_2.13.0 -- Emanuel Heitlinger Karlsruhe Institute of Technology (KIT) Zoological Institute; Ecology/Parasitology Kornblumenstr. 13 76131 Karlsruhe Germany Telephone +49 (0)721-608 47654 or University of Edinburgh Institute of Evolutionary Biology Kings Buildings, Ashworth Laboratories, West Mains Road Edinburgh EH9 3JT Scotland, UK Telephone:+44 (0)131-650 7403
Category edgeR Category edgeR • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 350 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6