Hi,
I am mostly just looking for some feedback on whether my workflow in edgeR is correct.
I have some bulk mRNA seq samples from cultured cells treated with either drug1, drug2, or a vehicle matched control for either 1 or 4 hr time periods (N=3/group, 18 samples total, see targets below).
At first, I want to compare the genes that respond to drug1 over control and the genes that respond to drug2 over control within each timepoint.
In other words make the contrasts of control 1h vs drug1 1h and see what genes in that contrast overlap/diverge with what is seen in the control 1h vs drug2 1h contrast.
I have read through a lot of the edgeR manual and have been using section 3.3.1 as an example because it seems close to what I am doing.
Here is where I am at (I am new to R sorry). Does this look like a reasonable approach? Thanks for reading and please let me know if there are flaws.:
>targets Treat Time Sample1 control 1h Sample2 control 1h Sample3 control 1h Sample4 drug1 1h Sample5 drug1 1h Sample6 drug1 1h Sample7 drug2 1h Sample8 drug2 1h Sample9 drug2 1h Sample10 control 4h Sample11 control 4h Sample12 control 4h Sample13 drug1 4h Sample14 drug1 4h Sample15 drug1 4h Sample16 drug2 4h Sample17 drug2 4h Sample18 drug2 4h #Combine treatment and time into a single factor group <- factor(paste(targets$Treat, targets$Time, sep=".")) cbind(targets, group=group) #Read in the gene count matrix counts <- read.delim("my_gene_count_matrix.csv") #Make the DGEList object y <- DGEList(counts=counts, group=group) #Filter out lowly expressed genes keep <- rowSums(cpm(y)>1) >=2 y <- y[keep, , keep.lib.sizes=FALSE] #Normalize libraries y <- calcNormFactors(y) #Set up design matrix as "one-way" layout design <- model.matrix(~0+group) colnames(design) <- levels(group) >design control.1h control.4h drug1.1h drug1.4h drug2.1h drug2.4h 1 1 0 0 0 0 0 2 1 0 0 0 0 0 3 1 0 0 0 0 0 4 0 0 1 0 0 0 5 0 0 1 0 0 0 6 0 0 1 0 0 0 7 0 0 0 0 1 0 8 0 0 0 0 1 0 9 0 0 0 0 1 0 10 0 1 0 0 0 0 11 0 1 0 0 0 0 12 0 1 0 0 0 0 13 0 0 0 1 0 0 14 0 0 0 1 0 0 15 0 0 0 1 0 0 16 0 0 0 0 0 1 17 0 0 0 0 0 1 18 0 0 0 0 0 1 attr(,"assign") [1] 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" #Estimate dispersions/fit y <- estimateDisp(y, design) fit <- glmQLFit(y, design) #The way I understand 3.3.1 in the edgeR manual, I am free to make any comparison now, but some simple ones would be: my.contrasts <- makeContrasts( cont_1hr_vs_drug1_1hr = drug1.1h-control.1h, cont_1hr_vs_drug2_1hr = drug2.1h-control.1h, cont_4hr_vs_drug1_4hr = drug1.4h-control.4h, cont_4hr_vs_drug2_4hr = drug2.4h-control.4h, levels = design ) # Lets say I just wanted to find genes that are different between drug1 and control at the 1hr timepoint I would go: res1 <- glmQLFTest(fit, contrast=my.contrasts[,"cont_1hr_vs_drug1_1hr"]) topTags(res1) # Lets say I just wanted to find genes that are different between drug2 and control at the 1hr timepoint I would go: res2 <- glmQLFTest(fit, contrast=my.contrasts[,"cont_1hr_vs_drug2_1hr"]) topTags(res2) #Then, if I wanted to compare and contrast how drug1 and drug2 induce responses differently vs contorl at 1hr I could make a venn diagram based off the significant genes in res1 and res2?.
Thank you so much for taking the time to read my question and respond. The documentation and community surrounding edgeR is amazing.
I am not too sure about that last bit you put. The wording of my question could have been better too, but I am trying to compare and contrast how drug1 and drug2 each differ from control. They are very related drugs that probably have some similarities and some difference and I would like to explore that.
The way I think I understand it the comparison:
would tell me what is significantly different between these two drugs, which may be a bit different than what I am looking for.
I hope to end up with 3 lists of genes from each time point that shows the following (1hr timepoint in this example):
1. Genes that respond significantly only to drug1 when compared against control at 1hr (but not drug2 when compared against control at 1hr)
2. Genes that respond significantly only to drug2 when compared against control at 1hr (but not drug1 when compared against control at 1hr)
3. Genes that respond significantly to both drug1 and drug2 when compared against control at 1hr
The only way I can think to do this would be to individually contrast
and then individually contrast:
Then look at the overlap/divergence between the resulting sig gene lists. Thats what I was getting at with my venn diagram approach. Am I way off the mark? Please let me know if I have misunderstood something or if there is a better way to do this. I really appreciated the initial response. Thank you.
Yeah, you could do a Venn diagram as well. See ?decideTests.DGELRT and ?vennDiagram, particularly the 'include' argument. If you use 'both', then the intersection may include genes that went up for one drug and down for the other, which may not be what you want.
Great thank you, I have looked at decideTests previously and vennDiagram looks like it will do a process that I otherwise would have had to do manually. Thank you very much for reading and responding.