edgeR analysis of cells treated with either drug1, drug2, or a time matched control at two different time points, did I set this up correctly?
1
0
Entering edit mode
@brettvanderwerff-17189
Last seen 6.4 years ago

Hi, 

I am mostly just looking for some feedback on whether my workflow in edgeR is correct.

 

I have some bulk mRNA seq samples from cultured cells treated with either drug1, drug2, or a vehicle matched control for either 1 or 4 hr time periods (N=3/group, 18 samples total, see targets below). 

At first, I want to compare the genes that respond to drug1 over control and the genes that respond to drug2 over control within each timepoint.

In other words make the contrasts of control 1h vs drug1 1h and see what genes in that contrast overlap/diverge with what is seen in the control 1h vs drug2 1h contrast.

I have read through a lot of the edgeR manual and have been using section 3.3.1 as an example because it seems close to what I am doing.  

 

Here is where I am at (I am new to R sorry). Does this look like a reasonable approach? Thanks for reading and please let me know if there are flaws.:

 

>targets
   Treat Time
Sample1  control   1h
Sample2  control   1h
Sample3  control   1h
Sample4    drug1   1h
Sample5    drug1   1h
Sample6    drug1   1h
Sample7    drug2   1h
Sample8    drug2   1h
Sample9    drug2   1h
Sample10 control   4h
Sample11 control   4h
Sample12 control   4h
Sample13   drug1   4h
Sample14   drug1   4h
Sample15   drug1   4h
Sample16   drug2   4h
Sample17   drug2   4h
Sample18   drug2   4h

#Combine treatment and time into a single factor
group <- factor(paste(targets$Treat, targets$Time, sep="."))
cbind(targets, group=group)

#Read in the gene count matrix
counts <- read.delim("my_gene_count_matrix.csv")

#Make the DGEList object
y <- DGEList(counts=counts, group=group)

#Filter out lowly expressed genes
keep <- rowSums(cpm(y)>1) >=2
y <- y[keep, , keep.lib.sizes=FALSE]

#Normalize libraries
y <- calcNormFactors(y)

#Set up design matrix as "one-way" layout
design <- model.matrix(~0+group)
colnames(design) <- levels(group)

>design
   control.1h control.4h drug1.1h drug1.4h drug2.1h drug2.4h
1           1          0        0        0        0        0
2           1          0        0        0        0        0
3           1          0        0        0        0        0
4           0          0        1        0        0        0
5           0          0        1        0        0        0
6           0          0        1        0        0        0
7           0          0        0        0        1        0
8           0          0        0        0        1        0
9           0          0        0        0        1        0
10          0          1        0        0        0        0
11          0          1        0        0        0        0
12          0          1        0        0        0        0
13          0          0        0        1        0        0
14          0          0        0        1        0        0
15          0          0        0        1        0        0
16          0          0        0        0        0        1
17          0          0        0        0        0        1
18          0          0        0        0        0        1
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"

#Estimate dispersions/fit 
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)                  

#The way I understand 3.3.1 in the edgeR manual, I am free to make any comparison now, but some simple ones would be:
my.contrasts <- makeContrasts(
  cont_1hr_vs_drug1_1hr = drug1.1h-control.1h,
  cont_1hr_vs_drug2_1hr = drug2.1h-control.1h,
  cont_4hr_vs_drug1_4hr = drug1.4h-control.4h,
  cont_4hr_vs_drug2_4hr = drug2.4h-control.4h,
  levels = design
)

# Lets say I just wanted to find genes that are different between drug1 and control at the 1hr timepoint I would go:
res1 <- glmQLFTest(fit, contrast=my.contrasts[,"cont_1hr_vs_drug1_1hr"])

topTags(res1)

# Lets say I just wanted to find genes that are different between drug2 and control at the 1hr timepoint I would go:

res2 <- glmQLFTest(fit, contrast=my.contrasts[,"cont_1hr_vs_drug2_1hr"])

topTags(res2)

#Then, if I wanted to compare and contrast how drug1 and drug2 induce responses differently vs contorl at 1hr I could make a venn diagram based off the significant genes in res1 and res2?.

 

edger rnaseq gene expression bioinformatics • 1.2k views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Looks fine to me. Although I would normally just do

keep <- filterByExpr(y, group = group)

And the last comparison you are interested in is essentially an interaction term, where you are interested in changes between drug1 vs control and drug2 vs control at 1 hour. But this reduces to just the difference between drug2 and drug1 at 1 hour:

(drug1.1hr - control.1hr) - (drug2.1hr - control.1hr) = drug1.1hr - drug2.1hr

So if you want to know differences between drug 1 and 2 at either time point, you just compare them directly.

 

 

ADD COMMENT
0
Entering edit mode

Thank you so much for taking the time to read my question and respond. The documentation and community surrounding edgeR is amazing. 

I am not too sure about that last bit you put. The wording of my question could have been better too, but I am trying to compare and contrast how drug1 and drug2 each differ from control. They are very related drugs that probably have some similarities and some difference and I would like to explore that. 

The way I think I understand it the comparison:

drug1.1hr - drug2.1hr

would tell me what is significantly different between these two drugs, which may be a bit different than what I am looking for. 

 

I hope to end up with 3 lists of genes from each time point that shows the following (1hr timepoint in this example):

1. Genes that respond significantly only to drug1 when compared against control at 1hr (but not drug2 when compared against control at 1hr) 

2. Genes that respond significantly only to drug2 when compared against control at 1hr (but not drug1 when compared against control at 1hr) 

3. Genes that respond significantly to both drug1 and drug2 when compared against control  at 1hr

The only way I can think to do this would be to individually contrast

drug1.1hr - control.1hr

and then individually contrast:

drug2.1hr - control.1hr

Then look at the overlap/divergence between the resulting sig gene lists. Thats what I was getting at with my venn diagram approach. Am I way off the mark? Please let me know if I have misunderstood something or if there is a better way to do this. I really appreciated the initial response. Thank you.  

ADD REPLY
1
Entering edit mode

Yeah, you could do a Venn diagram as well. See ?decideTests.DGELRT and ?vennDiagram, particularly the 'include' argument. If you use 'both', then the intersection may include genes that went up for one drug and down for the other, which may not be what you want.

ADD REPLY
0
Entering edit mode

Great thank you, I have looked at decideTests previously and vennDiagram looks like it will do a process that I otherwise would have had to do manually. Thank you very much for reading and responding. 

ADD REPLY

Login before adding your answer.

Traffic: 345 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6