how to design contrast formula with 2 factors multiple levels in edgeR?
3
0
Entering edit mode
feicaiyi • 0
@feicaiyi-7109
Last seen 10.1 years ago
China

hi ! I have a two factor RNA seq ,each factors have multiple levels(4 developmental stages and 16 different tissues), here is a part of my data,

gene  P00_brain  P00_heart  P00_kidney  P00_liver  P00_lung  P21_brain  P21_heart  X7W_brain  X7W_heart
gene1  7211  7551  6839  1891  13004 5699 1250 7004 9224
gene2  15   12 11 2 8 6 3 22 24
gene3  15  12 11 2 8 6 3 22 24
gene4  7215   7546 6838 1891 12990 5691 1251 6996 9188
gene5  0  0 0 0 0 0 0 0 0
gene6  7437  2412 2490 159 4669 7309 830 4125 3376
gene7  0  0 1 0 0 1 0 1 1
gene8  33512  7114 5483 527 10461 12912 2113 9720 9734
gene9  7  2 1 0 1 1 0 0 2

There are 10 of these samples having replicates.(total 58 combinations,not 64 combinations since there are some tissues are not developed in certain developmental stage) 
Here are my assumptions: 

1. There are common DE geneto guide development regardless tissues in order to response same developmental signals, 
2. Different tissues have different DE genes to guide differentiations even if these different tissues are at the same developmental stage.

I would like to ask How to design a formalu(   moderl.matrix(~time+tissue?) ) to answer my previous questions?:
1. To find differential expressed genes regardless tissues among four periods?? 
In my thoughts, I should perform DE analysis in each single tissue among 4 periods, afterwards I just pick the co-occurrence genes in top (maybe 50?) DEed genes ? what would be the difference between code below:
d1 <- model.matrix(~time+tissue)
fit <- glmFit(y, d1)
lrt <- glmLRT(fit, coef=2:4)


I am new to edgeR and biostatics, thanks in advance.

R edgeR DE • 3.4k views
ADD COMMENT
0
Entering edit mode

It's not entirely clear what your experimental design is. You say have 58 different combinations of time and tissue. But how many replicates do you have for each combination? There's a mention of "10 samples having replicates", but I'm not sure how to interpet that. You'd probably get a better answer if you showed us the definition of the time and tissue vectors.

ADD REPLY
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 3 hours ago
The city by the bay

The simplest approach would be to define your design matrix with a one-way layout, as below:

grouping <- factor(paste0(time, ".", tissue))
design <- model.matrix(~0 + grouping)
colnames(design) <- levels(grouping)

This is more appropriate than the additive model (i.e., model.matrix(~time+tissue)) if you're interested finding tissue-specific DE genes between timepoints. More specifically, if you use your additive model, you'll be making some assumptions about the consistency of the time effect across different tissues. This would defeat the purpose of looking for time-induced changes that are different between tissues. Now, if y is your DGEList object, you can then run:

y <- estimateDisp(y, design)
fit <- glmFit(y, design)

The contrasts will depend on what you're interested in. If you want to find DE genes that change between time points in a similar manner across tissues, you can do something like:

con.brain <- makeContrasts(E13.brain - P00.brain, levels=design)
lrt.brain <- glmLRT(fit, contrast=con.brain)
con.heart <- makeContrasts(E13.heart - P00.heart, levels=design)
lrt.heart <- glmLRT(fit, contrast=con.heart)

Then, you intersect the set of DE genes in lrt.heart and lrt.brain to get common changes in both tissue types (note that you'll need to manually check that the sign of the log-fold change is the same in both comparisons). Obviously, you can include more contrasts in the intersection, e.g., for kidney or lung, to get common changes across more tissue types. However, be aware that this'll be more conservative, so you'll get fewer DE genes in the final set.

Your other aim refers to tissue-specific DE genes across developmental stages. This can be accommodated by running:

con.brain2heart <- makeContrasts(E13.brain - P00.brain - (E13.heart - P00.heart), levels=design)
lrt.heart <- glmLRT(fit, contrast=con.brain2heart)

This'll find the set of genes where the DE between timepoints in brain is different to that in the heart. If you want to include more tissues, then you can do an ANOVA-style analysis:

con.tissue <- makeContrasts(E13.brain - P00.brain - (E13.heart - P00.heart),
    E13.brain - P00.brain - (E13.lung - P00.lung),
    E13.brain - P00.brain - (E13.kidney - P00.kidney),
    E13.brain - P00.brain - (E13.liver - P00.liver),
    levels=design)
lrt.tissue <- glmLRT(fit, contrast=con.tissue)

This'll find genes that respond differently between these two timepoints in any tissue.

ADD COMMENT
0
Entering edit mode
feicaiyi • 0
@feicaiyi-7109
Last seen 10.1 years ago
China

Thanks, Aaron. Here is part of my definition of "time" and "tissue" vectors, is it like this? In each "time", There are at least two replicates of different time~tissue combinations.

Sample Time Tissue
sample1 P00 heart
sample2 P00 heart
sample3 P00 brain
sample4 P00 liver
sample5 P00 kidney
sample6 P00 lung
sample7 P21 heart
sample8 P21 brain
sample9 P21 kidney
sample10 P21 liver
sample11 P21 lung
sample12 P21 lung
sample13 X7W heart
sample14 X7W brain
sample15 X7W lung
sample16 X7W kidney
sample17 X7W liver
sample18 X7W liver
sample19 E13 heart
sample20 E13 brain
sample21 E13 kidney
sample22 E13 kidney
sample23 E13 lung
sample24 E13 liver
ADD COMMENT
0
Entering edit mode
feicaiyi • 0
@feicaiyi-7109
Last seen 10.1 years ago
China

Thanks, Aaron! 

"This is more appropriate than the additive model (i.e., model.matrix(~time+tissue)) if you're interested finding tissue-specific DE genes between timepoints. "

In my opinion, "between timepoints" here means a comparison between only two time points, 

eg. con.brain <- makeContrasts(E13.brain - P00.brain, levels=design)

so, how can I find tissue-specific DE genes among 4 time points? Will this kind of genes be important since they always respond to devlopmental signal ?

ADD COMMENT
0
Entering edit mode

I'll assume that you want to find tissue-specific DE genes across any of the 4 time points. If so, you can do something like:

con.brain2heart.all <- makeContrasts(E13.brain - P00.brain - (E13.heart - P00.heart),
    E13.brain - P21.brain - (E13.heart - P21.heart),
    E13.brain - X7W.brain - (E13.heart - X7W.heart),
    levels=design)

This will find genes where the response to any time effect is different between heart and brain. You can expand it to include other tissues if you wish, e.g., brain versus kidney, brain versus lung, brain versus liver. Obviously, you don't need to list every pairwise comparison. If brain is equal to lung and brain is equal to kidney under the null hypothesis, then lung is also equal to kidney.

P.S. You should post your reply as a comment to an answer, rather than as a separate answer. This'll ensure that your post shows up after the answer you're responding to. Otherwise, posts will be shuffled around according to the voting system.

ADD REPLY
0
Entering edit mode

Thanks, Aaron!

I think my thought is a little different with yours. I'm not sure whether I am correct.

Now I am using 

con.brain.all <- makeContrasts(P00.brain - E13.brain, P21.brain - E13.brain, X7W.brain - E13.brain,

                              levels=design)

My purpose is to find some genes can differently express according to different developmental stages in single tissue,and I repeat this step in other 15 tissues separately. Afterward, I intersect these sums of top 100 of 16 DE gene groups, and now I found several genes actually have such expression pattern: In almost every kind of tissue, expression values are changed according to different developmental stages.

 

ADD REPLY
0
Entering edit mode

I don't see where your 16 tissues are coming from. I only see heart, lung, liver, kidney and brain.

Anyway, your con.brain.all is an ANOVA-like analysis that will detect genes that are different between any time points for the brain. For the purpose of finding genes that might be involved in brain development, this is fine. However, it's not clear what you expect to get from intersecting the sets of DE genes between tissues. It won't tell you if the genes have tissue-specific responses, because a gene with the same pattern of change across timepoints in all tissues would still show up in the intersected set.

Similarly, the intersection won't tell you if the genes have common responses to time across tissues. For example, a gene that is upregulated across time in one tissue might be downregulated across time in another tissue. This gene would show up in the intersected set so long as the change was significant in each tissue - but I would hardly call this a gene that has common behaviour in those two tissues. To account for this, you need to consider the signs of the log-fold changes in the ANOVA.

In short, the intersection procedure will only give you the genes that have some kind of significant change across time in every tissue. Whether that change is the same or different between tissues is not specified.

ADD REPLY
0
Entering edit mode

The data I provided here is only a fraction of my entire data in order to illustrate my data structure, in my original question I mentioned there are 4 times and 16 tissues.

I think the point of second and third paragraphs of your reply is right, I have to check my result whether these genes have similar variation pattern or different patterns.

The reason why I am doing this is because I want to verify whether there are some genes which are employed to control development regardless tissue differences. The assumptions are simple: some genes take charge for the time when to grow regardless tissues, and other genes have the capabilities to control what kind of cells they should be.

ADD REPLY

Login before adding your answer.

Traffic: 250 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6