Comparison between three cell populations

roy.granit

Hello,

I've read through the edgeR user's guide a couple of times, but I'm still not sure how to approach my analysis, so I'd be glad to get some advice.

I'm comparing three cell populations (each with two replicates), and I want a list of differentially expressed genes between each population and all the rest. Which approach would be correct?

1. Compare each pair of populations and cross the resulting lists

2. Define the groups so that each time one population forms one group while the other two are pooled into a second group

3. Run the GLM as "lrt <- glmLRT(fit, coef=2:3)". This gives me the logFC between the groups, but I'm not sure what the next step is.

Any help would be appreciated. Thanks!

edger

roy.granit

Hi Aron,

The nice thing about biology is that there is never just one correct answer. :)

I do want to find the DEGs that differ between A and B+C, so I'll take the first approach.

Thanks!

score 3 · Accepted Answer · 2015-07-26

Let's assume we've got something like this:

populations <- factor(c("A", "A", "B", "B", "C", "C"))
design <- model.matrix(~ populations)
colnames(design) <- c("Intercept", "B", "C")

The intercept represents the log-expression level of group A, while coefficients B and C represent the log-fold changes of groups B and C over A, respectively.
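To see this parameterization concretely, here is a minimal base-R sketch. It assumes an ordinary least-squares fit on made-up log-expression values as a stand-in for edgeR's negative binomial GLM, which uses the same design-matrix parameterization. For a gene whose log-expression is 5 in A, 7 in B and 4 in C, the fitted coefficients come out as the level of A plus the B-over-A and C-over-A differences.

```r
populations <- factor(c("A", "A", "B", "B", "C", "C"))
design <- model.matrix(~ populations)
colnames(design) <- c("Intercept", "B", "C")

# Made-up log-expression values for one gene (two replicates per group):
# A = 5, B = 7, C = 4. Least squares stands in for edgeR's NB GLM here.
y <- c(5, 5, 7, 7, 4, 4)
coefs <- lm.fit(design, y)$coefficients
print(round(coefs, 6))
# Intercept = 5 (level of A), B = 2 (B minus A), C = -1 (C minus A)
```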

Now, imagine that we want to test for DE between A and the others. For the first proposal, I assume that when you say "crossing" the lists, you want to intersect the results of A versus B with the results of A versus C. This will give you the genes that are significantly DE in both comparisons, though it is a rather conservative operation (i.e., the true FDR is probably much lower than the nominal threshold at which you defined the DE genes in each comparison). Also, you don't control for the direction of DE, though this can be easily fixed by only intersecting genes that have the same sign of the log-fold change.
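A minimal sketch of that intersection with the sign check, in base R. It assumes each result table looks like the table component of edgeR's topTags() output for a glmLRT test (gene IDs as row names, with logFC and FDR columns); the helper name intersect_de and the toy numbers are hypothetical. Because coefficients B and C are log-fold changes over A, a gene that is up in A relative to both groups has a negative logFC in both tables, so requiring matching signs is enough to enforce a consistent direction.

```r
# Keep genes that are significant in BOTH pairwise comparisons and change in
# the same direction. Inputs: gene IDs as row names, "logFC" and "FDR" columns.
intersect_de <- function(res_B, res_C, fdr = 0.05) {
  genes <- intersect(rownames(res_B)[res_B$FDR < fdr],
                     rownames(res_C)[res_C$FDR < fdr])
  same_sign <- sign(res_B[genes, "logFC"]) == sign(res_C[genes, "logFC"])
  genes[same_sign]
}

# Toy tables for illustration only (made-up numbers, not real results).
res_B <- data.frame(logFC = c(-2, 1.5, -1, 0.2),
                    FDR   = c(0.001, 0.01, 0.02, 0.9),
                    row.names = c("g1", "g2", "g3", "g4"))
res_C <- data.frame(logFC = c(-1.8, -1.2, 0.3, 2),
                    FDR   = c(0.003, 0.02, 0.5, 0.001),
                    row.names = c("g1", "g2", "g3", "g4"))

de_both <- intersect_de(res_B, res_C)
print(de_both)  # "g1": g2 is significant in both but changes in opposite directions
```

With a real fit, the inputs would presumably come from something like topTags(glmLRT(fit, coef = 2), n = Inf)$table and topTags(glmLRT(fit, coef = 3), n = Inf)$table.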

For the second proposal, you can do this using the same design matrix. Just define a contrast like so:

con <- makeContrasts((B+C)/2, levels=design)

This can be passed to glmLRT via the contrast= argument. It will then test whether the expression in A is equal to the average expression across B and C, so A is compared to the other two groups. However, this will not detect every case of DE between A and the other groups. For example, if B is upregulated relative to A while C is downregulated relative to A, the average of B and C may be equal to A, so the contrast fails to reject the null hypothesis. Conversely, the contrast may reject the null even when A is not DE against both of the other groups; e.g., if B is DE relative to A but C is not, the average of B and C will still differ from A. This may not be desirable if you only want genes that are DE between A and both groups.
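Both failure modes can be checked with plain arithmetic. makeContrasts((B+C)/2, levels=design) should yield the weight vector (0, 0.5, 0.5) over the coefficients (Intercept, B, C). The sketch below applies that vector to hypothetical coefficient values for three made-up genes; the result is the quantity the contrast test compares against zero.

```r
# Contrast weights for (B + C)/2, coefficients ordered (Intercept, B, C).
con <- c(Intercept = 0, B = 0.5, C = 0.5)

# Hypothetical per-gene coefficients (B and C are log-fold changes over A).
coefs <- rbind(
  opposite = c(Intercept = 5, B = 1, C = -1),  # B up, C down: contrast cancels
  both_up  = c(Intercept = 5, B = 1, C =  1),  # DE versus both groups
  only_B   = c(Intercept = 5, B = 2, C =  0)   # DE versus B only
)

est <- drop(coefs %*% con)
print(est)  # opposite = 0 (missed), both_up = 1, only_B = 1 (still flagged)
```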

For the final proposal, dropping the last two coefficients (coef=2:3) tests for significant differences between any of the groups. This means that the null may be rejected if there is DE between A and only one of the other groups.

So, in short, your choice depends on what you're willing to consider as DE between one population and the others. If you want to enforce DE between the chosen population and each of the others, then use approach 1. If you're willing to take DE between the chosen population and any of the other groups, then use approach 3. The second approach sits somewhere in the middle.
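To summarise which kinds of gene each approach targets, here is a conceptual sketch in base R. It ignores noise and statistical power entirely and only asks, for hypothetical true log-fold changes of B and C over A, whether each approach's null hypothesis is false. Approach 1 includes the same-sign check mentioned above; the function name targets_de and the patterns are illustrative only.

```r
# For true log-fold changes of B over A (lfc_B) and C over A (lfc_C), report
# whether each approach's null hypothesis is false. Conceptual only: no testing.
targets_de <- function(lfc_B, lfc_C) {
  c(approach1 = lfc_B != 0 && lfc_C != 0 && sign(lfc_B) == sign(lfc_C), # DE vs both, same direction
    approach2 = (lfc_B + lfc_C) / 2 != 0,                                # (B + C)/2 contrast
    approach3 = lfc_B != 0 || lfc_C != 0)                                # coef = 2:3, any difference
}

patterns <- list(both_down = c(-1, -1),  # A higher than both B and C
                 only_B    = c(2, 0),    # A differs from B only
                 opposite  = c(1, -1),   # B up, C down relative to A
                 none      = c(0, 0))    # no DE at all

calls <- t(sapply(patterns, function(p) targets_de(p[1], p[2])))
print(calls)
```

Reading the rows from top to bottom, approach 1 accepts the fewest patterns and approach 3 the most, with approach 2 in between, although approach 2 misses the "opposite" pattern that approach 3 still catches.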