Question

post hoc analysis on nested model

0

Entering edit mode

ashley.lu • 0

@ashleylu-15735

Last seen 6.6 years ago

Dear all,

I would like to get some advice over how to do a post hoc test on a nested interaction model. This is related to another question I asked previously ( Nested design in edgeR ) .

The question is similar to the previous one (with higher number of levels), we would like to know how do different brain regions (13 regions) correlate to the level of a protein B measure by immuno-staining.

Here is the current approach i am taking:

## Nested model 

B_region_design=model.matrix(~0+region+B:region,meta)

## calculate commond, trend, tagwise dispersion for EdgeR model fitting

subcounts_disp=estimateDisp(subcounts,B_region_design,robust=T)

fit=glmQLFit(design = B_region_design,y = subcounts_disp,robust = T)

## interaction coefficient starts from column 14 to 26
res_QL=glmQLFTest(fit,coef =14:26)

tb=topTags(res_QL,n= Inf )$table


## create a contrast matrix with pairwise comparison between all regions
head(contrast)

COM:B-AUD:B	CTXsp:B-AUD:B	ENTI:B-AUD:B	FB:B-AUD:B	HPd:B-AUD:B	HPs:B-AUD:B	HY:B-AUD:B	OLF:B-AUD:B	PTL:B-AUD:B	RSP:B-AUD:B	⋯	PTL:B-OLF:B	RSP:B-OLF:B	SSp:B-OLF:B	TH:B-OLF:B	RSP:B-PTL:B	SSp:B-PTL:B	TH:B-PTL:B	SSp:B-RSP:B	TH:B-RSP:B	TH:B-SSp:B
AUD	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
COM	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
CTXsp	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
ENTI	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
FB	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0
HPd	0	0	0	0	0	0	0	0	0	0	⋯	0	0	0	0	0	0	0	0	0	0

##
res_QL_contrast=glmQLFTest(fit,contrast = contrast)

I used res_QL_contrast to find genes which have a regional difference in association with protein B, and the topTags outputs a table with contains logFC for each comparison and a single pvalue.

So i guess any significant test will only indicate that there is at least one comparison (between two regions) are different.

And to know which two regions are significantly different, I have to test on individual contrast separately.

I am planning to do something as the following :

toptags_list[]
for (i in 1:ncol(contrast){

 res_list=glmQLFTest(fit,contrast = level2_contrast[,i])

 toptags_list[i]=topTags(res_list,n=Inf)$table

}

And after finished testing all comparisons, I will concatenate all the tables in the toptags_list and run a global FDR p-values correction.

I would like to know if this a valid way to do post-hoc comparison.

And also is there a better way of doing this analysis, as my data is very large (40000 genes 4700 samples), and doing all these comparison in one combined contrast already taking a long time, not mentioning if I want to do it by individual columns.

Thank you very much in advance,

Ashley

nested design post hoc edgeR • 1.5k views

ADD COMMENT • link updated 6.9 years ago by Aaron Lun ★ 28k • written 6.9 years ago by ashley.lu • 0

score 1 · Accepted Answer · 2018-06-06

Your pairwise comparison approach seems sensible to me. And yes, it will take a while, as you have many samples and many comparisons involving a design matrix that is not a simple one-way layout. Unfortunately, you can't get around that easily - the simplest approach would be to parallelize your comparisons across multiple cores via BiocParallel (replacing the for with a bplapply).

As a general rule, I usually perform an ANODEV for its own sake, i.e., when I am only interested in genes that exhibit any change and don't care about the nature of the change. This is occasionally useful as feature selection for visualization or for downstream analyses where I need "interesting" genes that are not limited to one specific pairwise comparison. Another use case is when I'm looking at two different experimental methods for achieving the same perturbation, and I want to identify genes that exhibit any change in expression, regardless of which specific method causes it (because all affected genes are regulatory targets of the perturbation).

What this means is that I usually don't perform pairwise comparisons with the aim of explaining the ANODEV results. If I need to know the specific differences between regions, I do the pairwise comparisons directly, and skip the ANODEV entirely as it isn't particularly helpful to me. (I suppose this is mostly because I work with relatively small experimental designs; if I had groups of related contrasts, an ANODEV would be useful in identifying which group of contrasts I should prioritize for further examination.) AFAIK, there is no GLM equivalent to the combination of Tukey's HSD test and ANOVA; and in any case, the underlying concept of performing all pairwise comparisons is the same, so you're not missing out on much by doing it manually as you've proposed.