Hi all,
I am working on an ATAC seq data set for which I have two phenotype groups (control and disease). Within each of these groups there are different cell types specific to the groups as shown in the experimental design:
Samples |
Phenotype | cell type |
1 | Control | control1 |
2 | Control | control1 |
3 | Control | control2 |
4 | Control | control2 |
5 | Control | control2 |
6 | Control | control2 |
7 | Control | control3 |
8 | Control | control3 |
9 | disease | disease1 |
10 | disease | disease1 |
11 | disease | disease1 |
12 | disease | disease1 |
13 | disease | disease2 |
14 | disease | disease2 |
15 | disease | disease3 |
16 | disease | disease3 |
I am looking for a way to run differential binding analysis that will model the different cell types separately (instead of modeling all controls together it will build separate models for control1, control2...etc) and find differential sites between control and disease. This is because there is a lot of variation across the different cell types for each phenotype (eg. there is high variation between control1 and control2 cell types).
I have tried using DiffBind using the cell types as blocking factors but that gave me an error (DBA_FACTOR=Phenotype and DBA_TREATMENT=cell type):
dba.contrast(dba_count, categories=DBA_FACTOR, block=DBA_TISSUE, minMembers=2)
Warning messages:
1: Blocking factor invalid for all contrasts: 2: No blocking values are present in both groups
I think this may be due to the cell types being specific to only control and disease phenotypes. Is there a way to redefine the groups so as to take advantage of blocking for cell type without losing biological relevance?
I have also looked at edgeR as it seems to have an example in the user guide (section 3.5) that seems to be similar to what I am doing (hence why I tagged it here) but they re-number the different patients in that example and I'm confused as to why they did that.
If you can help me with any part of these questions I would appreciate it!