I am working with large panel of human tissue cells that are collected from different centres. Its a paired design i.e same sample has treated and untreated rna-seq data.
Here is how my colData looks like
> colData purity subjects treat Center Year BMI Gender Age sample1_treat 95 1 treat Miami 2011 <NA> M <NA> sample1_untreat 95 1 untreat Miami 2011 <NA> M <NA> sample2_treat 90 2 treat Milano 2011 <NA> F <NA> sample2_untreat 90 2 untreat Milano 2011 <NA> F <NA> sample3_treat 75 3 treat Geneva 2011 20 F 50 sample3_untreat 75 3 untreat Geneva 2011 20 F 50 . . . . . .
As I have been believing that the paired design takes any batch effects in to account, I used the following design for DE analysis to look for differential genes between treated vs untreated
:
design(dds) <- formula(~ subjects + treat)
I am also intended to do exploratory data analysis, for that I am using a matrix of log Fold change values generated by:
norm <- assay(normTransform(dds)) i <- seq.int(1L,72,by = 2L) # 72 is the total number of samples. norm.fc <- norm[,i]-norm[,i+1] write.table(norm.fc, file="normalised_FCMatrix.txt", sep="\t")
And I am using the norm.fc
for doing clustering and PCA analysis.
I would like to know if my design for DE analysis is correct. Do I need to incorporate any other information to account for tissue collection / sequencing centre etc.
So if I want to look if the treatment has different effect on males vs females, I need to create the appropriate
model.matrix
?So I should ask the program to give me if there is any difference in the gene expression in males vs females w.r.t treatment ? It looks complicated to ask "If there is any difference between treated vs untreated samples in males and females"
If you fit a model like
Then you will be testing to see if there is a gender-specific effect for any genes, which in the end could be interpreted as an interaction between sex and treatment. If there aren't too many genes that have a gender-specific effect (for some interpretation of 'too many', and certainly not including genes from either chrX or chrY), then you might conclude that gender isn't a big driver of any differences, and then ignore it. Or it there are 'enough' genes that are significant, you can just add Gender as a nuisance variable. The same is true of age, although it looks like you don't have complete information for that.