Hello,
I have an experimental design almost identical to the one portrayed in the edgeRUsersGuide() Section 3.5 Comparisons both between and within subjects. My question is not about the statistical analysis, but how to properly remove the subject/Patient effects for PCA clustering given that they are nested within Disease. The relevant codes from the section, plus 2 lines of my own:
targets$Patient #numbered 1-9 for the 9 total patients Patient <- gl(3,2,length=18) #re-numbering patients withing Disease group Disease <- factor(targets$Disease, levels=c("Healthy","Disease1","Disease2")) Treatment <- factor(targets$Treatment, levels=c("None","Hormone")) design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment) fit <- glmFit(y, design) #Additions for clustering: logCPM <- cpm(y, log = T, prior.count = 3) noPatient <- removeBatchEffect(logCPM, design = model.matrix(~Disease+Disease:Treatment), batch = ??)
The design matrix given to removeBatchEffect should not have the Patient effect in it as it's given separately in the batch argument. However, should I pass the original numbering of patient in targets$Patient
or the re-numbering of patient within Disease group in Patient
? It seems like I should pass the original numbering because the nesting of re-numbered patients within Disease isn't specified in the call to removeBatchEffect, so I think it would treat all 1s, 2s and 3s as the same batch instead of only the 1s,2s and 3s within each Disease group. Am I correct or is there yet a different way to properly do it?
Thanks,
Jenny
Thanks, Jim and Aaron. That's what I thought, that calling removeBatchEffect with the re-numbered patients would be the same as if they were not nested. The idea of computing treatment-control logFC for each patient and then doing PCA or MDS on those is good for visualization of the differences of treatment response within each disease. However, another downstream application I want to use is WGCNA, where I would want to keep the individual sample values to distinguish between modules that have positive responses to treatments and those with negative responses. I supposed I can just leave the patient effects in and use the same nested module to analyze the module eigengenes.
Cheers,
Jenny