Hi all,
I have data from an RNA-seq experiment in which 90 individuals were sequenced. Three biopsies were taken from each patient at different locations, i.e., for each individual we have three RNA-seq samples corresponding to three locations (proximal, distal, rectum). We are interested in finding the expression differences between locations.
I thought of using a blocking factor with duplicateCorrelation to account for the fact that each patient contributes three samples, one per location.
An example of the metadata and of the steps I'm thinking of using:
head(targets)
  Sample_Name Location    sex age patient_ID
1        2941 Proximal   Male  68        294
2        2942   Distal   Male  68        294
3        2943   Rectum   Male  66        294
4        1331 Proximal Female  24        133
5        1332   Distal Female  24        133
6        1333   Rectum Female  24        133
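# dataset is the expression data object
# the metadata column is "Location" (capital L), and the design formula uses the same name
Location <- factor(targets$Location)
sex <- factor(targets$sex)
age <- targets$age
# one coefficient per location (no intercept), adjusted for sex and age
design <- model.matrix(~ 0 + Location + sex + age)
# estimate the consensus within-patient correlation across genes
corfit <- duplicateCorrelation(dataset, design, block = targets$patient_ID)
fit <- lmFit(dataset, design, block = targets$patient_ID, correlation = corfit$consensus.correlation)
efit <- eBayes(fit, robust = TRUE)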
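To actually get the differences between locations, I assume I would then need pairwise contrasts on top of this fit. Here is a minimal sketch of what I have in mind, run on simulated toy data so that it is self-contained. The toy `targets` and `dataset` objects and the contrast names are made up for illustration only; with the real data, `dataset` would presumably be normalized log-expression values (for example the output of `voom`).

```r
library(limma)

# Toy stand-ins for the real objects (simulated, for illustration only)
set.seed(1)
n_patients <- 6
n_genes <- 100
targets <- data.frame(
  Location   = rep(c("Proximal", "Distal", "Rectum"), times = n_patients),
  sex        = rep(c("Male", "Female"), each = 3, length.out = 3 * n_patients),
  age        = rep(sample(20:70, n_patients), each = 3),
  patient_ID = rep(seq_len(n_patients), each = 3)
)

# Simulated log-expression: a per-patient effect shared by that patient's
# three samples, plus independent noise
patient_effect <- matrix(rnorm(n_genes * n_patients), nrow = n_genes)
dataset <- patient_effect[, targets$patient_ID] +
  matrix(rnorm(n_genes * nrow(targets)), nrow = n_genes)

# Same modelling steps as in the question
Location <- factor(targets$Location)
sex <- factor(targets$sex)
age <- targets$age
design <- model.matrix(~ 0 + Location + sex + age)
corfit <- duplicateCorrelation(dataset, design, block = targets$patient_ID)
fit <- lmFit(dataset, design, block = targets$patient_ID,
             correlation = corfit$consensus.correlation)

# Pairwise comparisons between locations (contrast names are my own choice)
contr <- makeContrasts(
  ProximalVsDistal = LocationProximal - LocationDistal,
  ProximalVsRectum = LocationProximal - LocationRectum,
  DistalVsRectum   = LocationDistal - LocationRectum,
  levels = design
)
fit2 <- contrasts.fit(fit, contr)
efit <- eBayes(fit2, robust = TRUE)

# Top genes for one comparison
topTable(efit, coef = "ProximalVsDistal")
```

In this sketch the moderated statistics are computed after `contrasts.fit`, so each coefficient in `efit` is a location-versus-location difference rather than a per-location mean.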
Are these steps appropriate to account for the patient effect?
Thanks
Can you please clarify why the ages are so different for different samples from the same patient? For patient 133, for example, is it really true that the distal sample was collected 33 years after the proximal sample? Or am I misinterpreting the age column?
Sorry Gordon Smyth, it was a copy/paste problem from the console; it is now fixed. The age column is the patient's age.