Advice to use block factor and duplicatecorrelation in RNA-seq experiment

jfertaj ▴ 30

@jfertaj-8566

United Kingdom

Hi all,

I have data from an RNA-seq experiment in which 90 individuals were sequenced. Three biopsies were taken from each patient at different locations, i.e., for each individual we have three RNA-seq samples corresponding to three locations (proximal, distal, rectum). We are interested in finding the differences between locations.

I thought of using a blocking factor with duplicateCorrelation to account for the fact that each patient contributes three samples, one per location.

Here is an example of the metadata and of the steps I'm planning to use:

head(targets)
  Sample_Name  Location    sex  age  patient_ID
1        2941  Proximal   Male  68          294
2        2942    Distal   Male  68          294
3        2943    Rectum   Male  66          294
4        1331  Proximal Female  24          133
5        1332    Distal Female  24          133
6        1333    Rectum Female  24          133

location <- as.factor(targets$Location)   # column name is "Location" (R is case-sensitive)
sex <- as.factor(targets$sex)
age <- targets$age

#dataset is the expression data object

design <- model.matrix(~0 + location + sex + age)   # ~0 + ... drops the intercept
corfit <- duplicateCorrelation(dataset, design, block=targets$patient_ID)

fit <- lmFit(dataset, design, block = targets$patient_ID, correlation = corfit$consensus)
efit <- eBayes(fit, robust=TRUE)

Are these steps appropriate to account for patient effect?

Thanks
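As written, the steps in the question stop at eBayes() on the location coefficients themselves, so no location is compared with another. A minimal sketch of the same duplicateCorrelation route carried through to pairwise location contrasts might look like this. It uses simulated data for a hypothetical six-patient study (random values plus a per-patient effect), and the contrast names are illustrative, not part of the original post. Because patient enters through block and correlation rather than as a fixed effect, sex and age stay estimable in this version.

```r
# Sketch only: simulated data for a hypothetical 6-patient, 3-location study.
library(limma)

set.seed(1)
n_patients <- 6
targets <- data.frame(
  patient_ID = rep(seq_len(n_patients), each = 3),
  Location   = factor(rep(c("Proximal", "Distal", "Rectum"), times = n_patients)),
  sex        = factor(rep(rep(c("Male", "Female"), length.out = n_patients), each = 3)),
  age        = rep(c(68, 24, 51, 37, 45, 60), each = 3)
)

# Random log-expression values (500 genes x 18 samples) standing in for 'dataset',
# plus a shared per-patient effect so samples from one patient are correlated
y <- matrix(rnorm(500 * nrow(targets), mean = 5), nrow = 500,
            dimnames = list(paste0("gene", 1:500), NULL))
patient_effect <- matrix(rnorm(500 * n_patients), nrow = 500)[, targets$patient_ID]
y <- y + patient_effect

design <- model.matrix(~0 + Location + sex + age, data = targets)

# Consensus within-patient correlation, estimated across all genes
corfit <- duplicateCorrelation(y, design, block = targets$patient_ID)
fit <- lmFit(y, design, block = targets$patient_ID,
             correlation = corfit$consensus.correlation)

# Pairwise location comparisons
contr <- makeContrasts(
  DistalvsProximal = LocationDistal - LocationProximal,
  RectumvsProximal = LocationRectum - LocationProximal,
  RectumvsDistal   = LocationRectum - LocationDistal,
  levels = design
)
fit2 <- contrasts.fit(fit, contr)
efit <- eBayes(fit2, robust = TRUE)

# Genes ranked by evidence for a Rectum vs Proximal difference
tt <- topTable(efit, coef = "RectumvsProximal", number = Inf)
```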

limma

3.4 years ago · jfertaj ▴ 30

Can you please clarify why the ages are so different for different samples from the same patient? For patient 133, for example, is it really true that the distal sample was collected 33 years after the proximal sample? Or am I misinterpreting the age column?

3.4 years ago · Gordon Smyth 52k

Sorry, Gordon Smyth, it was a copy/paste problem from the console; it is now fixed. The age column is the patient's age.

3.4 years ago · jfertaj ▴ 30

score 1 · Accepted Answer · 2021-10-30

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

The usual design matrix for this sort of experiment is

model.matrix(~0 + Location + factor(patient_ID))
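One way a patient-blocked design like this might be carried through to location comparisons in limma is sketched below. It uses simulated log-expression values for a hypothetical six-patient study (real RNA-seq counts would normally be transformed first, e.g. with voom), and the contrast names are illustrative. No block or correlation arguments are passed to lmFit() because patient enters the model as a fixed effect.

```r
# Sketch only: simulated data standing in for log-expression values.
library(limma)

set.seed(2)
n_patients <- 6
targets <- data.frame(
  patient_ID = factor(rep(seq_len(n_patients), each = 3)),
  Location   = factor(rep(c("Proximal", "Distal", "Rectum"), times = n_patients))
)

# 500 hypothetical genes x 18 samples of random log-expression values
y <- matrix(rnorm(500 * nrow(targets), mean = 5), nrow = 500,
            dimnames = list(paste0("gene", 1:500), NULL))

# One column per location, plus patient as a fixed blocking factor
design <- model.matrix(~0 + Location + patient_ID, data = targets)
fit <- lmFit(y, design)

# Pairwise location comparisons
contr <- makeContrasts(
  DistalvsProximal = LocationDistal - LocationProximal,
  RectumvsProximal = LocationRectum - LocationProximal,
  RectumvsDistal   = LocationRectum - LocationDistal,
  levels = design
)
fit2 <- contrasts.fit(fit, contr)
efit <- eBayes(fit2, robust = TRUE)

# Genes ranked by evidence for a Distal vs Proximal difference
tt <- topTable(efit, coef = "DistalvsProximal", number = Inf)
# F-test over all three contrasts: any difference between locations
tt_any <- topTable(efit, number = Inf)
```

With 18 samples and 8 design columns (3 locations + 5 patient effects), each gene has 10 residual degrees of freedom.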

There's no need to adjust for age or sex. There's no gain to be had from duplicateCorrelation unless some patients are missing one or more of their three biopsies.
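Why age and sex add nothing once patient is in the model can be checked with a rank computation. Both are constant within each patient, so their columns are linear combinations of the location and patient columns, and adding them does not increase the rank of the design (limma's `nonEstimable()` would list such columns). The base-R sketch below uses a hypothetical four-patient targets table; the patient labels, sexes and ages are invented for illustration.

```r
# Hypothetical targets table: 4 patients x 3 locations; sex and age are per-patient.
targets <- data.frame(
  patient_ID = factor(rep(c("p1", "p2", "p3", "p4"), each = 3)),
  Location   = factor(rep(c("Proximal", "Distal", "Rectum"), times = 4)),
  sex        = factor(rep(c("Male", "Female", "Female", "Male"), each = 3)),
  age        = rep(c(68, 24, 51, 37), each = 3)
)

# Patient-blocked design: 3 location columns + 3 patient columns
X_block <- model.matrix(~0 + Location + patient_ID, data = targets)

# Same design with sex and age added: 2 extra columns
X_full <- model.matrix(~0 + Location + patient_ID + sex + age, data = targets)

rank_block <- qr(X_block)$rank
rank_full  <- qr(X_full)$rank

# The extra columns are aliased with patient, so the rank does not change
c(columns = ncol(X_full), rank = rank_full)   # columns 8, rank 6
```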

3.4 years ago · Gordon Smyth 52k

Thanks, Gordon Smyth. I have all three biopsies for every patient except one. If I don't need to use duplicateCorrelation, that will save me some computing time when I analyse the methylation array data.

3.4 years ago · jfertaj ▴ 30