Hello, I'm a bit confused as to how to set-up and run my differential expression RNAseq analysis. I have 20 different subjects, ten of which are male and ten of which are female, and I have seven time points for each subject (baseline, 0h, 4h, 8h, 12h, 16h, 24h). I want to know what genes change differently between men and women at each time point relative to baseline (e.g., are there any differences in differential expression between men and women for the comparison 8h vs baseline etc.). I also want to account for the variability between participants that is independent of sex, but I am confused as to whether this could effect the detection of sex-specific effects even if sex is used in my model.
Currently my code is as as follows:
sextime <- factor(paste(sampledata$time, sampledata$Sex, sep="."))
designsextime <- model.matrix(~0 + sextime, y$samples)
colnames(designsextime) <- gsub("sextime", "", colnames(designsextime))
contr.matrixsextime <- makeContrasts(
"h0vsbaselinedif" = (h0.M-baseline.M) - (h0.F-baseline.F),
"h4vsbaselinedif" = (h4.M-baseline.M) - (h4F-baseline.F),
"h8vsbaselinedif" = (h8.M-baseline.M) - (h8F-baseline.F),
"h12vsbaselinedif" = (h12.M-baseline.M) - (h12F-baseline.F),
"h16vsbaselinedif" = (h16.M-baseline.M) - (h16F-baseline.F),
"h24vsbaselinedif" = (h24.M-baseline.M) - (h24F-baseline.F),
levels = colnames(designsextime))
v <- voom(y, designsextime, plot=TRUE)
fit <- lmFit(v, design=designsextime)
cor <- duplicateCorrelation(counts, designsextime, block=sampleid)
fit <- lmFit(v, design=designsextime,
block=sampleid, correlation=cor$consensus.correlation)
fit <- contrasts.fit(fit, contrasts=contr.matrixsextime)
efit <- eBayes(fit)
summary(decideTests(efit, adjust.method = "BH", adj.P.Val=0.05))
Would this be appropriate, or would it be more wise to remove the duplicateCorrelation and blocking by sampleid incase it affects the detection of sex-specific differences and instead have my code as follows (and therefore not account for interindividual variability)
sextime <- factor(paste(sampledata$time, sampledata$Sex, sep="."))
designsextime <- model.matrix(~0 + sextime, y$samples)
colnames(designsextime) <- gsub("sextime", "", colnames(designsextime))
contr.matrixsextime <- makeContrasts(
"h0vsbaselinesex" = (h0.M-baseline.M) - (h0.F-baseline.F),
"h4vsbaselinesex" = (h4.M-baseline.M) - (h4F-baseline.F),
"h8vsbaselinesex" = (h8.M-baseline.M) - (h8F-baseline.F),
"h12vsbaselinesex" = (h12.M-baseline.M) - (h12F-baseline.F),
"h16vsbaselinesex" = (h16.M-baseline.M) - (h16F-baseline.F),
"h24vsbaselinesex" = (h24.M-baseline.M) - (h24F-baseline.F),
levels = colnames(designsextime))
v <- voom(y, designsextime, plot=TRUE)
fit <- lmFit(v, design=designsextime)
cor <- duplicateCorrelation(counts, designsextime)
fit <- lmFit(v, design=designsextime)
fit <- contrasts.fit(fit, contrasts=contr.matrixsextime)
efit <- eBayes(fit)
summary(decideTests(efit, adjust.method = "BH", adj.P.Val=0.05))
or alternatively, analyse the differential expression of men and women separately, where i can 'safely' use blocking by sampleid and then compare statistically the log2FCs generated for each gene in each contrast between men and women
What does
sampleid
represent? If you use duplicateCorrelation(), then the blocking factor should be subject, but it isn't clear from your question whethersampleid
is the subject identifier or something else. Ifsampleid
is literally a sample identifier, and is different for every library, then the duplicateCorrelation analysis would not make sense and will give an error.BTW, some of your code lines don't folllow limma syntax. The
decideTests
function doesn't have anadj.P.Val
argument and you can't runduplicateCorrelation
without a blocking factor.