Hi everybody,
I am trying to compare gene expression using RNA Seq data from 50 and 50 human tissue samples with two phenotypically similar diseases that may exhibit distinct pathophysiological differences on the transcriptomic level (= my hypothesis).
Several, but not all subjects in the study provided more than one sample, in one case even one from each disease type. Thus, I would want to correct for the subject as a random effect.
Additional (fixed) covariates include gender (disease B has more men that A) and a measure of disease severity (continuous integer values).
In order to be able to incorporate both my fixed and random factors (as a blocking factor), I used limma/voom
design <- model.matrix(~1+ targets$disease + targets$gender + targets$severity)
y <- voom(x,design,plot=F, normalize="quantile")
corfit <- duplicateCorrelation(y, design, block=targets$Patient)
y <- voom(x,design,plot=FALSE, block=targets$Patient, correlation=corfit$consensus, normalize="quantile")
fit=lmFit(y, design, block=targets$Patient, correlation=corfit$consensus)
fit <- eBayes(fit)
I have also tried TMM normalization, or voomWithQualityWeights.
My problem is that I get a number of DE genes for the disease B vs. A with large FCs (up to 10fold) that are mostly Y-linked, and/or ribosomal proteins. These results do not make sense biologically.
As a comparison, I have used DESeq2 to analyze the same data. Correcting for severity and patient as fixed factors (yes, I know, patient should be random...), I find very different genes to be differentially expressed, with lower FCs. Those make more sense biologically.
I know that DESeq2 is probably not appropriate for my data, and I would like to use limma here. But what am I doing wrong here? Is it a normalization problem?
Many thanks in advance for your thoughts and comments!
You don't show what
topTable
command you used to get your DE list. Large log-fold changes for Y-linked genes; are you sure you're not testing for the sex effect? Also, everything butseverity
had better be a factor type.Many thanks for your comment.
You are right, I should have posted that (see below).
I also looked into the gender effects - very similar genes appear, with better p-values.