Question

Best way to include RIN in differential expression design

1

Entering edit mode

charles.foster ▴ 160

@charlesfoster-17652

Last seen 11 weeks ago

Australia

Hi all,

I've assembled a transcriptome de novo, and now wish to carry out differential expression analysis using DESeq2. The RIN varies between samples for a variety of reasons, and by the nature of our study system we could not simply refine extractions until having RIN >8 for all samples.

I used my normal pipeline of sample processing followed by DE analysis with DESeq2 and got DE results with no issues, but because RIN can affect gene expression I also want to explore including RNA integrity number (RIN) in my DE model design. I've seen a few recommendations of how to do this on Bioconductor and Biostars, but I could not determine what the consensus was of the best way to do so (if such a consensus even exists).

I've trialled the impact of a couple of different methods by visualising the impact on PCA plots after removing the effect with limma::removeBatchEffect, like so:

1) Including RIN in the model design as a numerical vector

dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~RIN+Group)
dds <- DESeq(dds)
colData(dds)

enter image description here

vsd <- vst(dds, blind=FALSE)
Group <- factor(colData(vsd)$Group)
RIN <- samples$RIN
design0 <- model.matrix(~Group)
vsd2 <- vsd # backup
assay(vsd2) <- limma::removeBatchEffect(assay(vsd), covariates = vsd$RIN, design=design0) #include RIN as a numerical covariate
plotPCA(vsd2, "Group", ntop=70128)

enter image description here

2) Splitting the vector of RINs into a factor with five bins

vsd3 <- vsd # backup. I would also re-run DESeq(dds) with the proper new design.
RIN_cut <- cut(RIN, breaks = 5) #split the RINs into a factor with 5 bins
RIN_cut

[1] (8,8.6] (7.4,8] (8,8.6] (7.4,8] (8,8.6] (5.6,6.2] (5.6,6.2] (6.8,7.4] (5.6,6.2] (8,8.6]
Levels: (5.6,6.2] (6.2,6.8] (6.8,7.4] (7.4,8] (8,8.6]

assay(vsd3) <- limma::removeBatchEffect(assay(vsd), batch = RIN_cut, design=design0) #remove RIN as a 'batch'
plotPCA(vsd3, "Group", ntop=70128)

enter image description here (apologies for the size of the plots)

By eyeballing the two PCAplots, the second option including RIN as a factor looks to have done a better job at separating the two groups. However, I don't know whether this is an artificial separation, i.e. whether the treatment of RINs is somehow biasing the results.

Does anyone have any suggestions about which of the two treatments is better, or even have a suggestion for a better treatment? Without any treatment of RINs, the PCA looks very similar to Option 1 above, only with PC1 contributing 47% (vs 50% for Opption 1 above)..

Thanks, Charles

deseq2 limma • 2.5k views

ADD COMMENT • link updated 4.8 years ago by Robert Castelo ★ 3.4k • written 4.8 years ago by charles.foster ▴ 160

score 1 · Answer 1 · 2020-04-22

1

Entering edit mode

Michael Love 43k

@mikelove

Last seen 12 hours ago

United States

I would prefer SVA or RUV over directly using RIN. I have some threads on the support site that expand on this, maybe you can try to dig those up.

ADD COMMENT • link 4.8 years ago Michael Love 43k

0

Entering edit mode

Thanks for your advice. I've done a bit of digging to read further on SVA. Instead of trying to "guess" (model) the impact of RNA degradation by using RIN, I'll just use SVA to account for hidden surrogate variables. Following the code from the DESeq2 vignette:

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Group, colData(dds))
mod0 <- model.matrix(~   1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)
dds$SV1 <- svseq$sv[,1]
dds$SV2 <- svseq$sv[,2]
design(dds) <- ~ SV1 + SV2 + Group
dds <- DESeq(dds)

I'm not overly worried about RIN having a big impact on my results considering there are no clear trends of samples clustering based on RIN in a PCA plot of vsd counts (without taking RIN into account). It's just good to know the right way to go about these analyses. Cheers!

ADD REPLY • link 4.8 years ago charles.foster ▴ 160

score 1 · Answer 2 · 2020-04-23

If you have not done yet, you probably should read the work by Gallego Romero et al. (2014) on exactly this issue, where they also advise to put RIN as a covariate in the linear model when it is not associated with the outcome of interest. However, as Mike suggets, probably using SVA or RUV may lead to better results depending on the dataset (number of samples, range of differences in RIN, etc.). I would also suggest to try using sample-specific quality weights Ritchie et al., (2006) as explained in subsection "15.6 Voom with sample quality weights" from the limma User's Guide.

cheers,

robert.