Best way to include RIN in differential expression design
2
1
Entering edit mode
@charlesfoster-17652
Last seen 15 months ago
Australia

Hi all,

I've assembled a transcriptome de novo, and now wish to carry out differential expression analysis using DESeq2. The RIN varies between samples for a variety of reasons, and by the nature of our study system we could not simply refine extractions until having RIN >8 for all samples.

I used my normal pipeline of sample processing followed by DE analysis with DESeq2 and got DE results with no issues, but because RIN can affect gene expression I also want to explore including RNA integrity number (RIN) in my DE model design. I've seen a few recommendations of how to do this on Bioconductor and Biostars, but I could not determine what the consensus was of the best way to do so (if such a consensus even exists).

I've trialled the impact of a couple of different methods by visualising the impact on PCA plots after removing the effect with limma::removeBatchEffect, like so:

1) Including RIN in the model design as a numerical vector

dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~RIN+Group)
dds <- DESeq(dds)
colData(dds)

enter image description here

vsd <- vst(dds, blind=FALSE)
Group <- factor(colData(vsd)$Group)
RIN <- samples$RIN
design0 <- model.matrix(~Group)
vsd2 <- vsd # backup
assay(vsd2) <- limma::removeBatchEffect(assay(vsd), covariates = vsd$RIN, design=design0) #include RIN as a numerical covariate
plotPCA(vsd2, "Group", ntop=70128)

enter image description here

2) Splitting the vector of RINs into a factor with five bins

vsd3 <- vsd # backup. I would also re-run DESeq(dds) with the proper new design.
RIN_cut <- cut(RIN, breaks = 5) #split the RINs into a factor with 5 bins
RIN_cut

[1] (8,8.6] (7.4,8] (8,8.6] (7.4,8] (8,8.6] (5.6,6.2] (5.6,6.2] (6.8,7.4] (5.6,6.2] (8,8.6]
Levels: (5.6,6.2] (6.2,6.8] (6.8,7.4] (7.4,8] (8,8.6]

assay(vsd3) <- limma::removeBatchEffect(assay(vsd), batch = RIN_cut, design=design0) #remove RIN as a 'batch'
plotPCA(vsd3, "Group", ntop=70128)

enter image description here (apologies for the size of the plots)

By eyeballing the two PCAplots, the second option including RIN as a factor looks to have done a better job at separating the two groups. However, I don't know whether this is an artificial separation, i.e. whether the treatment of RINs is somehow biasing the results.

Does anyone have any suggestions about which of the two treatments is better, or even have a suggestion for a better treatment? Without any treatment of RINs, the PCA looks very similar to Option 1 above, only with PC1 contributing 47% (vs 50% for Opption 1 above)..

Thanks, Charles

deseq2 limma • 1.9k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 2 days ago
United States

I would prefer SVA or RUV over directly using RIN. I have some threads on the support site that expand on this, maybe you can try to dig those up.

ADD COMMENT
0
Entering edit mode

Thanks for your advice. I've done a bit of digging to read further on SVA. Instead of trying to "guess" (model) the impact of RNA degradation by using RIN, I'll just use SVA to account for hidden surrogate variables. Following the code from the DESeq2 vignette:

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ Group, colData(dds))
mod0 <- model.matrix(~   1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)
dds$SV1 <- svseq$sv[,1]
dds$SV2 <- svseq$sv[,2]
design(dds) <- ~ SV1 + SV2 + Group
dds <- DESeq(dds)

I'm not overly worried about RIN having a big impact on my results considering there are no clear trends of samples clustering based on RIN in a PCA plot of vsd counts (without taking RIN into account). It's just good to know the right way to go about these analyses. Cheers!

ADD REPLY
1
Entering edit mode
Robert Castelo ★ 3.3k
@rcastelo
Last seen 10 days ago
Barcelona/Universitat Pompeu Fabra

If you have not done yet, you probably should read the work by Gallego Romero et al. (2014) on exactly this issue, where they also advise to put RIN as a covariate in the linear model when it is not associated with the outcome of interest. However, as Mike suggets, probably using SVA or RUV may lead to better results depending on the dataset (number of samples, range of differences in RIN, etc.). I would also suggest to try using sample-specific quality weights Ritchie et al., (2006) as explained in subsection "15.6 Voom with sample quality weights" from the limma User's Guide.

cheers,

robert.

ADD COMMENT

Login before adding your answer.

Traffic: 1121 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6