plotMDS Input Feature Filtering
Dario Strbenac
Australia

When examining the most variable genes for a clinical cancer dataset generated in batches on different days, I notice that they are mainly genes located on the X and Y chromosomes, such as XIST, UTY, and ZFY. Is it common practice to subset the matrix to remove these before making MDS plots, since they may mask more subtle variation between RNA-seq experiments done on different days? The edgeR workflow uses a dataset of all-female mouse samples, so it doesn't have such an issue, and the clinical details of the vignette's oral squamous cell carcinoma dataset are not public. Is using only a set of widely accepted housekeeping genes a better approach than a set of the most variable ones?
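To see why a few sex-specific genes can dominate, here is a Python sketch of the idea behind the default distance in limma's plotMDS (which uses the top 500 genes by default): for each pair of samples, take the root-mean-square of the largest log-fold-changes between them, then apply classical MDS. This is an illustration, not limma's code; the voom-style log-CPM offsets (no TMM factors), the function names, and the toy data are assumptions. Because genes are re-selected for every pair, a handful of genes such as XIST or UTY with huge male/female differences can set most of the distances.

```python
import numpy as np

def log_cpm(counts):
    """log2 counts-per-million with voom-style offsets (no TMM factors)."""
    lib_size = counts.sum(axis=0)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)

def leading_logfc_mds(logcpm, top=500, ndim=2):
    """Classical MDS on pairwise 'leading log-fold-change' distances.

    For each pair of samples, the squared distance is the mean of the
    `top` largest squared log-fold-changes between those two samples.
    """
    n_genes, n_samples = logcpm.shape
    top = min(top, n_genes)
    d2 = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            sq_diff = (logcpm[:, i] - logcpm[:, j]) ** 2
            # genes are re-selected for every pair of samples
            d2[i, j] = d2[j, i] = np.sort(sq_diff)[-top:].mean()
    # Torgerson scaling: double-centre the squared distances
    centre = np.eye(n_samples) - np.ones((n_samples, n_samples)) / n_samples
    b = -0.5 * centre @ d2 @ centre
    evals, evecs = np.linalg.eigh(b)
    order = np.argsort(evals)[::-1][:ndim]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

# Toy data: five genes strongly up in samples 4-6 only
counts = np.full((100, 6), 100.0)
counts[:5, 3:] = 2000.0
coords = leading_logfc_mds(log_cpm(counts), top=5)
# samples 1-3 and 4-6 fall on opposite sides of dimension 1
```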

edgeR • Exploratory Data Analysis
Aaron Lun
The city by the bay

Is there a reason why you can't run removeBatchEffect on the log-CPMs prior to visualization, to eliminate the batch effect? This would solve the problem more directly than trying to pick genes to discard. In your specific case, it sounds like you have a sex effect, which may or may not be related to the batch structure.
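Conceptually, removeBatchEffect fits a linear model to each gene and subtracts only the fitted batch component. A minimal Python analogue, assuming a genes × samples log-CPM matrix and one batch label per sample, with sum-to-zero batch coding and an optional design matrix of effects to preserve (e.g. sex). This is not limma's implementation, and `remove_batch_effect` and its arguments are hypothetical names. As the limma documentation notes, the corrected values are meant for plots such as MDS; for the actual linear modelling, batch belongs in the design matrix.

```python
import numpy as np

def remove_batch_effect(logcpm, batch, design=None):
    """Subtract fitted batch effects from a genes x samples log-CPM matrix.

    `design` (samples x p, including an intercept column) holds effects to
    preserve, e.g. sex; by default only an intercept is kept.
    """
    n_samples = logcpm.shape[1]
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # sum-to-zero coding: one column per batch except the last level,
    # which is coded -1 in every column
    x_batch = np.zeros((n_samples, len(levels) - 1))
    for k, level in enumerate(levels[:-1]):
        x_batch[batch == level, k] = 1.0
    x_batch[batch == levels[-1], :] = -1.0
    if design is None:
        design = np.ones((n_samples, 1))
    x = np.hstack([design, x_batch])
    # least squares for all genes at once (samples are rows of x)
    coef, *_ = np.linalg.lstsq(x, logcpm.T, rcond=None)
    batch_coef = coef[design.shape[1]:, :]
    return logcpm - (x_batch @ batch_coef).T

# Toy example: one gene measured 2 units higher in batch "b"
y = np.array([[1.0, 1.0, 3.0, 3.0]])
print(remove_batch_effect(y, ["a", "a", "b", "b"]))  # [[2. 2. 2. 2.]]
```

Only the batch columns are subtracted, so anything captured by `design` (such as a sex term) stays in the corrected values.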

If blocking is not possible, I often remove highly variable genes that are not biologically interesting. This includes X/Y-chromosome genes in the presence of sex effects, variable immunoglobulin segments when studying B cells, and ribosomal protein genes that are strongly affected by technical differences in library preparation. As long as the removal can be justified by some reason other than "it was highly variable", I don't see a problem.

I don't see the benefit of using a set of housekeeping genes for visualizing differences between samples. You'd just end up with a big homogeneous clump of samples in the middle of the MDS plot, without capturing any of the biological structure present in the expression profiles of DE genes.


Certainly, I could. I wanted to see whether there was an obvious batch effect before applying batch effect correction, because I prefer to avoid between-sample normalisation if there's no clear difference. The first batch has 14 male and 4 female samples, whereas the second has 21 male and 21 female samples. I'll include gender as a term in the design matrix, since both genders are assayed in both batches. The batches were prepared using the same RNA-seq protocol.
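These numbers make the imbalance explicit: the female fraction differs between the batches, so a strong sex effect can partly look like a batch difference on an MDS plot. A trivial Python check (the helper name is hypothetical):

```python
def female_fraction(n_male, n_female):
    """Proportion of female samples in a batch."""
    return n_female / (n_male + n_female)

# Sample numbers per batch as reported above: (male, female)
batches = {"batch 1": (14, 4), "batch 2": (21, 21)}
for name, (n_male, n_female) in batches.items():
    frac = female_fraction(n_male, n_female)
    print(f"{name}: {n_male + n_female} samples, {frac:.0%} female")
# batch 1: 18 samples, 22% female
# batch 2: 42 samples, 50% female
```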

Gordon Smyth
WEHI, Melbourne, Australia

Yes, it would make sense to filter out the sex-linked genes to see more clearly what the batch effect looks like with the sex imbalance removed.

In our practice, we frequently filter out sex-linked genes when (i) the experimental conditions are not themselves sex-linked in their effects but (ii) both male and female samples are included in the study. We do this for the whole analysis, not just temporarily for the MDS plot. Our working definition of sex-linked genes is XIST plus the Y chromosome.
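That working definition translates into a single boolean mask. A sketch, assuming a genes × samples matrix with per-gene symbol and chromosome annotation aligned to its rows (chromosome naming such as "Y" versus "chrY" depends on the annotation source); `drop_sex_linked` is a hypothetical helper. Applying it to the count matrix at the start keeps the same gene set for the whole analysis rather than only for the MDS plot.

```python
import numpy as np

def drop_sex_linked(matrix, symbols, chromosomes):
    """Drop XIST and every chromosome-Y gene from a genes x samples matrix.

    `symbols` and `chromosomes` must be aligned with the rows of `matrix`.
    Both "Y" (Ensembl-style) and "chrY" (UCSC-style) names are accepted.
    """
    symbols = np.asarray(symbols)
    chromosomes = np.asarray(chromosomes)
    sex_linked = (symbols == "XIST") | np.isin(chromosomes, ["Y", "chrY"])
    keep = ~sex_linked
    return matrix[keep], symbols[keep]

# Toy example: four genes, one male and one female sample
counts = np.array([[50, 900], [40, 0], [1000, 1100], [30, 0]])
kept, kept_symbols = drop_sex_linked(
    counts, ["XIST", "UTY", "ACTB", "ZFY"], ["X", "Y", "7", "Y"]
)
print(kept_symbols)  # ['ACTB']
```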

Including sex as a predictor in the design matrix is the alternative, but removing sex-linked genes often works better for small-scale studies because the sex-linked genes are small in number and well defined.

In your case you have enough samples to take either approach, but I would be tempted to remove the sex-linked genes.

Like Aaron, I don't see any purpose in restricting to housekeeping genes. That would seem to defeat the purpose of the MDS plot.
