Question

expression matrix with varianceStabilizingTransformation

lirongrossmann ▴ 80

Hey all,

I have an RNA-seq expression matrix, and I used DESeq2 to compare gene expression between two groups in my dataset. I then wanted to sort the genes I got from DESeq2 by their expression level, so I used varianceStabilizingTransformation (stored as vsd) to get normalized expression data.

The problem I had is the following:

When I changed one of the groups (omitted a few samples) and applied the transformation again, the expression values of some genes changed even for samples that were not removed. That is, the values for the remaining samples changed just because other samples were omitted from the dataset.

Is there a way to get a normalized matrix with expression values for each sample that does not depend on other samples?

This is my code:

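library(DESeq2)

# Counts: genes in rows, samples in columns; the first column is used as row names (gene IDs)
ep <- read.table("expression.txt", header = TRUE, row.names = 1)
# Sample metadata, one row per sample, including the Group column used in the design
cp <- read.csv("metadata.csv")
dds <- DESeqDataSetFromMatrix(countData = ep, colData = cp, design = ~ Group)
dds <- estimateSizeFactors(dds)
# blind = TRUE by default: the dispersion trend is re-fitted from whichever samples are in dds
vsd <- varianceStabilizingTransformation(dds)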

Tags: deseq2, variancestabilizingtransformation, rnaseq

Written 7.5 years ago by lirongrossmann; updated by Michael Love.

Michael Love · Answer 1 · 2017-10-16

Wolfgang Huber ★ 13k

EMBL European Molecular Biology Laborat…

Dear lirongrossmann

The transformation parameters depend on the statistical distribution of the data, so it is to be expected that the transformation changes (a bit) if data are added or removed, especially if these make replicate variances look higher or lower.
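The numbers can move even before any variance fit, because the default size factors are computed against a per-gene geometric mean taken over all samples in the object. The sketch below re-implements that median-of-ratios idea in base R on a made-up 3 x 3 matrix (the function name `median_of_ratios` and the data are hypothetical, and this is not DESeq2's own code) to show that dropping one sample shifts the size factors, and hence the normalized values, of the samples that remain.

```r
# Median-of-ratios size factors, written out in base R for illustration only.
# It mirrors the idea behind DESeq2's default estimateSizeFactors(), not its exact code.
median_of_ratios <- function(counts) {
  # Per-gene log geometric mean across ALL samples in the matrix (the pseudo-reference)
  log_geo_means <- rowMeans(log(counts))
  # Genes with a zero in any sample get a -Inf log geometric mean and are skipped
  usable <- is.finite(log_geo_means)
  apply(counts, 2, function(cnts) {
    ok <- usable & cnts > 0
    exp(median(log(cnts[ok]) - log_geo_means[ok]))
  })
}

# Toy counts: 3 genes x 3 samples (made-up numbers)
counts <- matrix(c(10, 20,  5,
                   20, 40, 10,
                   40, 80, 80),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("s", 1:3)))

sf_all <- median_of_ratios(counts)                   # s1 = 0.5, s2 = 1, s3 = 2
sf_sub <- median_of_ratios(counts[, c("s1", "s2")])  # s1 ~ 0.707, s2 ~ 1.414

# The normalized count of gene1 in s1 changes although s1 itself was not touched
counts["gene1", "s1"] / sf_all["s1"]  # 20
counts["gene1", "s1"] / sf_sub["s1"]  # about 14.14
```

In this toy case the ratio between s1 and s2 stays at 2, so only the absolute scale moves; that is a property of these tidy numbers, not a general guarantee.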

If your goal is sorting genes by overall expression, you can do something like

mcr = matrixStats::rowMedians(counts(dds, normalized = TRUE))

and sort by mcr.
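Spelled out, that suggestion might look like the sketch below. It assumes DESeq2 and matrixStats are installed; `makeExampleDESeqDataSet()` supplies simulated counts so the snippet runs on its own, and in practice you would use the `dds` built in the question instead.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the dds built in the question
dds <- makeExampleDESeqDataSet(n = 200, m = 6)
dds <- estimateSizeFactors(dds)

# Median of the size-factor-normalized counts for each gene
mcr <- matrixStats::rowMedians(counts(dds, normalized = TRUE))
names(mcr) <- rownames(dds)

# Genes ordered from highest to lowest median normalized count
mcr_sorted <- sort(mcr, decreasing = TRUE)
head(mcr_sorted)
```

To rank only the genes that came out of the differential analysis, index `mcr` by their names before sorting. Note that the normalized counts still use size factors estimated from all samples in `dds`.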

Kind regards

Wolfgang

Answered 7.5 years ago by Wolfgang Huber

Thank you!

Reply by lirongrossmann, 7.5 years ago

Thanks, dear Wolfgang,

One clarification, please: if I want to use the genes I got from my DESeq2 analysis to build a machine learning model with a training and a validation set, would you use the normalized counts (as you recommend above) or the variance-stabilized data (from varianceStabilizingTransformation) as input to the learning algorithm?

Thanks!

Reply by lirongrossmann, 7.5 years ago (updated by Michael Love)

We recommend using variance-stabilized, transformed data for downstream methods that benefit from homoskedasticity (the same scale of variance across the dynamic range).

Reply by Michael Love, 7.5 years ago
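For the train/validation setting in the follow-up question, one way to keep the validation samples from influencing the transformation is to learn it on the training samples and reuse it. The sketch below follows the pattern shown in the examples of the DESeq2 help page for `varianceStabilizingTransformation` (copying a fitted dispersion trend with `dispersionFunction<-` and transforming with `blind = FALSE`), and additionally passes the training geometric means to the `geoMeans` argument of `estimateSizeFactors()`. The simulated data, the 8/4 split, and the object names are assumptions for illustration; how fully this isolates each validation sample depends on how `estimateSizeFactors()` handles supplied `geoMeans`, so treat it as a starting point.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for real data; the 8/4 split into training and validation is arbitrary
dds <- makeExampleDESeqDataSet(n = 500, m = 12)
train <- dds[, 1:8]
valid <- dds[, 9:12]

# 1. Learn normalization and the dispersion trend on the training samples only
train_geo_means <- exp(rowMeans(log(counts(train))))  # 0 for genes with any zero count
train <- estimateSizeFactors(train)                   # default reference: same geometric means
design(train) <- ~ 1                                  # do not use group labels for the fit
train <- estimateDispersions(train)
vsd_train <- varianceStabilizingTransformation(train, blind = FALSE)

# 2. Reuse the training reference and dispersion trend for the validation samples
valid <- estimateSizeFactors(valid, geoMeans = train_geo_means)
dispersionFunction(valid) <- dispersionFunction(train)
vsd_valid <- varianceStabilizingTransformation(valid, blind = FALSE)

# Samples in rows, genes in columns: the usual layout for ML libraries
x_train <- t(assay(vsd_train))
x_valid <- t(assay(vsd_valid))
```

The design choice here is that nothing estimated from the validation samples feeds back into the dispersion trend used to transform them.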