[DESEQ2] How to access the normalized data of a DESeqDataSet
john ▴ 130
@john-7466
Last seen 9.4 years ago
Germany

Hello guys,

Calling DESeq() on a DESeqDataSet estimates the size factors (normalization) automatically. How can I access this data? It must be stored somewhere. I would like to access the normalized counts.

> dds <- DESeq(dds)

estimating size factors
estimating dispersions
gene-wise dispersion estimates

....

I know there is the counts() function but why use this if the calculation is already done?

## S4 method for signature 'DESeqDataSet'
     counts(object, normalized = FALSE)

 

Does anybody have any hints?

cheers,

John

 

@mikelove
Last seen 18 hours ago
United States

If you have a fresh dds, you can just do:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

This is just dividing each column of 

counts(dds)

by

sizeFactors(dds)
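
A minimal sketch of that division in base R, without DESeq2. The matrix `raw` and the vector `sf` are made-up stand-ins for counts(dds) and sizeFactors(dds); the point is only that column j is divided by size factor j:

```r
# Toy stand-ins (hypothetical values) for counts(dds) and sizeFactors(dds)
raw <- matrix(c(10, 20,  30,
                40, 80, 120),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
sf <- c(s1 = 0.5, s2 = 1, s3 = 1.5)

# Divide each column j by its size factor sf[j]
norm <- sweep(raw, 2, sf, "/")

# Equivalent idiom: transpose so that vector recycling runs along samples
norm2 <- t(t(raw) / sf)
```

Note that a plain `raw / sf` would recycle `sf` down the rows rather than across the columns, which is why `sweep()` or the double transpose is needed.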

You can pull up the help for all functions with:

help(package="DESeq2", help_type="html")

And there is a section of the vignette, "Access to all calculated values":

vignette("DESeq2")

Hi Michael,

I am new to R. I am trying to run a DESeq differential expression analysis on my normalized RNA-seq counts. Could you point me to any scripts that are easy to understand? Thanks in advance.


Please follow the post above, by reading the vignette. After installing DESeq2, you can just type into your R session:

vignette("DESeq2")

You can also follow this workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/


Hi Michael, hope things are well with you! When outputting normalized counts from a dds object like this:

dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)

... will these counts be normalized to gene length, taking into consideration that counts were imported using tximport and the tx2gene parameter (which passes gene length to the dds object)?

I don't think so, but thought it would be better to ask. Hope this makes sense.


They are scaled in such a way that any biases across samples related to isoform switching are removed.

But they do not have the typical "normalization for gene length" applied, in that longer genes will have larger values in the matrix you obtain. E.g. if one gene has length L and another has length 2L, then at equal expression you would expect the second gene's normalized counts to be about twice as large.
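
To illustrate with made-up numbers (a base R sketch, not DESeq2 output): two hypothetical genes at the same expression level, one twice as long as the other, keep their 2:1 ratio after dividing by size factors, because a size factor is a per-sample constant that cancels within a sample. The explicit per-gene length scaling at the end is shown only for contrast; the names `raw`, `sf` and `len` are assumptions for this example.

```r
# Hypothetical raw counts: "long" is twice the length of "short",
# and both genes are assumed to be expressed at the same level
raw <- matrix(c(100, 150,
                200, 300),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("short", "long"), c("s1", "s2")))
sf  <- c(0.8, 1.2)                    # made-up size factors, one per sample
len <- c(short = 1000, long = 2000)   # made-up gene lengths in bp

# Size-factor scaling only (what counts(dds, normalized=TRUE) corresponds to)
norm <- sweep(raw, 2, sf, "/")

# The per-sample constant cancels, so the long gene stays at 2x the short one
ratio <- norm["long", ] / norm["short", ]

# For contrast: an explicit per-gene length scaling removes that factor
per_kb <- norm / (len / 1000)
```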


Thanks, Michael! I just wanted to confirm that because I was working with two dds objects (one created based on a tximport object, and the other one imported using a count matrix). When I transform the dds objects using the vst function, it prints a message (for the dds object created using tximport) saying:

using 'avgTxLength' from assays(dds), correcting for library size

... which I thought meant that it was correcting for transcript length too. Thanks for clarifying!

@steve-lianoglou-2771
Last seen 20 months ago
United States

While Michael has answered your question "in spirit", allow me to provide an answer to your direct question:

  I know there is the counts() function but why use this if the calculation is already done?

Because the "normalized" data isn't actually stored anywhere. The only things stored are the size factors, which can be used to normalize the raw count data when required.


Yes, Steve's right. I missed this part of the question.

The point of DESeq2 and other count-based methods is to model the raw counts, so that the estimation steps take into account the variance profile of counts. If you look over the methods, you'll see that almost all steps use K_ij, the raw count for gene i and sample j. The normalized counts K_ij/s_j are only used to give each gene a single mean value for the dispersion trend regression (equation 5).
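
As a rough base R sketch of that notation, with a made-up matrix `K` of raw counts: the size factors `s` below use a median-of-ratios estimate, and the per-gene mean of K_ij/s_j is the single value per gene mentioned above. This is a simplification under stated assumptions (no zero counts, toy data), not the package's actual code.

```r
# Made-up raw counts K[i, j] for gene i and sample j (no zeros, for simplicity)
K <- matrix(c( 10,  20,  40,
              100, 200, 400,
               50, 100, 200,
                8,  16,  30),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Median-of-ratios size factors: compare each sample to a per-gene
# geometric-mean reference and take the median ratio over genes
log_geo_mean <- rowMeans(log(K))
s <- apply(K, 2, function(col) exp(median(log(col) - log_geo_mean)))

# Normalized counts K_ij / s_j, used here only to get one mean per gene
mean_norm <- rowMeans(sweep(K, 2, s, "/"))
```

Everything else in the model would keep working from `K` itself, with `s` entering as a known scaling term.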


Steve, Michael,

Thanks, that really helps me out and answers my question. It takes quite some time to fully understand the DESeq2 package.

I have just downloaded the source code of the package. I found almost all functions like plotCounts() (which is in the "R" folder) but I still cannot find the counts() method. Where did you hide it? (-:

You can find function definitions in the source code files in the R directory with:

grep "counts <-" *.R

The counts method is defined in R/methods.R.

Ah yes, I found this:

counts.DESeqDataSet <- function(object, normalized=FALSE) {

...}

Which is the same I guess.
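
For anyone wondering how a plain function with a name like that ends up behind an S4 generic, here is a toy sketch using only the base `methods` package. It is not the DESeq2 source: `ToyCounts`, `toyCounts` and the slot names are hypothetical, and the assumption that the real package registers its function in a similar way should be checked against R/methods.R.

```r
library(methods)

# Hypothetical toy class holding raw counts and per-sample size factors
setClass("ToyCounts", slots = c(raw = "matrix", sf = "numeric"))

# Plain function following the generic.Class naming convention
toyCounts.ToyCounts <- function(object, normalized = FALSE) {
  if (!normalized) return(object@raw)
  # Nothing normalized is stored: compute it on demand from the factors
  sweep(object@raw, 2, object@sf, "/")
}

# Declare the generic, then register the plain function as its S4 method
setGeneric("toyCounts", function(object, ...) standardGeneric("toyCounts"))
setMethod("toyCounts", signature(object = "ToyCounts"), toyCounts.ToyCounts)

obj <- new("ToyCounts",
           raw = matrix(c(10, 20, 30, 60), nrow = 2),
           sf  = c(1, 2))
res <- toyCounts(obj, normalized = TRUE)

# Look the method up by signature from an R session
m <- getMethod("toyCounts", "ToyCounts")
```

With a real package loaded, the same kind of `getMethod()` lookup lets you read a method's body without grepping the source tree.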

 


I agree with you.
