How to get DESeq2 sizefactors from data imported using tximport
stan • 0
@stan-7634
Last seen 4.8 years ago
South Africa/Pretoria

Hi,

How can I get sizeFactors from Sailfish output imported into DESeq2 using tximport? I got NULL after doing the following:

> txi <- tximport(files, type="sailfish", tx2gene=tx2gene, dropInfReps=TRUE)
> ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ condition)
> estimateSizeFactors(ddsTxi)
using 'avgTxLength' from assays(dds), correcting for library size
class: DESeqDataSet
dim: 39045 5
metadata(1): version
assays(3): counts avgTxLength normalizationFactors
rownames(39045): ENSG00000000003 ENSG00000000419 ... ENSG00000283698 ENSG00000283699
rowData names(0):
colnames(5): sample1 sample2 sample3 sample4 sample5
colData names(1): condition
> sizeFactors(ddsTxi)
NULL

Thanks for the help.

Stan

deseq2 sailfish tximport • 4.9k views
@mikelove
Last seen 2 days ago
United States

Hi,

DESeq2 uses normalizationFactors instead of sizeFactors when the avgTxLength offset is imported with DESeqDataSetFromTximport. These combine size factor normalization with average transcript length normalization, so the result is a matrix (genes by samples) rather than one value per sample. It is accessible via:

normalizationFactors(dds)

If you want to see the size factors that were calculated and combined with the average transcript lengths, they are these:

nm <- assays(dds)[["avgTxLength"]]
sf <- estimateSizeFactorsForMatrix(counts(dds), normMatrix=nm)

Thanks a lot Michael, 

I noticed that the normMatrix argument is not used by the estimateSizeFactorsForMatrix function. Would it instead make sense to calculate the geometric mean of each sample (column) of normalizationFactors(dds), and use that as the size factor for each sample?

Or should I just get size factors using:

sf <- estimateSizeFactorsForMatrix(counts(dds)) # without the normMatrix argument

Alternatively does it make sense to get size factors using this:

sf <- estimateSizeFactorsForMatrix(normalizationFactors(dds))
# since normalizationFactors(dds) "combines size factor normalization with average transcript length normalization" quoting from you above

Sorry, you are correct: normMatrix is not an argument of estimateSizeFactorsForMatrix(). The length correction is handled by a separate function, estimateNormFactors(). The code inside this function is:

sf <- estimateSizeFactorsForMatrix(counts(dds) / nm)
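
Putting the corrected call together with the earlier steps, a minimal sketch might look like the following. It assumes DESeq2 is installed, and it uses a simulated dataset from makeExampleDESeqDataSet() with made-up average transcript lengths in place of real tximport output, so the object `len` and all numbers are illustrative only. Note that the result of estimateSizeFactors() has to be assigned back to the object, otherwise the normalization factors are printed but not stored.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for a real tximport-derived dataset (assumption).
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# Hypothetical per-gene, per-sample average transcript lengths.
len <- matrix(runif(200 * 6, min = 500, max = 3000), nrow = 200,
              dimnames = dimnames(counts(dds)))
assays(dds)[["avgTxLength"]] <- len

# Assign the result back, otherwise nothing is stored in dds.
dds <- estimateSizeFactors(dds)

sizeFactors(dds)                 # NULL: a matrix of factors is used instead
dim(normalizationFactors(dds))   # genes x samples

# Size factors that were folded into the normalization factors.
nm <- assays(dds)[["avgTxLength"]]
sf <- estimateSizeFactorsForMatrix(counts(dds) / nm)
sf
```

With real data, `dds` would be the object returned by DESeqDataSetFromTximport() and the avgTxLength assay would already be present.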

Thanks again Michael, it's working fine now.


Hi, Michael,

I'm confused, because in my case these three calls

sf1 <- estimateSizeFactorsForMatrix(counts(dds))
sf2 <- estimateSizeFactorsForMatrix(normalizationFactors(dds))
sf3 <- estimateSizeFactorsForMatrix(counts(dds) / nm)

give me three different results. If I only want to check the size factors, which one should I choose?


The last one, this is the one I indicated above. The others are not correct for finding the size factors that were calculated and incorporated into normalization factors. The second has no interpretation. The first is the size factors that would be calculated if we ignored the extra information from tximport.
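
To see why the first and third calls can disagree, here is a small base-R sketch of a median-of-ratios calculation on a toy matrix. The helper `median_ratio_sf` is a hypothetical stand-in written for illustration, not DESeq2's own code, and the counts and lengths are invented.

```r
# Hypothetical helper: median-of-ratios size factors for a non-negative matrix.
median_ratio_sf <- function(m) {
  log_geo_means <- rowMeans(log(m))   # per-gene log geometric mean
  apply(m, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[keep]))
  })
}

# Toy data: 4 genes x 3 samples (invented numbers).
counts <- matrix(c(100, 200, 300,
                    50, 100, 150,
                    80, 160, 240,
                    20,  40,  60),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

# Average transcript lengths: sample s3 uses isoforms twice as long.
nm <- matrix(c(1000, 1000, 2000,
               1500, 1500, 3000,
                800,  800, 1600,
               1200, 1200, 2400),
             nrow = 4, byrow = TRUE, dimnames = dimnames(counts))

sf1 <- median_ratio_sf(counts)        # ignores transcript length
sf3 <- median_ratio_sf(counts / nm)   # length-corrected, as in the reply above

sf1[["s3"]] / sf1[["s1"]]
sf3[["s3"]] / sf3[["s1"]]
```

In this toy case s3 looks three times as deep as s1 from the raw counts, but only 1.5 times as deep once the counts are divided by the average transcript lengths, which is the kind of difference the length correction is meant to absorb.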


Thanks very much!


Michael Love

I imported transcript abundance data using DESeqDataSetFromTximport. However, normalizationFactors(ddsTxi) returns NULL. Is there anything unusual in the following code? I assumed that normalizationFactors(ddsTxi) would return some numbers by default.

> samples <- read.table("samples_list.txt", header=TRUE)
> files <- file.path("From_Kallisto",  samples$Sample_ID,  "abundance.tsv")
> names(files) <- paste0(c("TRG1","TRG2", "TRG3", "LM1Tr", "LM2Tr", "LM3Tr", "TrHM1", "TrHM2", "TrHM3"))
> tx2gene <- read_csv("tx2gene_ensemble_T_reesei_RUT-C30_v56.csv")
> txi <- tximport(files, type="kallisto", tx2gene=tx2gene)
> samples$Groups <- factor(samples$Groups)
> rownames(samples) <- samples$Sample_ID
> ddsTxi <- DESeqDataSetFromTximport(txi, colData= samples, design = ~ Groups)
> ddsTxi
class: DESeqDataSet
dim: 9849 9
metadata(1): version
assays(2): counts avgTxLength
rownames(9849): DRAFT_100245 DRAFT_100463 ... DRAFT_46979
  DRAFT_99291
rowData names(0):
colnames(9): TRG1 TRG2 ... TrHM2 TrHM3
colData names(2): Sample_ID Groups
> normalizationFactors(ddsTxi)
NULL

It will have normalizationFactors() after you run estimateSizeFactors() or DESeq(). At this step it only has avgTxLength, which is derived from the transcript effective lengths from Salmon, averaged within each gene with weights given by transcript abundance (like RSEM does).

Note that triple-backtick code fencing lets you format a region as a code block on the support site.
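
The abundance-weighted average described above can be sketched in base R for a single sample as follows. The transcript values are invented, and this is only an illustration of the idea: tximport's actual implementation works across all samples and handles details, such as genes with zero total abundance, that are ignored here.

```r
# Invented transcript-level values for two genes in one sample.
tx <- data.frame(
  gene      = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  abundance = c(10, 30, 5, 0, 15),            # e.g. TPM
  eff_len   = c(1000, 2000, 500, 900, 1500)   # effective length
)

# Abundance-weighted average effective length per gene.
weighted_len <- rowsum(tx$abundance * tx$eff_len, tx$gene) /
  rowsum(tx$abundance, tx$gene)
weighted_len
```

For geneA this gives (10 * 1000 + 30 * 2000) / 40 = 1750, so the more abundant, longer isoform pulls the average towards its own length.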
