Question

Understanding the Log2FoldChange calculated with DESeq2 and according to formula log2(treated/control)

0

Entering edit mode

User000 • 0

@ea03770f

Last seen 2.4 years ago

Italy

Hello,

I have tried to understand how DESeq2 calculates the Log2FoldChange. I extracted the normalised counts from dds like below, calculated the mean of treated and tried to find the log2FC according to the formula: log2(treated/control). But the log2FC I get using this method is different the one I get using DESeq2. Moreover, it seems like log2FC reported with DESeq2 is calculated based of difference of mean treated and control, like this: log2FC=mean(treated)-mean(control). Why is that? Am i missing something?

newdata <- as.data.frame(cbind(dds@rowRanges@elementMetadata@listData$conditionTREATED1,dds@rowRanges@elementMetadata@listData$conditionTREATED2,dds@rowRanges@elementMetadata@listData$conditionTREATED3,dds@rowRanges@elementMetadata@listData$conditionCTRL1))
newdata$mean <- apply(newdata[,1:3],1, mean) 
newdata$log <- log2(newdata$mean/newdata$conditionCTRL1)

For example using the example below:

 CTRL1 mean_treated
 12.30 12.44

With my calculation I get log2(12.44/12.30)=0.016
DESeq2 gives this value 0.14, which is the difference between 12.44-12.30=0.14

This is my script:

setwd("path/to/work_dir/")
sampledata <- data.frame(run = c("RNA_1","RNA_2","RNA_3","RNA_4","RNA_5","RNA_6","RNA_7","RNA_8","RNA_9","RNA_10","RNA_11","RNA_12"), condition = c("TREATED1","TREATED1","TREATED1","TREATED2","TREATED2","TREATED2","TREATED3","TREATED3","TREATED3","CTRL1","CTRL1","CTRL1"))
sampledata$condition <- factor(sampledata$condition)
table(sampledata$condition)
#Read all RNAseq files
dir <- ("/path/to/mydir")
list.files(dir)
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
files <- file.path(dir, samples$run, "quant.sf")
names(files) <- c(1:12)
all(file.exists(files))
#Create a tx2gene file
txdb <- makeTxDbFromGFF(file="gencode.v39.annotation.gtf")
saveDb(x=txdb, file = "gencode.v39.annotation.TxDb")
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- AnnotationDbi::select(txdb, k, "GENEID", "TXNAME")
head(tx2gene)
#Create differential expression
txi.g <- tximport(files, type="salmon", tx2gene=tx2gene)
tcs <- txi.g$counts
mm = model.matrix(~0+condition, sampledata)
dds <- DESeqDataSetFromTximport(txi.g, colData=sampledata, design=mm)
nm <- assays(dds)[["avgTxLength"]]
sf <- estimateSizeFactorsForMatrix(counts(dds)/nm)
#Filter for genes/transcripts that have >= 10 reads
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
sapply(split(colSums(counts(dds))/1e6, sampledata$condition), summary)
#Calculate dds
dds <- DESeq(dds)
res <- DESeq2::results(dds, 
                       contrast = list(c("conditionTREATED1","conditionTREATED2","conditionTREATED3"), "conditionCTRL1"), 
                       listValues = c(1/3,-1)
)
contrast = (list(c("conditionTREATED1","conditionTREATED2","conditionTREATED3"),( "conditionCTRL1",listValues = c(1/3,-1)))
resLFC <- lfcShrink(dds, contrast=contrast, res=res, type="ashr")

DESeq2 log2FC • 6.3k views

ADD COMMENT • link 2.6 years ago User000 • 0

score 0 · Answer 1 · 2022-09-21

0

Entering edit mode

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 23 hours ago

San Diego

If DESeq just returned a simple log of the ratio, you could do that in Excel. DESeq is doing something more complex. If your contrast is made correctly, the value you calculate should be close to what you could calculate yourself, but not the same.

ADD COMMENT • link 2.6 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

I added an example, may be my explanation was not completely clear.

ADD REPLY • link 2.6 years ago User000 • 0

0

Entering edit mode

I'm not sure what those numbers you posted are, but the point of logs is to avoid long division. You say you are doing doing log(a/b), but log(a) - log(b) is the same thing, so if those numbers are the average of the logged counts, then subtracting one from the other will give you the ratio.

ADD REPLY • link 2.6 years ago swbarnes2 ★ 1.4k

0

Entering edit mode

Thank you very much for taking your time and answering. I did not write that the difference is between logs. For me It is obvious that log(a/b) and log(a)-log(b) is the same thing. If you could I suggest you to read better the question, if it is not clear please just ask me clarifications. I really need to understand the problem I posted above. Thanks again.

ADD REPLY • link 2.6 years ago User000 • 0