Code from the author, edited to our files.

Question

CQN to DEseq2

0

Entering edit mode

nicholas.macknight • 0

@nicholasmacknight-22426

Last seen 5.2 years ago

Hello,

I have made great use of your DESeq2 package and grateful for it. I have recently normalized using conditional quantile normalization to remove gene length bias. This CQN normalization gives me log expression data that includes negatives. I am unable to perform the DESeqDataSetFromMatrix function to begin identifying differentially expressed genes. Is there a way to identify differentially expressed genes using log expressed data with negative values in DESeq2? Or is there an output file or step in the CQN normalization that has gene length adjusted count data that can be fed into the DESeqDataSetFromMatrix function?

Thank you for your time to help me understand this.

CQN deseq2 • 2.8k views

ADD COMMENT • link updated 2.4 years ago by Sally • 0 • written 5.4 years ago by nicholas.macknight • 0

0

Entering edit mode

My original cqn code,

cqnres <- cqn(counts = counts,x = df.subset$GC,lengths = df.subset$length) # cqn normalization
CQNnorm <- cqnres$y + cqnres$offset # values are in log2

I have managed to remove the negative log values by using:

cqnOffset <- cqn_res$glm.offset

cqnNormFactors <- exp(cqnOffset)

as provided by the DESeq2 vignette. However, DESeq2 only seems to like integers, so I am working to figure that out.

ADD REPLY • link 5.4 years ago nicholas.macknight • 0

0

Entering edit mode

You should provide integers as counts and the offsets as directed in the vignette.

ADD REPLY • link 5.4 years ago Michael Love 43k

0

Entering edit mode

I have looked at the vignette for both scripts, which is how I received the code to remove the negatives above, which was a helpful step. I still have an issue that I do not see being addressed in the vignettes and hoping to get clarification from more experienced users. From my understanding, CQN output is log-transformed and not as an integer. So if CQN output is ONLY log data and DESeq2 ONLY accepts non-negative integers, how can CQN data be performed and taken into consideration when running DESeq2? If this answer is in the vignettes, I would benefit from anyone's time to make that clear to me so I can have confidence in my next steps.

ADD REPLY • link 5.4 years ago nicholas.macknight • 0

0

Entering edit mode

The handoff between the two packages is to use DESeq2 with the original counts and to supply the offset matrix calculated by CQN as a normalizationFactor for the dds object. This is the paradigm we have in the vignette. Is there a remaining question?

ADD REPLY • link 5.4 years ago Michael Love 43k

0

Entering edit mode

Okay, I am putting my code here for anyone who comes across this and has a similar issue. Perhaps it can offer guidance. It may not be pretty code, but she runs. Certainly, consider the DESeq2 and CQN vignettes first.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("cqn") library(cqn) library(scales)

counts <- read.csv("cavernosacount.csv", header = TRUE, stringsAsFactors = FALSE) #raw counts from species length <- read.csv("Mcavtranscriptlength.csv", header = TRUE, stringsAsFactors = FALSE) #transcript lengths of each comp(gene) gc <- read.csv("McavGC.csv", header = TRUE, stringsAsFactors = FALSE) #gc percentage as a decimal for each comp df <- merge(gc,length,by="comp") df.subset <- merge(df,counts, by="comp") #merge all to put counts, length, and gc in the same order and to subset and assign gc, length and count to comps. df.subset <- df.subset[ !duplicated(df.subset$comp), ] #remove duplicates counts <- df.subset[c(4:13)] #take only counts

Code from the author, edited to our files.

Examine relationship between FC and Tx length for biological conditions after CQN normalization with GC and Length

cqnres <- cqn(counts = counts,x = df.subset$GC,lengths = df.subset$length) # cqn normalization
CQNnorm <- cqnres$y + cqnres$offset # values are in log2 cqnplot(cqn_res, n = 2) #See how the systematic effect(length or GC) influences the LFC.the longer the gene, or higher the GC, the higher the LFC.

make the CQN normalization appropriate for DESeq2

cqnOffset <- cqnres$glm.offset cqnNormFactors <- exp(cqnOffset) CQNnormcomp <- cbind(cqnNormFactors,df.subset$comp)#annotate comp ID's to normalized count matrix colnames(CQNnormcomp)[colnames(CQNnorm_comp)==""] <- "comp" #change column label

Need to remove duplicates for DESEq

master <- read.csv("cavernosaMaster.csv") a <- merge(CQNnormcomp,master,by="comp") aev <- a[order(a$comp, abs(a$evalue) ), ] ### sort first aev <- a[ !duplicated(a$comp), ] ### Keep lowest evalue CQNnormcompdeseqinput <- aev[c(1:10)] #keep only CQN normalized data and contig ID CQNrawcount <- aev[c(1,15:24)] #keep raw counts that survived the CQN filtration

now for DESeq2

if (!requireNamespace("BiocManager")) install.packages("BiocManager") BiocManager::install() library(DESeq2) -------CQN integration-------- countData <- as.matrix(read.csv("CQNrawcount.csv", row.names="comp")) colData <- read.csv("Mcavlabel.csv")

set the design## Creates matrix by mergeing treatment to replicate id and it's HTseq read count

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~Treatment)

to integrate CQN Normalization:

normFactors <- cqnNormFactors / exp(rowMeans(log(cqnNormFactors))) normalizationFactors(dds) <- cqnNormFactors dds <- dds[ rowMeans(counts(dds)) > 10, ] #filter, #previously dds test <-DESeq(dds) #previously dds resWPvCD <- results(test,contrast=c("Treatment", "Disease", "Control")) head(resWPvCD) resorderdWPvCD <-resWPvCD[order(resWPvCD$padj),] head(resorderdWPvCD, 12) summary(resWPvCD) write.csv(resorderdWPvCD, file = "McavCQNWPvCD.csv")

ADD REPLY • link 5.4 years ago nicholas.macknight • 0

score 0 · Answer 1 · 2019-11-21

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 days ago

United States

https://support.bioconductor.org/p/126615/#126617

ADD COMMENT • link 5.4 years ago Michael Love 43k