Hi, I am trying to remove gene length bias, but after CQN normalization the correlation between gene length and significance is still ~0.9, when it should be much closer to 0. Here is an outline of my code.
# code from authors
cqn_res <- cqn(counts = counts, x = df.subset$GC, lengths = df.subset$length) # cqn normalization
CQNnorm <- cqn_res$y + cqn_res$offset # values are in log2
cqnplot(cqn_res, n = 2) # See how the systematic effect (length or GC) influences the LFC: the longer the gene, or the higher the GC, the higher the LFC.
# make the CQN normalization appropriate for DESeq2
cqnOffset <- cqn_res$glm.offset
cqnNormFactors <- exp(cqnOffset)
# DESeq
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~Treatment)
# to integrate CQN normalization:
normFactors <- cqnNormFactors / exp(rowMeans(log(cqnNormFactors)))
normalizationFactors(dds) <- normFactors
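To make the centering step concrete, here is a minimal base R sketch of what dividing by the row-wise geometric mean does. The matrix `nf` and its values are made up for illustration (they are not my real normalization factors), and no cqn or DESeq2 calls are involved.

```r
# Hypothetical gene-by-sample normalization factors (made-up values, not real data)
nf <- matrix(c(2, 8,
               1, 4,
               3, 3),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Row-wise geometric mean: exp of the mean of the logs
geo_means <- exp(rowMeans(log(nf)))

# Divide each row by its geometric mean; a vector of length nrow(nf)
# recycles down the columns, so row i is divided by geo_means[i]
nf_centered <- nf / geo_means

# After centering, every row has geometric mean 1
row_geo_after <- exp(rowMeans(log(nf_centered)))
print(row_geo_after)
print(nf_centered)
```

In this toy example geneA is just geneB multiplied by 2 in every sample, and the two rows come out identical after centering. So, as far as I can tell, any per-gene factor that is constant across samples cancels in this step, and only the sample-to-sample differences within a gene are kept.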
Any thoughts?