Question

Fdrtool for RNA seq analysis

0

Entering edit mode

Paul • 0

@paul-23237

Last seen 4.9 years ago

The Netherlands

I have RNA seq data an experiment consisting of an untreated and treated samples in triplicates; a total of 6 samples. The culturing of the used tissue can be challenging so a batch effect is expected. I am using Deseq2 and would like to apply apeglm shrinkage and use adjusted p-values as a cutoff to explain effects between conditions, but am encountering some problems. My questions are mainly related how to use the fdrtool properly. Perhaps the fdrtool is not necessary in this case, but regardless I would like to know how one would apply this while using apeglm shrinkage. I am aware of the fdrtool manual that exists but if I am correct, it does not explain these questions.

Thank you in advance for your time!!

My data shows a strong hill shaped histogram. I am already correcting for the batch effect as this was expected to take place. Making the count-cutoff more stringent doesn’t correct fully for this effect (see histograms and code below).

#count matrix
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~batch + condition)

# Renaming levels
levels(dds$condition)
levels(dds$condition) <- c("trt", "untrt")
dds$condition <- relevel(dds$condition, ref = "untrt")
levels(dds$condition)

#prefiltering to remove low read genes
nrow(dds)
keep <- rowSums(counts(dds) >= 5) >= 3  #alternatively rowSums(counts(dds)) > 2
dds <- dds[keep,]
nrow(dds)

#-----------------------------------------------
#Differential expression analysis - results table
#-----------------------------------------------
Deseq_table <- DESeq(dds, betaPrior=FALSE)  #version 1.26.0

#shrink
resultsNames(Deseq_table)
Deseq_results <- results(Deseq_table, contrast=c("condition","trt","untrt"), alpha = 0.1)
Deseq_results <- lfcShrink(Deseq_table, coef="condition_trt_vs_untrt", type="apeglm", res=Deseq_results)

#plots
plotMA(Deseq_results, ylim=c(-2,2), cex=.8)
abline(h=c(-1,1), col="dodgerblue", lwd=2)

hist(Deseq_results$pvalue, col = "grey50", border = "white", main = "WT vs Deletion", xlab = "p-values")

Deseq apeglm without applying fdrtool

The way I interpret this, but please correct me if I am wrong, is that for this dataset the p-values are higher than expected leading to a potentially to a lower power. Since I am correcting for the batch effect and low counts, I am not sure what is causing this. A possible solution would be to recalculate the p-values using de fdrtool. However, this tool requires the column “stat” , which seems to be lost after the lfcshrink calculation using either apeglm or ashr. My question therefore is:

Q1: How can I still apply the fdrtool in a correct way? Or am I doing something wrong? I do not think the following is allowed but I can simply ‘save’ the information before doing shrinkage and then continue using the fdrtool. Like so:

#-----------------------------------------------
#Differential expression analysis - results table
#-----------------------------------------------
Deseq_table <- DESeq(dds, betaPrior=FALSE)

#shrink
resultsNames(Deseq_table)
Deseq_results <- results(Deseq_table, contrast=c("condition","trt","untrt"), alpha = 0.1)
Deseq_results_cor <- fdrtool(Deseq_results$stat, statistic= "normal", plot = T) #save stat values, v1.2.15
Deseq_results <- lfcShrink(Deseq_table, coef="condition_trt_vs_untrt", type="apeglm", res=Deseq_results)

#Correction  variance estimation p-values
#-----------------------------------------------
#remove genes filtered out by independent filtering and the dispersion outliers
Deseq_results <- Deseq_results[ !is.na(Deseq_results$padj), ]
Deseq_results <- Deseq_results[ !is.na(Deseq_results$pvalue), ]
Deseq_results <- Deseq_results[, -which(names(Deseq_results) == "padj")]
Deseq_results_cor$param[1, "sd"] #is 0.725 ;estimated null model variance

#add p-values to dataframe
Deseq_results[,"padj"]  <- p.adjust(Deseq_results_cor$pval, method = "BH")

#plots
plotMA(Deseq_results, ylim=c(-2,2), cex=.8)
abline(h=c(-1,1), col="dodgerblue", lwd=2)

hist(Deseq_results_cor$pval, col = "grey50", border = "white", main = "WT vs Deletion", xlab = "p-values")

Deseq apeglm and applying fdrtool

Q2: In case this way of calculating is allowed; how come that in both cases the resulting histogram looks like a reasonable p-value distribution but in the left plot >300 genes are significantly different in terms of expression and in the right plot only 21 genes.

Q3: This question might be related to Q2; am I always allowed to recalculate the adj. p-values by applying the fdrtool? From some of the comments on forums I understand there is not a method to justify this decision. Is this correct or am I missing something? How for instance, would you write in a methods section that it was decided to recalculate p-values?

fdrtool • 1.3k views

ADD COMMENT • link updated 4.9 years ago by Michael Love 43k • written 4.9 years ago by Paul • 0

0

Entering edit mode

Adding a link to the other thread

https://support.bioconductor.org/p/129549/#129550

ADD REPLY • link 4.8 years ago Michael Love 43k