Hello, I am reopening this question because of the following problem.
I am doing a differential expression exercise using the HISAT2, StringTie and DESeq2 workflow. As a last step, I use the Python script prepDE.py, recommended in the StringTie manual, to extract the counts.
So far so good: I have a matrix with genes as rows and samples (controls and patients) as columns, holding the counts. However, when I test for differential expression in DESeq2 with nbinomWaldTest, some p-values come out as NA. I searched the forums to find out why these NA cells appear, and the explanation given is:
- If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p-value, and adjusted p-value will be set to NA.
- If a row contains a sample with an extreme count outlier, the p-value and the adjusted p-value will be set to NA. These outliers are detected by Cook's distance.
- If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p-value will be set to NA.
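The three cases above can be told apart by looking at which columns are NA in each row. Below is a minimal sketch in base R, assuming the results have been converted with as.data.frame(res); the toy table, its values, and the helper name classify_na are made up for illustration and are not part of DESeq2. A p-value can also be NA for other reasons, so the labels are only a first guess.

```r
# Toy table standing in for as.data.frame(res); the numbers are invented.
res_df <- data.frame(
  baseMean       = c(0, 150.2, 3.1, 480.7),
  log2FoldChange = c(NA, 2.4, 0.3, -1.1),
  pvalue         = c(NA, NA, 0.40, 0.001),
  padj           = c(NA, NA, NA, 0.004),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Hypothetical helper: label each row by the NA pattern listed above.
classify_na <- function(df) {
  ifelse(df$baseMean == 0, "all_zero_counts",        # case 1: baseMean is zero
  ifelse(is.na(df$pvalue), "cooks_outlier",          # case 2: p-value is NA
  ifelse(is.na(df$padj), "independent_filtering",    # case 3: only padj is NA
         "tested")))
}

res_df$na_reason <- classify_na(res_df)
print(table(res_df$na_reason))
```

Tabulating the labels shows at a glance which of the three mechanisms accounts for most of the NA cells.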
The suggestion is to deactivate these filters as follows:
res <- results(dds, cooksCutoff = FALSE, independentFiltering = FALSE)
However, after doing so I still get cells with NA. I really don't know what I'm doing wrong, and I hope someone can help me.
Here is the script I used:
library("DESeq2")
setwd("C:/Users/ADMIN/Desktop/tvt/")
expression_data <- read.table("C:/Users/ADMIN/Desktop/tvt/gene_count_matrixv2.csv", row.names = "gene_id", header = TRUE, sep = ";", stringsAsFactors = FALSE)
expression_data$X <- NULL
dim(expression_data)
summary(expression_data)
apply(expression_data, 2, sum)
mx = apply( expression_data, 1, max )
expression_data = expression_data[ mx > 227, ]
condition <- factor(c(rep("control", 2), rep("paciente", 10)), levels = c("control", "paciente"))
col_data = data.frame(condition)
dds = DESeqDataSetFromMatrix(expression_data, col_data, ~condition)
dds = estimateSizeFactors(dds)
dds = nbinomWaldTest(dds)
dds <- DESeq(dds, minReplicatesForReplace=Inf)
res <- results(dds, cooksCutoff=FALSE, independentFiltering =FALSE)
res = results(dds)
head(res)
res$padj = ifelse(is.na(res$padj), 0.1, res$padj)
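One way to check whether the filters are really off in the object being inspected is to keep the default and the filter-free results under different names and compare the NA counts per column. The following is a minimal sketch, assuming DESeq2 is installed; it uses the simulated data from makeExampleDESeqDataSet rather than the real count matrix, and the names dds_demo, res_default, res_nofilter and na_counts are only illustrative.

```r
library(DESeq2)

set.seed(1)
# Simulated counts generated by DESeq2 (500 genes, 12 samples); not real data.
dds_demo <- makeExampleDESeqDataSet(n = 500, m = 12)
dds_demo <- DESeq(dds_demo, minReplicatesForReplace = Inf, quiet = TRUE)

# Keep both result tables under separate names so neither overwrites the other.
res_default  <- results(dds_demo)
res_nofilter <- results(dds_demo, cooksCutoff = FALSE, independentFiltering = FALSE)

# Count NA values per column for each table.
na_counts <- rbind(
  default  = colSums(is.na(as.data.frame(res_default))),
  nofilter = colSums(is.na(as.data.frame(res_nofilter)))
)
print(na_counts)
```

With both filters disabled, padj should be NA only where pvalue is NA, so any remaining NA cells would be expected to come from rows whose counts are zero in every sample.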