Hi
I am trying to follow through this post as an exercise to find genes that are not differentially expressed. Few things in this post does not work [anymore]. Can somehow help ?
Specifically,
1. DESeqSummarizedExperimentFromMatrix. #does not work
#Instead I use the `DESeqDataSetFromMatrix` to load the data and I get this error.
#This is the warning I get.
Note: levels of factors in the design contain characters other than
letters, numbers, '_' and '.'. It is recommended (but not required) to use
only letters, numbers, and delimiters '_' or '.', as these are safe characters
for column names in R. [This is a message, not an warning or error]
More Importantly, the following command yeilds null.
betaSE <- mcols(rowData(dse))$SE_conditionuntreated
Have you copied all code from the old post verbatim (without *any* changes), or have you adapted it to your data? If the latter, please post the *full* code.
I tried to run the exact code ```{r} > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( + counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], + ~ type + condition ) Error in DESeqSummarizedExperimentFromMatrix(counts(pasillaGenes), pData(pasillaGenes)[, : could not find function "DESeqSummarizedExperimentFromMatrix" > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) Error in is(object, "DESeqDataSet") : object 'dse' not found > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange Error in is(object, "DESeqDataSet") : object 'dse' not found > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) Error in rowData(dse) : object 'dse' not found > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated Error in rowData(dse) : object 'dse' not found > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) Error in abs(beta) : non-numeric argument to mathematical function > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) Error: object 'pvalDE' not found > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) Error in pnorm(beta, thr, betaSE, lower.tail = TRUE) : object 'thr' not found > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) Error in pnorm(beta, -thr, betaSE, lower.tail = FALSE) : object 'thr' not found > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) Error in eval(quote(list(...)), env) : object 'pA' not found > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 Error in p.adjust(pvalDE, "BH") : object 'pvalDE' not found > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 Error in p.adjust(pvalTOST, "BH") : object 'pvalTOST' not found > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( + rowMeans( counts(dse,normalized=TRUE) ), beta, + log="x", pch=20, cex=.2, + col = 1 + sigDE + 2*sigSmall ) Error in counts(dse, normalized = TRUE) : object 'dse' not found > # Plot is attached. > ``` So I changed few things. 1. ```{r} > dse <- DESeqDataSetFromMatrix( + counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], + ~ type + condition ) Note: levels of factors in the design contain characters other than letters, numbers, '_' and '.'. It is recommended (but not required) to use only letters, numbers, and delimiters '_' or '.', as these are safe characters for column names in R. [This is a message, not an warning or error] ``` This runs okay, till I get to `betaSE <- mcols(rowData(dse))$conditionuntreated`
2. If I can replace that with
betaSE <- rowData(dse)$SE_condition_untreated_vs_treated` because
all.equal(mcols(dse), rowData(dse))
gives
>TRUE
> pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE )
> all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) [1] TRUE
3. I belive the `thr` is a typo, so theta <- .3 # So, we do our two one-sided tests. For a one-sided z test, we # simply use tail probabilities from the normal distribution. # First, the test of H0_A: true_beta > thr pA <- pnorm( beta, theta, betaSE, lower.tail=TRUE ) # Next, the test of H0_B: true_beta < -thr pB <- pnorm( beta, -theta, betaSE, lower.tail=FALSE ) # The overall p value is the maximum, because we want to reject H0_A # and H0_B simultaneously pvalTOST <- pmax( pA, pB ) # Let's adjust our two p values with BH: sigDE <- p.adjust( pvalDE, "BH" ) < .1 sigSmall <- p.adjust( pvalTOST, "BH" ) < .1
4. the number of genes with no difference
{r} > sum(!is.na(sigSmall)) [1] 11836 #However, sigSmallv <- p.adjust( pvalTOST, "BH" ) min(na.omit(sigSmallv) [1] 0.301
If the smallest value is 0.301 then, it would not give 11836 significantly equivalent genes. Am I understanding it wrong ?
I am trying to put the code that was run :) running into formatting issues