Hi,
I am trying to follow analysis steps as outlined in http://bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html
I run into an issue when running
summaryDt <- summaryCond$datatable
fcHurdle <- merge(summaryDt[contrast=='conditionStim' & component=='H',.(primerid, `Pr(>Chisq)`)], #hurdle P values summaryDt[contrast=='conditionStim' & component=='logFC', .(primerid, coef, ci.hi, ci.lo)], by='primerid') #logFC coefficients
My summaryCond$datatable only has 'C', 'D', 'S' and 'logFC' components but not 'H'. Because of this issue, I cannot perform subsequent MAST analysis steps. Not sure what I am doing incorrectly. I am comparing single cell RNA-seq data from two groups of cells.
Any help would be appreciated!
>sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] MAST_1.4.1 SummarizedExperiment_1.8.1 DelayedArray_0.4.1 matrixStats_0.53.1
[5] RColorBrewer_1.1-2 rsvd_0.9 NMF_0.20.6 cluster_2.0.6
[9] rngtools_1.2.4 pkgmaker_0.22 registry_0.5 stringr_1.3.0
[13] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.30.3 GenomicRanges_1.30.2 GenomeInfoDb_1.14.0
[17] knitr_1.20 data.table_1.10.4-3 reshape2_1.4.3 limma_3.34.8
[21] GSEABase_1.40.1 graph_1.56.0 annotate_1.56.1 XML_3.98-1.10
[25] AnnotationDbi_1.40.0 IRanges_2.12.0 S4Vectors_0.16.0 Biobase_2.38.0
[29] BiocGenerics_0.24.0 GGally_1.3.2 ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] httr_1.3.1 RMySQL_0.10.13 bit64_0.9-7 foreach_1.4.4 assertthat_0.2.0 blob_1.1.0
[7] GenomeInfoDbData_1.0.0 Rsamtools_1.30.0 progress_1.1.2 pillar_1.1.0 RSQLite_2.0 lattice_0.20-35
[13] digest_0.6.15 XVector_0.18.0 colorspace_1.3-2 Matrix_1.2-12 plyr_1.8.4 pkgconfig_2.0.1
[19] biomaRt_2.34.2 zlibbioc_1.24.0 xtable_1.8-2 scales_0.5.0 BiocParallel_1.12.0 tibble_1.4.2
[25] lazyeval_0.2.1 magrittr_1.5 memoise_1.1.0 doParallel_1.0.11 tools_3.4.1 prettyunits_1.0.2
[31] gridBase_0.4-7 munsell_0.4.3 Biostrings_2.46.0 compiler_3.4.1 rlang_0.2.0 grid_3.4.1
[37] RCurl_1.95-4.10 iterators_1.0.9 bitops_1.0-6 gtable_0.2.0 codetools_0.2-15 abind_1.4-5
[43] DBI_0.7 reshape_0.8.7 R6_2.2.2 GenomicAlignments_1.14.1 rtracklayer_1.38.3 bit_1.1-12
[49] stringi_1.1.6 Rcpp_0.12.15
Are you directly cutting and pasting code from vignette and seeing this issue?
Correct. The interesting things is that MAST correctly identifies differentially expressed genes but does not give me the Hurdle p value.
Joe
I am unable to replicate. The vignette compiles fine and
summaryCond$datatable
contains all the expected components:Are you trying to adapt the code in the vignette to your own data and it's not working?
Yes. Please see below. My starting data is 'normmatrix', a numeric matrix with genes as rows and cells as columns.
>suppressPackageStartupMessages({
library(ggplot2)
library(GGally)
library(GSEABase)
library(limma)
library(reshape2)
library(data.table)
library(knitr)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(stringr)
library(NMF)
library(rsvd)
library(RColorBrewer)
library(MAST)
})
>options(stringsAsFactors = F)
>genes <- as.data.frame(as.character(row.names(normmatrix)), stringsAsFactors = F)
>colnames(genes) <- "gene"
>rownames(genes) <- genes$gene
>colnames(cellidentities) <- "celltype"
>cellidentities$celltype <- as.character(cellidentities$celltype)
>sle <- normmatrix[,row.names(subset(cellidentities, cellidentities$celltype %in% c("Th_conv", "Plasma_cells")))]
>cellsubset <- subset(cellidentities, cellidentities$celltype %in% c("Th_conv", "Plasma_cells"))
>scaRaw <- FromMatrix(as.matrix(sle), cellsubset, subset(genes, genes$gene %in% row.names(sle)))
>cdr2 <-colSums(assay(scaRaw)>0)
>colData(scaRaw)$cngeneson <- scale(cdr2)
>cond<-factor(colData(scaRaw)$celltype)
>cond<-relevel(cond,"Th_conv")
>colData(scaRaw)$condition<-cond
>zlmCond <- zlm(~condition+cngeneson, scaRaw)
>summaryCond <- summary(zlmCond, doRLT = 'conditionPlasma_cells', logFC = TRUE, level = 0.95)
>summaryDt <- summaryCond$datatable
What is the output of
summaryCond <- summary(zlmCond, doRLT = TRUE, logFC = TRUE, level = 0.95) table(summaryCond$datatable$component, summaryCond$datatable$contrast, exclude = NULL)
Please see below.
> table(summaryCond$datatable$component, summaryCond$datatable$contrast, exclude = NULL)
(Intercept) cngeneson conditionPlasma_cells
C 20922 20922 20922
D 20922 20922 20922
logFC 0 20922 20922
S 20922 20922 20922