Hello bioconductor community,
I'm currently analyzing a published dataset of single-cell, which contains around 30 000 cell classified in 38 cell types, across 76 individuals. My objective is to get, for each celltype, differential expressed genes between "stim" and "control" individuals. When I run my analysis, I get only a small number of differential expressed genes.
Currently I'm trying to understand If I missed anything on my code, or I should change some parameter related to my analysis. I though about relaxing the threshold on the lowly expressed genes and also increase the treshold of the min number of cell per individual.
My preprocessing workflow consisted of:
- Filtering genes which were not expressed in at least 3 cells, and cells which did not have between 200 features and 5000 expressed features. Furthermore, cells with at least 10% of counts from MT genes were discarded
- I normalized and logged the count data (and scaled it by a factor of 100000).
- I further filtered the data to keep only lincRNA and protein coding genes, and then took a subset of top 20 000 genes based on variance.
For mast analysis this is the code I'm using (for each celltype)
#celltype_metdata -> Metadata for each celltype
#celltype_exp -> expression table for each celltype
celltype_metdata$wellKey <- celltype_metdata$CellBarcode_Identity
fData <- data.frame(primerid = row.names(lung_lognrom))
#Filter individuals with just one cell from that type
unique_cell_individuals <- names(which(table(celltype_metdata$Subject_Identity) > 1))
celltype_metdata <- celltype_metdata[(celltype_metdata$Subject_Identity %in% unique_cell_individuals), ]
celltype_metdata$Subject_Identity <- droplevels(celltype_metdata$Subject_Identity)
#Scale the age
celltype_metdata$AgeScaled = scale(as.numeric(celltype_metdata$Age),center=0)[,1]
#Load data on a sca object
sca <- FromMatrix(as.matrix(celltype_exp), celltype_metdata, fData)
sca <- filterLowExpressedGenes(sca, threshold=0.1) #Filter low expressed genes
colData(sca)$Subject_Identity <- droplevels(factor(colData(sca)$Subject_Identity))
colData(sca)$AgeScaled = as.numeric(colData(sca)$AgeScaled)
colData(sca)$Disease = droplevels(colData(sca)$Disease)
#Compute CDR (cellular detection rate, or the % of genes expressed in each cell)
colData(sca)$cngeneson <- as.numeric(scale(colSums(assay(sca)>0)))
#Fit model
zlmdata <- suppressMessages(zlm(formula(~ cngeneson+Sex+AgeScaled+Disease+condition+Manuscript_Identity+(1|Subject_Identity)), sca, method='glmer', ebayes = F, fitArgsD=list(nAGQ=0)))
mast <- suppressMessages(summary(zlmdata, doLRT="conditionStim", fitArgsD=list(nAGQ=0)))
summaryDt <- mast$datatabl
fcHurdle <- merge(summaryDt[contrast=="conditionStim" & component=='H',.(primerid, `Pr(>Chisq)`)],
summaryDt[contrast==""conditionStim" & component=='logFC', .(primerid, coef, ci.hi,ci.lo)], by='primerid')
fcHurdle[,fdr:=p.adjust(`Pr(>Chisq)`, 'fdr')]
sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dplyr_1.0.7 data.table_1.14.2
[3] purrr_0.3.4 MAST_1.20.0
[5] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[7] Biobase_2.54.0 GenomicRanges_1.46.1
[9] GenomeInfoDb_1.30.0 IRanges_2.28.0
[11] S4Vectors_0.32.3 BiocGenerics_0.40.0
[13] MatrixGenerics_1.7.0 matrixStats_0.61.0
loaded via a namespace (and not attached):
[1] progress_1.2.2 tidyselect_1.1.1 reshape2_1.4.4
[4] splines_4.1.2 lattice_0.20-45 colorspace_2.0-2
[7] vctrs_0.3.8 generics_0.1.1 utf8_1.2.2
[10] rlang_0.4.12 pillar_1.6.4 nloptr_1.2.2.3
[13] glue_1.6.0 DBI_1.1.2 GenomeInfoDbData_1.2.7
[16] lifecycle_1.0.1 plyr_1.8.6 stringr_1.4.0
[19] zlibbioc_1.40.0 munsell_0.5.0 gtable_0.3.0
[22] parallel_4.1.2 fansi_1.0.2 Rcpp_1.0.8
[25] scales_1.1.1 DelayedArray_0.20.0 XVector_0.34.0
[28] abind_1.4-5 lme4_1.1-27.1 hms_1.1.1
[31] ggplot2_3.3.5 stringi_1.7.6 grid_4.1.2
[34] cli_3.1.0 tools_4.1.2 bitops_1.0-7
[37] magrittr_2.0.1 RCurl_1.98-1.5 tibble_3.1.6
[40] crayon_1.4.2 pkgconfig_2.0.3 ellipsis_0.3.2
[43] MASS_7.3-54 Matrix_1.4-0 prettyunits_1.1.1
[46] assertthat_0.2.1 minqa_1.2.4 R6_2.5.1
[49] boot_1.3-28 nlme_3.1-152 compiler_4.1.2