Question

HSMMSingleCell metadata

karl.stamm ▴ 10

@karlstamm-7254

United States

A question about the metadata included with a particular ExperimentHub dataset.
HSMMSingleCell is the most popular RNA-seq dataset in the Bioconductor ExperimentHub as of early 2022.

The vignette describes how the sequencing was performed and how expression was calculated, but several metadata columns are left unexplained, perhaps because they seemed too obvious.

Each sample has a Media annotation that overlaps with the timepoint, so I can guess what it means.

Each sample has a 'State' of 1, 2, or 3 that loosely correlates with other variables such as timepoint. If I filter for a particular state I get clearer results, so I guess the samples are segregated by some phenotype. This seems important for interpreting results, because samples within one state are more homogeneous than samples within a timepoint.

There's also a Pseudotime column that roughly correlates with timepoint, but there is no explanation of what it is.

Is there a publication for this dataset?

It's a few years old, but it is top ranked among RNA-seq expression sets, so maybe others have used this dataset successfully.

ExperimentHubData ExpressionData CellDataSet RNASeqData

score 0 · Answer 1 · 2022-03-22

The 'monocle' package is listed as importing HSMMSingleCell, so that's a place to look for a publication that references this dataset.

The monocle publication does explain the HSMM metadata in greater detail, which answers my question.

The ExperimentHub dataset comes from the Monocle publication, so we can assume the "State" column in the phenoData slot is the State value shown in Supplementary Figure 4 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4122333/

Pseudotime is a core concept of the monocle package and doesn't make sense on its own, but it is explained in their paper.

I see now that both State and Pseudotime are derived (inferred) from the expression data by the monocle software. It's very neat and fascinating. But if you're doing an independent statistical analysis, you need to know that using them is circular: of course the samples show clearer differential expression when segregated by "State", because State was computed from that same expression data in the first place.
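To make the circularity concrete, here is a minimal sketch in Python. It is not the monocle algorithm and it does not use the HSMM data: it simulates one pure-noise "gene", builds a label from the data itself (a median split, as a crude stand-in for an inferred State), and compares that against a label assigned independently of the data. The function names, the median split, and the simulated values are all assumptions made for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Absolute value of Welch's t statistic for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return abs(statistics.mean(a) - statistics.mean(b)) / (va + vb) ** 0.5


def simulate(n_cells=200, seed=0):
    """Return (|t| for data-derived labels, |t| for independent labels)."""
    rng = random.Random(seed)

    # Simulated "expression" of one gene: pure noise, no real group structure.
    expr = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]

    # Labels computed from the expression values themselves
    # (hypothetical stand-in for a State inferred from the data).
    cutoff = statistics.median(expr)
    derived = [1 if x > cutoff else 0 for x in expr]

    # Labels assigned without looking at the data
    # (hypothetical stand-in for an externally recorded covariate).
    independent = [rng.randint(0, 1) for _ in range(n_cells)]

    def t_for(labels):
        g0 = [x for x, lab in zip(expr, labels) if lab == 0]
        g1 = [x for x, lab in zip(expr, labels) if lab == 1]
        return welch_t(g0, g1)

    return t_for(derived), t_for(independent)


t_derived, t_independent = simulate()
print(f"|t| with data-derived labels: {t_derived:.1f}")
print(f"|t| with independent labels:  {t_independent:.1f}")
```

Even though the simulated gene has no real signal, the data-derived labels give a very large test statistic, while the independent labels give one near what you would expect by chance. That is the same trap as testing for differential expression between States that were themselves inferred from expression.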