Question

Which input data is acceptable for MAST?

james.watson1 • 0

@d1ce014b

Last seen 5 months ago

United States

I'd like to run MAST on a single-cell RNA-seq dataset of roughly 250,000 cells, but it is unclear to me which data are appropriate to use as input. Page 10 of the MAST paper says the input data are "log2(TPM + 1)." In this comment (https://github.com/RGLab/MAST/issues/147#issuecomment-770277174), Finak says "The input data SHOULD be log2 transformed but NOT scaled (i.e. Normalized). If you do not log2(x+1) transform the data you will have meaningless estimates of log fold change since the data are assumed to be on the log scale." The same thread contains several questions about which other input data may be appropriate, but the only answers address TPM, not raw RNA counts, which is what was asked.

It seems like he's saying that only log2-transformed TPM data are acceptable, but I'd like to either confirm that or dispel it if I'm misunderstanding. We'd like to try SCT v2 data, if possible. If I do need to use log2-TPM data, is there a standard way to convert raw RNA counts to TPM, or to CPM if that is appropriate? I've seen conflicting answers. Thank you.

SCTransform MAST • 583 views

updated 6 months ago by Andrew_McDavid ▴ 280 • written 6 months ago by james.watson1 • 0

score 0 · Answer 1 · 2024-10-03

Andrew_McDavid ▴ 280

@andrew_mcdavid-11488

Last seen 6 months ago

United States

I'm not too familiar with SCT v2. If it also has a point mass at zero and the non-zero component is roughly symmetric, then it might work OK with MAST. But why not just use [FindAllMarkers](https://satijalab.org/seurat/reference/findallmarkers) with `test.use = "MAST"`? It uses the appropriate data layer and calls MAST directly.
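On the conversion asked about in the question, here is a minimal sketch of a log2(CPM + 1) transform from raw counts, written in plain Python only to make the arithmetic explicit. It assumes UMI-style counts where no gene-length correction is applied, so CPM stands in for TPM; that is an assumption, not something the MAST authors state in this thread. The function name `log2_cpm` and the toy data are hypothetical.

```python
import math

def log2_cpm(counts_by_cell, scale=1_000_000):
    """Return log2(CPM + 1) for a list of per-cell raw count lists.

    Each cell is divided by its own library size, multiplied by `scale`
    (1e6 gives counts per million), then log2(x + 1) transformed.
    No centering or scaling to unit variance is applied afterwards.
    """
    transformed = []
    for cell in counts_by_cell:
        library_size = sum(cell)
        if library_size == 0:
            # An empty cell has no defined CPM; keep it as all zeros.
            transformed.append([0.0] * len(cell))
            continue
        transformed.append(
            [math.log2(count / library_size * scale + 1) for count in cell]
        )
    return transformed

# Toy example: two cells, three genes (hypothetical counts).
cells = [[0, 3, 1], [10, 0, 0]]
expr = log2_cpm(cells)
```

Raw zeros map to exactly 0 after the transform, so the point mass at zero mentioned above is preserved, and the non-zero values end up on the log scale.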

6 months ago Andrew_McDavid ▴ 280