How to filter cells for scRNA-seq analysis based on the expression level of a surface marker?
@Iris Wing To-24589

Hi everyone,

I am completely new to scRNA-seq data analysis, and I am trying to figure out how to include only the cells that fulfill a certain criterion. In my case, I want to keep only activated T cells and separate them from the other cells, based on the expression of an activation marker.

I figured that I would have to set a threshold on the expression level of that activation marker in order to separate the activated cells for the analysis. The question is: how should I set that threshold?

Thanks for your help!

Cheers, Iris

Aaron Lun
@alun

You have two options.

The first is to go through the basic workflow (see the quick start here) and try to identify a cluster of cells where your activation marker is upregulated. This doesn't require any _a priori_ knowledge other than the identity of a single gene; even if that gene isn't well captured in the scRNA-seq data, you can use similarities across the rest of the transcriptome to help identify the relevant group of cells. However, it can be rather arbitrary/subjective if your population is not obvious, as the clustering depends greatly on the parameters. More importantly, this approach can only define populations in a relative sense. You might be able to identify a subpopulation that is "more activated" than the others (i.e., expresses more of the activation marker), but whether that is equivalent to an actual activated population is left to your interpretation.
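A minimal sketch of the last step of the first approach, assuming the clustering has already been done upstream and that, for every cell, you have a cluster label and a log-normalized expression value for the activation marker. The function name `rank_clusters_by_marker` and the toy values are hypothetical. It ranks clusters by the difference between the mean log-expression of the marker inside and outside each cluster, which also shows why the result is only relative: the top cluster is "more activated" than the others, not necessarily activated in an absolute sense.

```python
from statistics import mean


def rank_clusters_by_marker(marker_expr, clusters):
    """Rank clusters by how strongly a single marker is upregulated in them.

    marker_expr: log-normalized expression of the marker, one value per cell.
    clusters:    cluster label per cell, from a previous clustering step.
    Returns (label, mean_inside, diff) tuples, where diff is the mean
    log-expression inside the cluster minus the mean in all other cells,
    sorted from most to least upregulated.
    """
    if len(marker_expr) != len(clusters):
        raise ValueError("need exactly one cluster label per cell")
    ranking = []
    for label in sorted(set(clusters)):
        inside = [x for x, c in zip(marker_expr, clusters) if c == label]
        outside = [x for x, c in zip(marker_expr, clusters) if c != label]
        # With a single cluster there is nothing to compare against.
        baseline = mean(outside) if outside else mean(inside)
        ranking.append((label, mean(inside), mean(inside) - baseline))
    ranking.sort(key=lambda row: row[2], reverse=True)
    return ranking


# Toy data (made up): 8 cells in three clusters.
expr = [0.1, 0.0, 0.2, 2.5, 3.0, 2.8, 0.5, 0.4]
labels = ["A", "A", "A", "B", "B", "B", "C", "C"]
for label, inside_mean, diff in rank_clusters_by_marker(expr, labels):
    print(f"cluster {label}: mean={inside_mean:.2f}, diff={diff:+.2f}")
```

With these toy values, cluster B comes out on top. Whether B corresponds to genuinely activated T cells is still a matter of interpretation, exactly as described above.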

The second approach is to find an existing expression dataset (bulk or single-cell) that already contains a set of activated T cells, amongst other cell types. You can then use this reference profile to determine how similar the cells in your dataset are to each labelled population in the reference, as discussed here. The general idea is to see whether there is a population in your dataset that is more like activated T cells than any other (relevant) cell type. This approach is _relatively_ objective and identifies the cell type in an absolute sense, though it depends on the availability of a reasonably trustworthy and comprehensive reference. For example, you'll want to make sure that your reference includes both activated and unactivated T cells; otherwise, the classifier won't be able to distinguish cells based on activation status. Similar thoughts apply to marker lists, if you can collect a decent number (>50) of genes defining activation status.
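A minimal sketch of the reference-based idea: compute the Spearman rank correlation between a cell and each labelled reference profile, then assign the label with the highest correlation. This is roughly the core of reference-based annotation tools such as the SingleR Bioconductor package, which add further steps (e.g., marker-gene selection and fine-tuning) that are not shown here. The function names, gene order, and toy profiles are hypothetical. In practice, the cell and reference vectors must cover the same genes in the same order. The toy reference deliberately includes both activated and resting T cells; without the resting profile, there would be nothing against which to separate the two states.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("cell and reference must cover the same genes")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no rank information
    return cov / (var_x * var_y) ** 0.5


def classify_cell(cell_expr, reference):
    """Assign the reference label whose profile correlates best with the cell."""
    scores = {label: spearman(cell_expr, profile) for label, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference over the same 5 genes (made-up values).
reference = {
    "activated T": [1, 2, 8, 9, 7],
    "resting T": [6, 7, 1, 2, 1],
    "B cell": [9, 1, 1, 1, 8],
}
cell = [0, 1, 5, 6, 4]
best, scores = classify_cell(cell, reference)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Because the correlation is computed on ranks, it is insensitive to monotonic differences in scale between the reference (e.g., bulk data) and the single-cell measurements. This is one reason rank-based correlations are a common choice for this kind of comparison.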

Super clear! Thanks a lot Aaron :)
