CyTOF Differential Abundance Analysis using DESeq2
@mikhaelmanurung-17423
Netherlands

Dear all,

I would like to use DESeq2 for differential abundance analysis of the cell clusters from my CyTOF data. Previously, I tried diffcyt, which uses edgeR. However, I often found significant results that were actually driven by only a few outlying samples. Therefore, I would like to see how DESeq2 performs.

My question: is there anything in particular that I have to keep in mind if I want to use DESeq2 for CyTOF data? Can I just use the default sequence of DESeq(), results() and then lfcShrink()? As an example, when using edgeR, I can pass the total number of cells per sample to the lib.size argument when constructing the DGEList. How can I do the same in DESeq2 (and are there other particulars to consider)?

Thank you in advance.

Regards, Mikhael

@mikelove
United States

If you are using edgeR, I believe you could use estimateGLMRobustDisp [1] to minimize the effect of outlier samples, followed by glmFit and glmLRT, or the QL versions.

If you are trying DESeq2, you can manually set the size factors, but I'm not sure exactly what you would set them to in this case for a direct comparison. The paper has:

Normalization for the total number of cells per sample (library sizes) is automatically performed by the edgeR functions.

And it looks like standard correction for library size is performed [2], so I'm not sure if you would need to modify any settings. The size factor estimation in DESeq2 normalizes each sample by the median ratio of its counts to a geometric-mean reference sample. To the degree that this is an appropriate scaling for the cell counts, I think you could use this approach.
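To make the median-ratio idea concrete, here is a minimal sketch of the arithmetic in plain Python. It is not DESeq2 itself: the function name and the toy cluster counts are made up for illustration, and it assumes that only clusters with nonzero counts in every sample contribute to the reference.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch, not DESeq2 code).

    counts: list of rows, one per cell cluster; each row holds the
    per-sample cell counts for that cluster.
    """
    n_samples = len(counts[0])
    # Assumption: clusters with a zero in any sample are skipped, because
    # their geometric mean across samples would be zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no cluster has nonzero counts in every sample")
    # Log of the geometric mean across samples for each usable cluster:
    # this is the "geometric mean reference sample".
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the reference, one value per cluster.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: three clusters scale 1:2:4 across the samples,
# while the fourth cluster behaves very differently.
counts = [
    [100, 200, 400],
    [10, 20, 40],
    [50, 100, 200],
    [30, 20, 500],
]
size_factors = median_ratio_size_factors(counts)
print(size_factors)
```

In this toy input the factors come out at approximately 0.5, 1 and 2: the median ignores the one discordant cluster because the majority of clusters agree on the scaling.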

[1] https://pubmed.ncbi.nlm.nih.gov/24753412/
[2] https://github.com/lmweber/diffcyt/blob/master/R/testDA_edgeR.R#L174-L184


Dear Michael,

Thank you for the fast reply! I have tried estimateGLMRobustDisp() as well as estimateDisp with robust=TRUE but the outlying cell clusters are still there. To add, the LFCs of these clusters are quite high, which further prompted me to look for an alternative.

Good to know that I am on the right track! EDIT: Would it be reasonable to assign the total number of cells per sample to sizeFactors(dds)?
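As a sketch of what using total cell counts as size factors could look like numerically, the plain-Python snippet below rescales per-sample totals so that their geometric mean is 1. The centering step is an assumption made here so the factors stay close to 1 and normalized counts remain on a scale similar to the raw counts; the totals are made-up values, and the resulting vector is the kind of thing one would then assign to sizeFactors(dds).

```python
import math


def centered_size_factors(total_cells):
    """Scale per-sample totals so their geometric mean equals 1.

    Illustrative helper (hypothetical name), not part of DESeq2.
    """
    log_geo_mean = sum(math.log(t) for t in total_cells) / len(total_cells)
    return [t / math.exp(log_geo_mean) for t in total_cells]


# Made-up total cell counts for three samples.
total_cells = [50_000, 100_000, 200_000]
size_factors = centered_size_factors(total_cells)
print(size_factors)
```

Only the relative sizes of the factors carry the normalization; rescaling all of them by a common constant changes the scale of the normalized counts but not the ratios between samples.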

A few more reasons why I would like to try DESeq2 are the LFC shrinkage and s-values. So far, cell clusters that are statistically significant and have a high (shrunken) LFC looked convincing when I plotted the cell frequencies with boxplots, i.e. the significance is not driven by outliers.


I think that, to the extent that some cell clusters are present in roughly stable proportions across samples, median-ratio-based analysis will give reasonable results (and you can seed the median ratio with the controlGenes argument of estimateSizeFactors). However, if composition is changing dramatically across samples and no such clusters are found, then I would consider multinomial modeling.
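The effect of restricting the median ratio to a set of stable clusters can be sketched in plain Python as follows. This is an illustration of the idea only, not DESeq2's implementation: the function name, the `control` parameter and the counts are hypothetical, and the first two clusters are simply assumed to be the stable ones.

```python
import math
from statistics import median


def size_factors_from_rows(counts, control=None):
    """Median-of-ratios size factors, optionally using only control rows.

    counts: list of per-cluster rows of per-sample cell counts.
    control: optional list of row indices treated as stable clusters
             (hypothetical parameter, mimicking the idea of controlGenes).
    """
    rows = counts if control is None else [counts[i] for i in control]
    # Assumption: rows containing a zero are skipped.
    rows = [r for r in rows if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no usable cluster rows")
    n_samples = len(rows[0])
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    return [
        math.exp(median(math.log(r[j]) - lg for r, lg in zip(rows, log_geo_means)))
        for j in range(n_samples)
    ]


# Made-up counts for two samples: the first two clusters double with
# sample size, the other three expand far more in the second sample.
counts = [
    [100, 200],
    [50, 100],
    [10, 200],
    [20, 300],
    [5, 100],
]
sf_all = size_factors_from_rows(counts)
sf_control = size_factors_from_rows(counts, control=[0, 1])
print(sf_all, sf_control)
```

With all clusters, the expanding majority pulls the estimated scaling between the two samples to about 15-fold; with only the two assumed-stable clusters it is about 2-fold. This is the situation where the choice of reference clusters matters, and where, absent any stable clusters, a different model may be more appropriate.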


I am not sure about the extent of changes in composition across samples. However, there are indeed cases where clusters of cells are highly abundant in only one of the groups. That being said, is this current approach a good enough way to analyse CyTOF data? I can imagine that edgeR and limma-voom, which are RNA-Seq tools being applied to CyTOF, would encounter similar problems.

I am not familiar with multinomial modelling, so I guess it is time to consult with a statistician.


"is this current approach a good enough way to analyse CyTOF data" => I really don't have any experience in this domain, so I can't give solid recommendations. I can say that diffcyt was extensively evaluated in its publication, and is now widely used.
