Question

Marker genes in a single-cell experiment

lirongrossmann

Hi,

I am trying to create a reference dataset from 40 different cell types (fpkm_matrix is a 26,000 x 40 log count matrix), and I am trying to find marker genes for each cell line.

I used the following code:

library(SingleCellExperiment)
library(scran)  # provides pairwiseTTests()
cell.matrix <- SingleCellExperiment(list(logcounts = as.matrix(fpkm_matrix)))
colLabels(cell.matrix) <- colnames(fpkm_matrix)
out <- pairwiseTTests(cell.matrix, cell.matrix$label, direction = "up")

and got the following error

Error in .compute_mean_var(x, BPPARAM = BPPARAM, subset.row = subset.row,  : 
  no residual d.f. in any level of 'block' for variance estimation

Based on that, I suspect there may not be much difference between the cell types, but I know that there is.

Any input would be appreciated.

Thanks, Liron

Written by lirongrossmann; last updated by Aaron Lun.

score 0 · Answer 1 · 2020-08-27

Peter Langfelder

My guess is that your cell.matrix has no 'label' component: when you created it, you supplied only the expression values, not the labels. In other words, cell.matrix$label is NULL. If you have another variable (e.g. a list) that holds the labels (component 'label'), use that instead of cell.matrix$label.
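A quick way to test this guess is to check whether the "label" column exists before calling pairwiseTTests(). The sketch below uses a small, hypothetical toy matrix (`mat`, `sce`) in place of fpkm_matrix and assumes a SingleCellExperiment version that provides the colLabels<- setter, which stores labels in the "label" column of colData().

```r
library(SingleCellExperiment)

# Hypothetical toy stand-in for fpkm_matrix: 3 genes x 4 samples of log values
mat <- matrix(c(1.2, 0.5, 3.1, 1.0, 0.4, 2.9, 4.2, 0.1, 0.3, 4.0, 0.2, 0.6),
              nrow = 3,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("CD4_rep1", "CD4_rep2", "CD8_rep1", "CD8_rep2")))
sce <- SingleCellExperiment(list(logcounts = mat))

# Before labels are assigned there is no "label" column in colData(),
# so sce$label is NULL and pairwiseTTests() would receive no groups
label_missing_before <- is.null(sce$label)

# colLabels<- stores the labels in the "label" column of colData()
colLabels(sce) <- colnames(mat)
label_missing_after <- is.null(sce$label)

print(c(before = label_missing_before, after = label_missing_after))
```

If `after` is still TRUE in your own session, the labels were never attached to the object.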

Reply from lirongrossmann:

Thanks, Peter. I forgot to include the line that sets the labels in my question (it was in my original code). I have added it to the question, but I still get the same error.

score 0 · Answer 2 · 2020-08-27

I am going to guess that each column name is unique, in which case there are no replicates for any of the labels; in this case, computing a p-value for differential comparisons is not possible. This is reflected in the error message, where it's telling you that there are no residual degrees of freedom for the t-test.
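To see why unique labels leave no residual degrees of freedom, the simplified illustration below counts samples per label: each group contributes n - 1 d.f. to its own variance estimate, so a group with a single column contributes nothing. The labels are hypothetical, and this is only a base-R sketch of the arithmetic, not scran's internal code.

```r
# Hypothetical labels: one column per cell type, as when every column name is unique
labels <- c("CD4", "CD8", "B", "NK")
n_per_group <- table(labels)

# Each group contributes (n - 1) residual d.f. to its variance estimate
df_per_group <- n_per_group - 1
no_residual_df <- all(df_per_group == 0)
print(no_residual_df)

# In R, the sample variance of a single observation is NA
print(var(5.2))

# With two replicates per cell type, each group has 1 residual d.f.
rep_labels <- c("CD4", "CD4", "CD8", "CD8")
print(table(rep_labels) - 1)
```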

Check whether the labels look like CD4_rep1, CD4_rep2, etc.; if so, you can just sub() out the _repX suffix to get consistent labels for each cell type. However, if you actually only have one column per cell type, you're stuffed: there's no way to compute p-values here. Perhaps use SingleR::getClassicMarkers() instead to get the top markers with the largest log-fold changes.
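Both routes can be sketched as follows, assuming scran and SingleR are installed and that replicate columns follow a hypothetical `<type>_rep<number>` naming pattern (adjust the regular expression to your actual column names). The toy matrix is random, hypothetical data used only to make the calls runnable.

```r
library(SingleCellExperiment)
library(scran)    # pairwiseTTests()
library(SingleR)  # getClassicMarkers()

# Hypothetical toy reference: 100 genes x 6 samples, two replicates per cell type
set.seed(1)
mat <- matrix(rnorm(600, mean = 5), nrow = 100,
              dimnames = list(paste0("gene", 1:100),
                              c("CD4_rep1", "CD4_rep2", "CD8_rep1",
                                "CD8_rep2", "B_rep1", "B_rep2")))
sce <- SingleCellExperiment(list(logcounts = mat))

# Strip the _repX suffix so replicates of one cell type share a label
labels <- sub("_rep[0-9]+$", "", colnames(sce))
colLabels(sce) <- labels

# With replicates, each group has residual d.f., so the t-tests can run
out <- pairwiseTTests(sce, colLabels(sce), direction = "up")

# getClassicMarkers() ranks genes by log-fold change between per-label
# summaries and needs no replicates; it uses the "logcounts" assay here
markers <- getClassicMarkers(sce, labels, assay.type = "logcounts")
```

The result `markers` is a nested list, so `markers[["CD4"]][["CD8"]]` would hold genes that are higher in CD4 than in CD8.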

For either function, one would typically use the log-transformed values.