Hi all. I am trying to run a pseudobulk analysis assessing differential gene expression between control and mutant cells of different cell types. When we previously assigned our cell types, our preprocessing involved a decontamination step. Our samples are made from retinal cells, and we found that there were a ton of photoreceptor-specific genes contaminating all of our other cell types, which is why we did this decontamination step in the first place. I am now trying to run DESeq2 to find DEGs for each cell type. When I use the regular count matrices, I get TONS of photoreceptor gene contamination in the DEG lists for non-photoreceptor cell types, which makes the data less than useful. However, the decontXcounts matrix values are NOT integers, and R throws the following error:
"Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are not integers"
I know that DESeq2 is supposed to use raw counts, but what if raw counts are biologically problematic and cannot be used? What is an alternative?
Thanks in advance!
Yes, ATpoint is correct. decontX decontaminated counts are not integers because of the way the variational inference algorithm works, but you can simply round the counts before plugging them into DESeq2 or other tools.
I do not know the decontamination tool you are using, but generally: if these counts are basically "corrected" raw counts, in the sense that they are not normalized and are on the same scale as the original data, then you might simply round them to the nearest integer. There are previous questions on similar issues, for example using estimated counts from something like RSEM.
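For anyone landing here later, a minimal sketch of the rounding approach suggested in this thread. The matrix, the sample labels, and the names decont, sample_id, pb and coldata are all made up for illustration; in a real analysis decont would be the decontXcounts matrix for one cell type. Summing per sample first and rounding once at the end is just one reasonable choice, since it accumulates less rounding error than rounding every cell separately.

```r
# Toy stand-in for a decontaminated count matrix (genes x cells).
# Values are simulated, NOT real decontX output.
set.seed(1)
decont <- matrix(
  rgamma(6 * 8, shape = 2, rate = 0.5),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8))
)

# Hypothetical sample of origin for each cell (two cells per sample).
sample_id <- rep(c("ctrl_1", "ctrl_2", "mut_1", "mut_2"), each = 2)

# Pseudobulk: sum the corrected counts over the cells of each sample.
# rowsum() groups rows, so transpose to cells x genes and back again.
pb <- t(rowsum(t(decont), group = sample_id))

# Round once, after aggregation, and store as integers for DESeq2.
pb_int <- round(pb)
storage.mode(pb_int) <- "integer"

# Sample-level metadata; row names must match the count matrix columns.
coldata <- data.frame(
  condition = factor(c("ctrl", "ctrl", "mut", "mut")),
  row.names = colnames(pb_int)
)

# Build the DESeq2 object only if the package is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = pb_int,
    colData = coldata,
    design = ~ condition
  )
}
```

With real data you would then call DESeq() on dds as usual. The toy matrix here is too small for that step to be meaningful, so the sketch stops once the object has been built.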