In which order should you detect empty droplets and remove barcode swapping? The DropletUtils vignette says you should use all barcodes for empty droplet detection, so I assume this is the first step? Naively, I thought that if the amount of barcode swapping was large, the counts for each barcode could be very different after the correction, which would also affect empty droplet detection.
Specifically, I have 4 samples from 10x scRNA-seq which have all been sequenced together on the same lane of the Illumina 4000. My current workflow is to do the following:
- Detect empty droplets for each sample independently using the raw barcode matrix files (do not filter cells afterward)
- Detect barcode swapping amongst all samples using the molecule information files (the function returns a filtered matrix where column sums are not zero)
- Assign the counts from barcode swapping to the raw barcode matrix files
Would this be reasonable?
Okay, that's cleared it up for me! I was getting confused because swappedDrops was returning a matrix with fewer columns (i.e. barcodes), and I thought this would conflict with the "all barcodes" requirement in the vignette for emptyDrops. I was then substituting the columns from the cleaned matrix back into the raw count matrix before using emptyDrops (I knew at that point I was probably doing something wrong). I'll just run swappedDrops and use the cleaned matrix output from the function as the raw count matrix. Many thanks, Aaron!
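
For anyone who finds this later, here is a rough sketch of the order I ended up with: swappedDrops first on the molecule information files, then emptyDrops on each cleaned matrix. The file paths and sample names are placeholders I made up, and the 1% FDR cutoff is just an assumption on my part, so adjust both for your own data.

```r
library(DropletUtils)

# Hypothetical paths to the CellRanger molecule information files for the
# samples that were multiplexed on the same lane; replace with your own.
mol.info <- c(
    "sample1/molecule_info.h5",
    "sample2/molecule_info.h5",
    "sample3/molecule_info.h5",
    "sample4/molecule_info.h5"
)

# Step 1: remove swapped molecules across all samples at once.
# The result contains a list called 'cleaned' with one count matrix per sample.
swap.out <- swappedDrops(mol.info)
cleaned <- swap.out$cleaned

# Step 2: run emptyDrops on each cleaned matrix, treating it as the raw
# count matrix. emptyDrops uses Monte Carlo p-values, so set a seed.
set.seed(100)
empty.out <- lapply(cleaned, emptyDrops)

# Step 3: keep barcodes called as cells. The 1% FDR threshold is an
# assumption; which() drops the NA values reported for low-count barcodes.
cells <- Map(function(mat, res) {
    keep <- which(res$FDR <= 0.01)
    mat[, keep, drop = FALSE]
}, cleaned, empty.out)
```

Nothing is substituted back into the original raw matrices here; the cleaned matrices are passed straight to emptyDrops.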