In which order should you detect empty droplets / remove barcode swapping using DropletUtils?
jma1991 ▴ 70

In which order should you detect empty droplets / remove barcode swapping? The DropletUtils vignette says you should use all barcodes for empty droplet detection, so I assume this is the first step? Naively, I thought that if there was a lot of barcode swapping, the counts for each barcode could be very different after the correction, which would also affect empty droplet detection.

Specifically, I have 4 samples from 10x scRNA-seq which have all been sequenced together on the same lane of the Illumina 4000. My current workflow is to do the following:

  1. Detect empty droplets for each sample independently using the raw barcode matrix files (do not filter cells afterward)
  2. Detect barcode swapping amongst all samples using the molecule information files (the function returns a filtered matrix where column sums are not zero)
  3. Assign the counts from barcode swapping to the raw barcode matrix files

Would this be reasonable?

Aaron Lun ★ 28k

The "all barcodes" part of the documentation is just telling you to not filter on the cell barcodes, i.e., don't call cells with some other method before using emptyDrops. It doesn't mean you have to keep all reads for a given barcode.

The correct approach is to treat the barcode swapping removal step as part of the pre-processing to get the count matrix in the first place. You should do this before any cell calling - because barcode swapping occurs regardless of whether or not you have cells! - and then use the de-swapped matrix for all downstream analysis.

Your current approach puts unnecessary pressure on emptyDrops, which would find it harder to make the right calls if you have a lot of swapping between samples. If you clean up the count matrix with swappedDrops first, the estimates of - well, everything - should be more accurate, which improves all downstream analyses.
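
To make the ordering concrete, here is a toy Python sketch of "de-swap first, call cells afterwards". It is not DropletUtils code: `remove_swapped`, `count_matrix`, the 0.8 threshold and the molecule data are all hypothetical, and the sketch only mimics the general idea of keeping a shared molecule in the sample that holds most of its reads.

```python
from collections import defaultdict

def remove_swapped(samples, min_frac=0.8):
    """Toy de-swapping (hypothetical helper, not the DropletUtils API).

    `samples` is a list of dicts, one per sample, mapping
    (cell_barcode, umi, gene) -> number of reads for that molecule.
    """
    # Total reads for each molecule across all samples on the lane.
    totals = defaultdict(int)
    for mols in samples:
        for key, reads in mols.items():
            totals[key] += reads

    cleaned = []
    for mols in samples:
        # Keep a molecule only in a sample holding at least min_frac of its reads.
        cleaned.append({k: r for k, r in mols.items() if r / totals[k] >= min_frac})
    return cleaned

def count_matrix(mols):
    """Collapse molecules into UMI counts per (barcode, gene)."""
    counts = defaultdict(int)
    for (barcode, _umi, gene) in mols:
        counts[(barcode, gene)] += 1
    return dict(counts)

# Made-up molecules: the first one appears in both samples (a swap candidate).
sample_a = {("AAAC", "UMI1", "GeneX"): 9, ("AAAC", "UMI2", "GeneY"): 5}
sample_b = {("AAAC", "UMI1", "GeneX"): 1, ("TTTG", "UMI3", "GeneX"): 4}

# Step 1: de-swap across samples. Step 2: build the per-sample count matrices.
cleaned_a, cleaned_b = remove_swapped([sample_a, sample_b])
counts_a = count_matrix(cleaned_a)
counts_b = count_matrix(cleaned_b)
# Step 3 (not shown): cell calling would run on counts_a and counts_b.
```

In this toy data, barcode AAAC drops out of the second sample entirely once the swapped molecule is removed, so the cleaned matrix ends up with fewer barcodes than the raw one.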

Okay, that's cleared it up for me! I was getting confused because swappedDrops was returning a matrix with fewer columns (i.e., barcodes), and I thought this would interfere with the "all barcodes" bit written in the vignette for emptyDrops. I was then substituting the columns from the cleaned matrix back into the raw count matrix before using emptyDrops (I knew at that point I was probably doing something wrong). I'll just run swappedDrops and use the cleaned matrix output from the function as the raw count matrix. Many thanks, Aaron!
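
One way to see why the missing columns need not be a concern is the toy Python sketch below. It assumes the ambient profile is built by summing counts over low-count barcodes, which is a simplification of what emptyDrops does; `ambient_profile`, the threshold of 100 and the counts are hypothetical and chosen only for illustration.

```python
def ambient_profile(matrix, lower=100):
    """Sum gene counts over barcodes whose total is at most `lower`.

    `matrix` maps barcode -> {gene: count}. Hypothetical helper that only
    sketches the idea of pooling low-count barcodes into an ambient profile.
    """
    profile = {}
    for genes in matrix.values():
        if sum(genes.values()) <= lower:
            for gene, n in genes.items():
                profile[gene] = profile.get(gene, 0) + n
    return profile

# Made-up counts: CCCA has no counts left, TTTG looks like a real cell.
with_zero = {
    "AAAC": {"GeneX": 2, "GeneY": 1},
    "CCCA": {"GeneX": 0, "GeneY": 0},
    "TTTG": {"GeneX": 150, "GeneY": 30},
}

# Drop barcodes whose column sum is zero, as a cleaned matrix might.
without_zero = {b: g for b, g in with_zero.items() if sum(g.values()) > 0}

# True when dropping all-zero barcodes leaves the pooled profile unchanged.
same = ambient_profile(with_zero) == ambient_profile(without_zero)
```

Under that assumption, a barcode with no counts adds nothing to the pooled profile, so a matrix without those columns gives the same result.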
