The documentation you've quoted mainly refers to the situation where you have samples that yield multiple libraries, and those libraries are multiplexed differently. For example, if you have an experiment where you generate a 5' library and an antibody library from each sample, swappedDrops will assume that if you multiplexed, say, samples A, B and C together for the sequencing of the 5' libraries, you also multiplexed samples A, B and C together for the sequencing of the antibody libraries.
By comparison, an alternative multiplexing scheme (with 6 samples A-F) might be to:
- multiplex samples A, B and C together for the 5' libraries, while also multiplexing samples D, E and F separately
- multiplex A, C and E together for the antibody libraries, while also multiplexing B, D and F separately.
You can see that the multiplexing scheme is different for the 5' libraries and antibody libraries generated from the same sample. This will confuse swappedDrops, as it will try to remove swapped barcodes between libraries that were never sequenced together. For example, if you gave the function the molecule information files for A, B and C, it would work correctly for the endogenous genes but not for the antibody tags, where B was never multiplexed with A and C. To fix this, you would have to run swappedDrops again for the multiplexing scheme used for the antibody tags, and then combine the results with the first call to swappedDrops.
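As a rough illustration of the point above, the following Python sketch groups samples by which libraries were actually sequenced together, and checks whether the 5' and antibody libraries share the same multiplexing scheme. The pool names and the `pools` layout are invented for this example; they are not part of DropletUtils or CellRanger.

```python
from collections import defaultdict

# Hypothetical layout: (sample, library type) -> sequencing pool.
# This mirrors the alternative 6-sample scheme described above.
pools = {
    ("A", "5prime"): "gex_pool_1", ("B", "5prime"): "gex_pool_1",
    ("C", "5prime"): "gex_pool_1", ("D", "5prime"): "gex_pool_2",
    ("E", "5prime"): "gex_pool_2", ("F", "5prime"): "gex_pool_2",
    ("A", "antibody"): "ab_pool_1", ("C", "antibody"): "ab_pool_1",
    ("E", "antibody"): "ab_pool_1", ("B", "antibody"): "ab_pool_2",
    ("D", "antibody"): "ab_pool_2", ("F", "antibody"): "ab_pool_2",
}


def swapping_groups(pools):
    """Group samples by (library type, pool).

    Each group is one set of libraries that were sequenced together,
    i.e. one set within which barcode swapping could have occurred.
    """
    groups = defaultdict(list)
    for (sample, libtype), pool in pools.items():
        groups[(libtype, pool)].append(sample)
    return {key: sorted(samples) for key, samples in groups.items()}


def schemes_match(pools):
    """Return True if every library type splits the samples the same way."""
    partitions = defaultdict(set)
    for (libtype, _pool), samples in swapping_groups(pools).items():
        partitions[libtype].add(tuple(samples))
    distinct = {frozenset(p) for p in partitions.values()}
    return len(distinct) <= 1


groups = swapping_groups(pools)
print(groups)
print("Same scheme for all library types:", schemes_match(pools))
```

If `schemes_match` returns `False`, as it does for this layout, each group in `groups` would need its own call to swappedDrops, with the results combined afterwards.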
Your case describes something different in that you should never see barcode swapping between 3' and 5' libraries. Well, it might happen, but I would be shocked if a 5' library could be processed by CellRanger as a 3' library and still yield any meaningful counts (and vice versa). I'd imagine that any swapped reads would just be discarded as they don't have the right adaptors/barcodes/whatever. So there shouldn't be any need to run swappedDrops on everything that was sequenced together if their processing is mutually exclusive.
In theory, the case of samples with and without antibody tags is the same; you shouldn't have to worry about swapping of the antibody tags into samples without tags, if you processed the latter in CellRanger without knowledge of those tags. In practice, swappedDrops will complain if the genes in the molecule information files are not the same, and rightly so, otherwise people could silently stuff up. (I guess I could try to do some more matching if one gene set is a subset of another.) This means that if you processed some samples without tags, and others with tags, then you wouldn't be able to run swappedDrops on the whole lot. The solution here is to just process all multiplexed samples with the same CellRanger setup, so that every molecule information file has the same feature set, prior to running swappedDrops. This is probably fine. Samples without antibody tags will just get near-zero counts for those tags; most of those counts are liable to be removed anyway after swappedDrops, and if not you can just ignore them for downstream analyses.
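A quick way to spot this problem before running anything is to compare the feature sets across samples. The sketch below is plain Python on invented feature names; it does not read molecule information files, and `compare_feature_sets` is a hypothetical helper, not a DropletUtils function.

```python
def compare_feature_sets(features_by_sample):
    """For each sample, list the features it lacks relative to the union."""
    union = set().union(*features_by_sample.values())
    return {
        sample: sorted(union - set(features))
        for sample, features in features_by_sample.items()
    }


# Invented example: one sample processed with antibody tags, one without.
features_by_sample = {
    "tagged": ["Gene1", "Gene2", "CD3_tag"],
    "untagged": ["Gene1", "Gene2"],
}

missing = compare_feature_sets(features_by_sample)
print(missing)
```

Any sample with a non-empty list here was processed with a different feature set and would need to be reprocessed before being passed in together with the others.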
The real suffering begins if you managed to multiplex samples from mouse and human together. Trying to figure out the swapping between homologous genes is not something I would look forward to.
Are the UMI and cell barcodes the same between the 5' library and 3' library, and are they stored in the same way in the molecule_info file? Or did you institute some sort of 10X version check? This will have obvious implications for whether swappedDrops is easily able to detect that you are trying to unswap silly things.
Dear both,
Thank you very much for your quick reply and for the detailed explanations! So, as long as we avoid an intermixing of samples for the multiplexing (between libraries, as described in your alternative scheme with 6 mixed samples), swappedDrops should be able to correctly compare the samples, regardless of the overall presence of different libraries?
As for the 3’/5’ situation: UMI and cell barcodes are exactly the same between 3’ and 5’ libraries and they appear to be stored in exactly the same way in the molecule_info file. We would assume that even if cellranger discards reads that are found together with a “wrong” library (arising from index hopping between 5’ and 3’ samples), index hopping between samples of the same type (3’ to 3’, 5’ to 5’) still needs to be dealt with, because cellranger would keep these reads.
Do we take it correctly that in this case we have to run swappedDrops independently on the 3’ samples and on the 5’ samples and then merge the results? In any case, running swappedDrops on all samples simultaneously would simply lead to being too conservative, correct? Or are we overlooking another error?
Thank you and Best!
The vagueness of English fails us. You'll have to be more precise about your setup for me to know whether your statement is correct. I'm going to assume that:
Now, the simplest case in which swappedDrops will work is:
The function won't work (completely) correctly if:
It is straightforward to see why. If I include X in swappedDrops(), the swapping in the 5' libraries will be removed, but there couldn't have been any swapping in the antibody libraries, so anything that gets removed there would have been incorrectly removed. (Or hell, you might not even have any antibody library for X, so you wouldn't even be able to run swappedDrops() in the first place because the feature sets don't match between samples.) Conversely, if I didn't include X in swappedDrops(), any swapping from X to A-F or vice versa would not be correctly removed.
It's not just whether the barcodes/UMIs are the same. It's the entire read sequence and how it's handled by CellRanger. I'd be surprised if the entire 5' and 3' constructs were similar enough that they could be processed interchangeably. There is an easy test; just run swappedDrops() on the 5'/3' libraries that have been multiplexed together, measure the percentage of swapped molecules across different technologies, and compare that to the percentage of swapped molecules within technologies. This should tell you if swapping is occurring between 5' and 3' libraries.
Yes, assuming you don't get swapping between libraries processed with 3' and 5' technologies.
Yes, though it may not actually have much of an effect in terms of molecule removal, so you could just do it anyway. You probably won't miss 0.2% of molecules that get incorrectly thrown out this way.
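The within-versus-between comparison suggested above can be sketched in a few lines of Python. This is only a simplified stand-in for what you would compute from the swappedDrops output: it treats any combination of cell barcode, UMI and gene that appears in more than one sample as a putative swap, and the sample names, technologies and molecules are all invented.

```python
from collections import defaultdict


def swap_rates(molecules, technology):
    """Fraction of barcode/UMI/gene combinations shared across samples.

    'within' counts combinations shared only by samples of one technology;
    'between' counts combinations shared by samples of different technologies.
    """
    samples_by_key = defaultdict(set)
    for sample, barcode, umi, gene in molecules:
        samples_by_key[(barcode, umi, gene)].add(sample)

    counts = {"within": 0, "between": 0}
    for samples in samples_by_key.values():
        if len(samples) < 2:
            continue  # seen in a single sample, so no evidence of swapping
        techs = {technology[s] for s in samples}
        counts["between" if len(techs) > 1 else "within"] += 1

    total = len(samples_by_key)
    return {kind: n / total for kind, n in counts.items()}


# Invented toy data: (sample, cell barcode, UMI, gene).
technology = {"S1": "3prime", "S2": "3prime", "S3": "5prime", "S4": "5prime"}
molecules = [
    ("S1", "AAAC", "TTGA", "GeneX"),
    ("S2", "AAAC", "TTGA", "GeneX"),  # shared between two 3' samples
    ("S1", "AAAG", "CCGT", "GeneY"),
    ("S4", "AAAG", "CCGT", "GeneY"),  # shared between a 3' and a 5' sample
    ("S3", "AACC", "GGTA", "GeneZ"),
    ("S4", "AACC", "GGTA", "GeneZ"),  # shared between two 5' samples
    ("S3", "AATT", "ACGT", "GeneX"),
    ("S2", "CCCC", "TTTT", "GeneY"),
]

rates = swap_rates(molecules, technology)
print(rates)
```

On real data, a 'between' rate close to zero alongside a clearly non-zero 'within' rate would suggest that swapped reads are not surviving CellRanger processing across technologies.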
Thank you very much for your answers and for detailing the experiment setup once more. We do not intend to mix extra samples into the multiplexing, but will always choose a setup as you describe here:
Thank you again for clarifying!
Regarding the 3’/5’ index hopping. We did a test as you suggested, and the following came out:
https://ibb.co/54tjpgB (please let us know if the image is not visible to you)
So, indeed no swapping is observed between libraries (please note that we had many more 3’ reads than 5’ reads in our setup, which explains the larger proportion of index hopping compared to the 5’ libraries). For us, this confirms that cellranger discards those molecules that swapped between libraries (since we work on the cellranger molecule file). So, we probably lose more molecules than necessary, because cellranger simply discards them without trying to reassign each molecule to the correct sample.
Thank you again for all your help.