Question

Recommendations for running DropletUtils::swappedDrops() on multiplexed samples from multiple libraries

rosano • 0

@rosano-20689

Hi all,

Regarding the following passage from the description of swappedDrops() from package DropletUtils:

“In files produced by CellRanger version 3.0, an additional per-molecule field is present indicating the (c)DNA library from which the molecule was derived. Library preparation can be performed separately for different features (e.g., antibodies, CRISPR tags) such that one 10X run can contain data from multiple libraries. This allows for arbitrarily complicated multiplexing schemes - for example, gene expression libraries might be multiplexed together across one set of samples, while the antibody-derived libraries might be multiplexed across another different set of samples. For simplicity, we assume that multiplexing was performed across the same set of samples for all libraries therein.”

Do I understand correctly that if a given set of multiplexed samples was processed with multiple libraries, swappedDrops() assumes that the same samples were multiplexed for all libraries? Or is multiplexing different library types together not supported at all? To give an example: would a mix of 14 multiplexed samples, 8 of them with 3’ libraries and 6 of them with 5’ libraries, be OK? And would it be OK to mix samples with gene expression libraries (e.g. 5’) and antibody labelling?

Thank you!

dropletutils swappeddrops • 1.6k views

ADD COMMENT • link updated 6.0 years ago by Aaron Lun ★ 28k • written 6.0 years ago by rosano • 0

score 2 · Answer 1 · 2019-05-03

Aaron Lun ★ 28k

@alun

The documentation you've quoted mainly refers to the situation where you have samples that yield multiple libraries, and those libraries are multiplexed differently. For example, if you have an experiment where you generate a 5' library and an antibody library from each sample, swappedDrops will assume that if you multiplexed - say - samples A, B and C together for the sequencing of the 5' libraries, you also multiplexed samples A, B and C together for the sequencing of the antibody libraries.

By comparison, an alternative multiplexing scheme (with 6 samples A-F) might be to:

- multiplex samples A, B and C together for the 5' libraries, while also multiplexing samples D, E and F separately
- multiplex A, C and E together for the antibody libraries, while also multiplexing B, D and F separately.

You can see that the multiplexing scheme is different for the 5' libraries and antibody libraries generated from the same sample. This will confuse swappedDrops as it will try to remove swapped barcodes between libraries that were never sequenced together - for example, if you gave the function the molecule information files for A, B and C, it would work correctly for the endogenous genes but not for the antibody tags where B was never multiplexed with A and C. To fix this, you would have to run swappedDrops again for the multiplexing scheme used for the antibody tags, and then combine the results with the first call to swappedDrops.
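
The bookkeeping behind this can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of DropletUtils: the `multiplexing` dictionary and both helper functions are hypothetical names, and the step of actually running swappedDrops per group and combining the results is left out.

```python
# Hypothetical description of the alternative scheme above: for each library
# type, the sets of samples that were sequenced together.
multiplexing = {
    "5prime": [{"A", "B", "C"}, {"D", "E", "F"}],
    "antibody": [{"A", "C", "E"}, {"B", "D", "F"}],
}

def schemes_agree(multiplexing):
    """True if every library type was multiplexed across the same sample sets."""
    as_sets = [
        frozenset(frozenset(group) for group in groups)
        for groups in multiplexing.values()
    ]
    return all(scheme == as_sets[0] for scheme in as_sets)

def planned_runs(multiplexing):
    """One (library type, sorted samples) entry per separate swap-removal run."""
    return [
        (library, sorted(group))
        for library, groups in multiplexing.items()
        for group in groups
    ]

if not schemes_agree(multiplexing):
    # The schemes differ, so each library type needs its own set of runs.
    for library, samples in planned_runs(multiplexing):
        print(library, samples)
```

If `schemes_agree()` returns True, a single run per multiplexed group covers all library types, which is the situation the documentation assumes.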

Your case describes something different in that you should never see barcode swapping between 3' and 5' libraries. Well, it might happen, but I would be shocked if a 5' library could be processed by CellRanger as a 3' library and still yield any meaningful counts (and vice versa). I'd imagine that any swapped reads would just be discarded as they don't have the right adaptors/barcodes/whatever. So there shouldn't be any need to run swappedDrops on everything that was sequenced together if their processing is mutually exclusive.

In theory, the case of samples with and without antibody tags is the same; you shouldn't have to worry about swapping of the antibody tags into samples without tags, if you processed the latter in CellRanger without knowledge of those tags. In practice, swappedDrops will complain if the genes in the molecule information files are not the same - and rightly so, otherwise people could silently stuff up. (I guess I could try to do some more matching if one gene set is a subset of another.) This means that if you processed some samples without tags, and others with tags, then you wouldn't be able to run swappedDrops on the whole lot. The solution here is to just process all multiplexed samples with the same CellRanger setup (i.e., the same feature set, tags included) prior to running swappedDrops. This is probably fine - samples without antibody tags will just get near-zero counts for those tags, most of which are liable to be removed anyway by swappedDrops, and if not you can just ignore them for downstream analyses.
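
A quick pre-check along these lines can show which samples would trip the feature-set comparison before anything is run. This is a rough Python sketch with hypothetical names (`feature_mismatches`, `features_by_sample`, and the example feature IDs are all made up), and it assumes the feature lists have already been read out of each molecule information file; it does not reproduce the exact check that swappedDrops performs.

```python
def feature_mismatches(features_by_sample):
    """Map each sample to the features it lacks relative to the union of all samples."""
    union = set().union(*features_by_sample.values())
    missing = {}
    for sample, features in features_by_sample.items():
        lacking = union - set(features)
        if lacking:
            missing[sample] = sorted(lacking)
    return missing

# Hypothetical example: S1 was processed without the antibody tag, S2 with it.
features_by_sample = {
    "S1": ["GeneA", "GeneB"],
    "S2": ["GeneA", "GeneB", "CD3_tag"],
}
print(feature_mismatches(features_by_sample))
```

An empty result means all samples share one feature set; anything else lists the samples that would need to be reprocessed with the full feature set first.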

The real suffering begins if you managed to multiplex samples from mouse and human together. Trying to figure out the swapping between homologous genes is not something I would look forward to.

ADD COMMENT • link 6.0 years ago Aaron Lun ★ 28k

Are the UMIs and cell barcodes the same between the 5' library and the 3' library, and are they stored in the same way in the molecule_info file? Or did you institute some sort of 10X version check? This will have obvious implications for whether swappedDrops is easily able to detect that you are trying to unswap silly things.

ADD REPLY • link 6.0 years ago Jonathan Griffiths ▴ 90

Dear both,

Thank you very much for your quick reply and for the detailed explanations! So, as long as we avoid an intermixing of samples for the multiplexing (between libraries, as described in your alternative scheme with 6 mixed samples), swappedDrops should be able to correctly compare the samples, regardless of the overall presence of different libraries?

As for the 3’/5’ - situation: UMI and cell barcodes are exactly the same between 3’ and 5’ libraries and they appear to be stored in exactly the same way in the molecule_info file. We would assume that even if cellranger discards reads that end up in the “wrong” library type (through index hopping between 5’ and 3’ samples), index hopping between samples of the same type (3’ to 3’, 5’ to 5’) still needs to be dealt with, because cellranger would keep these reads.

Do we take it correctly that in this case we have to run swappedDrops independently on the 3’ samples and on the 5’ samples and then merge the results? In any case, running swappedDrops on all samples simultaneously would simply lead to being too conservative, correct? Or are we overlooking another error?

Thank you and Best!

ADD REPLY • link 6.0 years ago rosano • 0

So, as long as we avoid an intermixing of samples for the multiplexing (between libraries, as described in your alternative scheme with 6 mixed samples), swappedDrops should be able to correctly compare the samples, regardless of the overall presence of different libraries?

The vagueness of English fails us. You'll have to be more precise about your setup for me to know whether your statement is correct. I'm going to assume that:

- You have 6 samples, A-F.
- Each sample generates two libraries; one for the 5' RNA-seq, and one for Ab tags.

Now, the simplest case in which swappedDrops will work is:

- 5' libraries for samples A-F are multiplexed together with no other 5' libraries.
- Antibody libraries for samples A-F are multiplexed together with no other antibody libraries.

The function won't work (completely) correctly if:

- 5' libraries for samples A-F and an extra sample X were multiplexed together.
- Antibody libraries for samples A-F were multiplexed together... but without X.

It is straightforward to see why. If I include X in swappedDrops(), the swapping in the 5' libraries will be removed, but there couldn't have been any swapping in the antibody libraries, so anything that gets removed there would have been incorrectly removed. (Or hell, you might not even have any antibody library for X, so you wouldn't even be able to run swappedDrops() in the first place because the feature sets don't match between samples.) Conversely, if I didn't include X in swappedDrops(), any swapping from X to A-F or vice versa would not be correctly removed.

As for the 3’/5’ - situation: UMI and cell barcodes are exactly the same between 3’ and 5’ libraries and they appear to be stored in exactly the same way in the molecule_info file

It's not just whether the barcodes/UMIs are the same. It's the entire read sequence and how it's handled by CellRanger. I'd be surprised if the entire 5' and 3' constructs were similar enough that they could be processed interchangeably. There is an easy test; just run swappedDrops() on the 5'/3' libraries that have been multiplexed together, measure the percentage of swapped molecules across different technologies, and compare that to the percentage of swapped molecules within technologies. This should tell you if swapping is occurring between 5' and 3' libraries.
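
The comparison itself can be sketched as follows. This is a toy Python illustration and not the DropletUtils implementation: it assumes that a swapped molecule shows up as the same combination of cell barcode, UMI and gene in more than one sample, and the function name, record layout and example values are all hypothetical.

```python
from collections import defaultdict

def swap_rates(molecules, technology):
    """Fraction of molecule keys shared within and across technologies.

    molecules: iterable of (sample, cell_barcode, umi, gene) records.
    technology: dict mapping each sample to its technology label.
    A key seen in samples from more than one technology counts as "across",
    even if it is also shared between samples of the same technology.
    """
    samples_by_key = defaultdict(set)
    for sample, cell, umi, gene in molecules:
        samples_by_key[(cell, umi, gene)].add(sample)

    total = len(samples_by_key)
    if total == 0:
        return {"within": 0.0, "across": 0.0}

    within = across = 0
    for samples in samples_by_key.values():
        if len(samples) < 2:
            continue  # seen in a single sample only, so not a candidate swap
        if len({technology[s] for s in samples}) > 1:
            across += 1
        else:
            within += 1
    return {"within": within / total, "across": across / total}

# Hypothetical toy data: one molecule shared between two 3' samples.
technology = {"s3a": "3prime", "s3b": "3prime", "s5a": "5prime", "s5b": "5prime"}
molecules = [
    ("s3a", "AAAC", "UMI1", "Gene1"),
    ("s3b", "AAAC", "UMI1", "Gene1"),
    ("s3a", "AAAG", "UMI2", "Gene2"),
    ("s5a", "AAAT", "UMI3", "Gene3"),
    ("s5b", "AAAT", "UMI4", "Gene1"),
]
rates = swap_rates(molecules, technology)
print(rates)
```

An "across" rate near zero next to a clearly non-zero "within" rate would suggest that swapping between 5' and 3' libraries is not surviving CellRanger processing.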

Do we take it correctly that in this case we have to run swappedDrops independently on the 3’ samples and on the 5’ samples and then merge the results?

Yes, assuming you don't get swapping between libraries processed with 3' and 5' technologies.

In any case, running swappedDrops on all samples simultaneously would simply lead to being too conservative, correct?

Yes, though it may not actually have much of an effect in terms of molecule removal, so you could just do it anyway. You probably won't miss the 0.2% of molecules that get incorrectly thrown out this way.

ADD REPLY • link 6.0 years ago Aaron Lun ★ 28k

Thank you very much for your answers and for detailing the experiment setup once more. We do not intend to mix extra samples into the multiplexing, but will always choose a setup as you describe here:

Now, the simplest case in which swappedDrops will work is:

- 5' libraries for samples A-F are multiplexed together with no other 5' libraries.
- Antibody libraries for samples A-F are multiplexed together with no other antibody libraries.

Thank you again for clarifying!

Regarding the 3’/5’ index hopping. We did a test as you suggested, and the following came out:

https://ibb.co/54tjpgB (please let us know if the image is not visible to you)

So, indeed no swapping is observed between 3’ and 5’ libraries (please note that we had many more 3’ reads than 5’ reads in our setup, which explains the larger proportion of index hopping in the 3’ libraries compared to the 5’ libraries). For us, this confirms that cellranger discards those molecules that swapped between 3’ and 5’ libraries (since we work on the cellranger molecule file). So, we probably lose more molecules than necessary, because cellranger simply discards them without trying to reassign the molecule to the correct sample.

Thank you again for all your help.

ADD REPLY • link 6.0 years ago rosano • 0