Hi all,
To give you some context:
We have an experimental setting with 2 samples, where GEX (gene expression) and ADT (antibody labelling) libraries are generated from the same sample. So, in total, 4 samples are generated corresponding to 2 pairs of paired GEX-ADT samples.
Following the directions from the previous thread https://support.bioconductor.org/p/120645/, all 4 samples were multiplexed together in the same sequencing run so that we could correct for index hopping afterwards. CellRanger processing had to be done separately for GEX and ADT samples, as the special “Feature Barcoding Analysis” was required to process the latter.
However, when we ran swappedDrops() on the resulting molecule information files (4 in total, 1 per sample), it failed with the error “gene information differs between samples”. Indeed, the resulting sets of features differ between GEX and ADT, namely genes and antibody tags, respectively. Nonetheless, this came as a surprise to us, as the answers and discussion in the mentioned thread misled us to believe that index hopping removal via swappedDrops() was possible across different libraries, as long as they were multiplexed together.
So our questions would be:
- Is this not the case? Can't we use swappedDrops() at the same time on different libraries that were multiplexed together?
- If not, how is swappedDrops() supposed to be used when one has an experimental design such as ours? The specific setting was chosen in the first place in order to comply with the swappedDrops() requirements.
Thank you in advance.
Thank you for the fast reply. Regarding the choice of analysis setting: since the last CellRanger update it is possible to process ADT libraries independently from GEX, and this is what we do in our pipeline, since it gives higher-quality results than the combined GEX and ADT analysis. (More cells are expected to be called if each library is run individually; see the bottom table at https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/no-gex-analysis ).
Since we apparently misunderstood your answers from the previous post, we would now like to confirm with you the correctness of the following statement: regardless of the choice of experiment and analysis setting, it will never be possible to perform index hopping removal between GEX and ADT samples, because they contain different sets of features (i.e. gene expression and antibody tags). So only a within-library correction can be achieved with swappedDrops().
As a follow-up question from this particular analysis: your suggestion for an alternative index hopping removal setting (independently for GEX and ADT samples) is exactly what we did when faced with this problem. However, the results looked strange to us: for one sample, a very large cell number was reported after index hopping removal. This number is not plausible given the number of targeted cells, and the other samples had cell numbers in the expected range, which suggests that index hopping could be the source of the unusually high number.
As I mentioned, we have a total of 4 samples, 2 GEX and 2 ADT, corresponding to 2 pairs of paired GEX-ADT samples (i.e. S1 GEX, S1 ADT, S2 GEX and S2 ADT). We ran swappedDrops() separately, once on the S1 and S2 GEX molecule files and once on the S1 and S2 ADT files. We were expecting to detect around 12000 cells for both S1 libraries, and 9000 cells for both S2 libraries. However, after swapping correction, the number of cells detected with emptyDrops() for S1 ADT was totally off the charts, namely 18550 observed cells instead of the expected 12000 (while the results were as expected for the other 3 of the 4 samples). Do you have any idea what might be causing this problem? Bear in mind that the coverage of the GEX samples was 1 order of magnitude higher than that of the ADT samples: around 3-4k mean reads per cell for ADT compared to 45k mean reads per cell for GEX.
Please, let us know should you need further details or additional information.
Well, it's up to you, but this seems like a cell calling problem rather than a UMI counting problem. You know, you can always just run CellRanger once and re-do the cell calling afterwards on the raw count matrix.
That's correct. Which is not to say that there are no swapped molecules - for example, a swapped transcript molecule might generate reads that get incorrectly aligned to the ADT sequence - but we wouldn't be able to remove those anyway.
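To illustrate why the features have to match, here is a toy sketch in base R of the kind of read-fraction filter that swap removal relies on. It is not the DropletUtils implementation; the table, its column names and the threshold value are made up for illustration. Conceptually, a molecule is identified by its cell barcode, UMI and feature together, so a GEX library and an ADT library never share a key and there is nothing to compare between them.

```r
# Toy molecule table (hypothetical values): one row per
# sample / cell barcode / UMI / feature combination, with its read count.
mols <- data.frame(
  sample  = c("S1", "S2", "S1", "S2", "S2"),
  cell    = c("AAAC", "AAAC", "CCGT", "CCGT", "GGTA"),
  umi     = c("TTGA", "TTGA", "ACGT", "ACGT", "CATG"),
  feature = c("CD3", "CD3", "CD4", "CD4", "CD8"),
  reads   = c(95, 5, 50, 50, 12),
  stringsAsFactors = FALSE
)

# Illustrative threshold: a sample keeps a molecule only if it holds
# more than this fraction of that molecule's reads across all samples.
min.frac <- 0.8

# The feature is part of the key, so molecules from libraries with
# different feature sets can never be matched to each other.
key <- paste(mols$cell, mols$umi, mols$feature, sep = "|")

# Total reads per molecule across samples, then each sample's share.
total <- ave(mols$reads, key, FUN = sum)
frac <- mols$reads / total

# Keep only the rows where one sample clearly dominates.
cleaned <- mols[frac > min.frac, ]
print(cleaned)
```

Here the first molecule is kept in S1 and dropped from S2, the second is dropped from both samples because neither dominates, and the third is untouched because it appears in one sample only.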
I doubt swappedDrops() is doing anything wrong here, it's too simple an algorithm to stuff up. The only remote possibility is that your sequencing is so deep that there is a high probability that a non-swapped molecule will have the same UMI-ADT-cell barcode combination across different samples. I'd find this hard to believe, but you could check by running swappedDrops() on two ADT libraries that were not multiplexed together.
It's within 2-fold of the expected number of cells, that looks pretty good to me. Maybe you just miscounted? Or you have lots of broken cells and cell fragments. Technically speaking, emptyDrops() just tells you which droplets are non-empty, not those that contain intact cells.
Honestly, I've never tried to run emptyDrops() on ADT data. I don't know if the defaults make sense. For example, how many barcodes have fewer than lower total counts when the sequencing is so deep? If there's not a lot, the method won't work. Depending on the shape of the barcode rank curve, the knee point detection is probably going to be broken, so you might consider overriding that (retain).
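A quick way to check the lower question is to count how many barcodes sit at or below the threshold. The following is a toy sketch in base R with a made-up count matrix. With real data the totals would be the column sums of the count matrix passed to emptyDrops(), and the lower and retain values used here are placeholders, not recommendations.

```r
# Toy count matrix (hypothetical values): rows are antibody tags,
# columns are barcodes. matrix() fills column by column.
counts <- matrix(
  c(   2,   1,   0,
      10,   5,   5,
      40,  30,  10,
     900,  50,  50,
    3000, 800, 200),
  nrow = 3,
  dimnames = list(c("CD3", "CD4", "CD8"), paste0("bc", 1:5))
)

# Total count per barcode.
totals <- colSums(counts)

# Barcodes at or below 'lower' are the ones treated as empty and used
# to estimate the ambient profile; too few of them gives a poor estimate.
lower <- 100
n.ambient <- sum(totals <= lower)
ambient.counts <- sum(totals[totals <= lower])

# A manually chosen 'retain' threshold, standing in for automatic knee
# detection: barcodes above it would be kept regardless of the test.
retain <- 500
always <- totals > retain

print(n.ambient)
print(ambient.counts)
print(names(totals)[always])
```

If n.ambient or ambient.counts turns out to be small on the real ADT matrix, that would point to the defaults being a poor fit for this data.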