I am analyzing some single-cell human data that contains feature barcodes (10X's CITE-seq) for a few cell surface antibodies + 4 hashtags. The pipeline goes like this: 10,000 cells into the 10X --> NovaSeq --> bcl2fastq --> cellranger count
When I import the raw seq data and the seq+featureBarcode data into SingleCellExperiment in R, I get the same number of "cells" (737,280) for both. BUT when I remove the empty droplets from the seq data I get ~6000 remaining, a perfectly reasonable return for the experiment. When I remove the empty droplets from the featureBarcode+seq data, I get 18,590, way more than expected...
With the addition of only 12 more features (8 cell surface markers, 4 hashtags), what is making such a big difference?
As a solution, do you think it's fair to just subset the 12 features out, remove the empty droplets without them, then put the data back in for the remaining "cells" to continue downstream analysis? Or should I try to track down where the over 10,000 extra cells are coming from?
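For concreteness, here is a minimal sketch of the subset-then-filter idea. It assumes the empty droplets are called with `emptyDrops` from DropletUtils and that the matrix comes from Cell Ranger 3 or later, so that `rowData(sce)$Type` separates "Gene Expression" from "Antibody Capture" features. The path and the 0.001 FDR cutoff are placeholders for illustration, not values from my actual run.

```r
library(DropletUtils)
library(SingleCellExperiment)

# Hypothetical path to the unfiltered cellranger count output; adjust to your run.
sce <- read10xCounts("path/to/raw_feature_bc_matrix")

# Assumption: Cell Ranger >= 3 output, where the Type column labels each
# feature as "Gene Expression" or "Antibody Capture".
is.gene <- rowData(sce)$Type == "Gene Expression"

# Call cells using the gene counts only, ignoring the 12 antibody/hashtag rows.
set.seed(100)  # emptyDrops computes Monte Carlo p-values, so fix the seed
e.out <- emptyDrops(counts(sce)[is.gene, ])

# Barcodes below emptyDrops' 'lower' threshold get FDR = NA; which() drops them.
keep <- which(e.out$FDR <= 0.001)

# Retain ALL features (genes + antibodies + hashtags) for the barcodes kept.
sce.cells <- sce[, keep]
ncol(sce.cells)
```

This way the antibody and hashtag counts never influence which barcodes are called as cells, but they are still available for the retained barcodes downstream.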
Thanks Aaron for taking the time to reply, glad to know this approach is the right way to go.
Final off-topic question for you: where do you go if you want a second opinion on an approach you are taking? Is there a Slack or Discord group for sequencing work? IRC? Or do you stick to the Bioconductor forums (which I guess are mirrored on Biostars, or the other way around)?
There's a Bioconductor Slack group that you can sign up for here.