Have some questions about use of DESeq2 to process a large TnSeq dataset with the Mariner transposon. The dataset has 5 cohorts: starting library and 4 in vivo conditions; 6 experimental replicates/cohort. We confirmed one insertion/bacterial cell. Read depth is 10-15M insertions/sample which are mapped to generate insertion counts per ORF; range of 0 to >30,000 insertions per gene, per sample.
For downstream analytic reasons TRADIS does not work for needs. Transit with the ZINB or zbar functions might (are looking into it). For initial exploratory analyses we ran through DESeq2 to at least handle the read normalization and 4 pairwise comparisons of interest (in vitro vs each in vivo). In fact DESeq2 performed better than Transit with Mann-Whitney hypothesis testing to identify internal control effects for specific genes. Question is whether DESeq2 is in fact appropriate for the following:
1. Read count normalization across samples
2. Generation of Log2 fold-change per comparisons between 2 conditions (seems straight-forward)
3. p value calculations - any issues for cases where one condition goes to 0 counts, which happens in 10-15% of the in vivo samples. Use of Mann-Whitney log rank or other non-parametric test loses a lot of signal with a q_value <=0.05, and (not surprisingly) all signal after multi-hypothesis corrections.
4. Calculation of the DESeq2 adjusted p value (p_adj) - same as question #3, is the DESeq2 model appropriate for TnSeq data. We've noted that our RNAseq datasets often have comparisons where one condition goes to 0 reads, and have read that the DESeq2 model can handle these conditions, though in the TnSeq dataset the number of genes where such occurs is 2-5X that of our RNAseq datasets. A second issue that there are cases where DESeq2 calculated a p_value but not the p_adj - commonly in cases where one condition goes to 0 counts. A standard Benjamini-Hochberg correction on all DESeq2 p_values, including ones where p_adj wasn't calculated, support a multi-hypothesis corrected value. However, concerned that there may be issues with the DESeq2 p_value if its model is not appropriate in this case.
As I noted, we're evaluating other options, but the DESeq2 package is streamlined and, so far, performing the best to call internal + and - internal control genes baked into the experimental design.
Thanks in advance for any insights!