Dear community,
I am currently analysing a set of time-course (~10 time points) stranded paired-end RNA-seq data with the particular objective of identifying time-dependent changes in alternative splicing.
However, I am still undecided whether alpine, Salmon, or another method would be better suited for estimating transcript abundance. So far, I have used STAR to align the reads from each time point to the reference genome. Is there a workflow that you would recommend for read-depth normalization and isoform-specific transcript abundance estimation for time-course data?
Concerning read-depth normalization, I have read the recommendation to "downsample" reads so that all samples have comparable numbers, which assumes that the number of reads is already roughly equal across samples. However, the number of reads in my samples differs substantially (50-125 million reads) and I do not want to throw away that much data.
I would be very grateful to hear your thoughts on the matter.
Many thanks for your fast reply and your explanation. Since Salmon can either map reads itself or work with precomputed read alignments, I am unsure what the best approach would be - does quantification with Salmon work "better" with Salmon-mapped reads, or are these two independent steps?
Take a look at the Salmon paper.
I think the alignment-free mode was a bit better on simulated data without bias.
I've followed your advice and used tximport for read-depth normalization of the Salmon TPM values. In the next step, I want to normalize between samples using TMM from edgeR. I am not necessarily interested in performing a classical DTU/DGE analysis, but rather want to visualize and compare the expression changes over time. Would you recommend using scaledTPM or lengthScaledTPM transcript counts as input for the edgeR TMM normalization?
Many thanks again!
I'm more familiar with the DESeq2 normalization functions, so I'll show that. If you had TPMs and wanted to normalize them so they are more comparable across samples, you can use the median ratio method:
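As a rough illustration of the arithmetic (not DESeq2 code itself), here is a minimal pure-Python sketch of median-ratio size factors. In R, `estimateSizeFactorsForMatrix()` from DESeq2 computes the size factors directly from a matrix. The names `median_ratio_size_factors`, `normalize`, and `tpm_mat`, as well as the toy values, are made up for this example; `tpm_mat` stands in for a transcripts-by-samples abundance matrix.

```python
import math
from statistics import median


def median_ratio_size_factors(mat):
    """Median-of-ratios size factors for a features x samples matrix.

    mat is a list of rows (one per transcript), each a list of
    per-sample values. Hypothetical helper, for illustration only.
    """
    n_samples = len(mat[0])
    # Only features that are positive in every sample have a finite
    # log geometric mean, so only those contribute to the median.
    log_rows = [[math.log(v) for v in row]
                for row in mat if all(v > 0 for v in row)]
    if not log_rows:
        raise ValueError("no feature is positive in all samples")
    # Log geometric mean of each feature across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the geometric-mean pseudo-reference.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(mat, factors):
    """Divide every column (sample) by its size factor."""
    return [[v / f for v, f in zip(row, factors)] for row in mat]


# Toy data: 4 transcripts x 3 samples (assumed values, not real data).
tpm_mat = [
    [10.0, 20.0, 40.0],
    [5.0, 10.0, 20.0],
    [100.0, 200.0, 400.0],
    [0.0, 3.0, 1.0],  # contains a zero, so it is ignored for the factors
]

size_factors = median_ratio_size_factors(tpm_mat)
norm_tpm = normalize(tpm_mat, size_factors)
```

In this toy matrix the three informative transcripts differ between samples only by a constant factor, so dividing by the size factors makes their values equal across samples.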
Thanks a lot! Would tpm.mat then be the matrix of the originally estimated transcript counts (unscaled TPMs = countsFromAbundance="no" option from tximport) as output by Salmon?
Abundances are never modified by tximport, only counts. So it’s the abundance matrix, regardless of which countsFromAbundance option you use.
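To make that distinction concrete, here is a small Python sketch of how, as I understand it, the scaledTPM and lengthScaledTPM options derive new counts from the abundances while leaving the abundance matrix itself alone. This is an illustration of the idea, not tximport's code; the function name `counts_from_abundance` and all numbers are assumptions made up for the example.

```python
def counts_from_abundance(counts, abundance, lengths, method):
    """Derive counts from abundances (all inputs: transcripts x samples).

    Hypothetical helper mirroring the documented behaviour of the
    "scaledTPM" and "lengthScaledTPM" options; inputs are not modified.
    """
    n_samples = len(counts[0])
    # Per-sample library size taken from the originally estimated counts.
    counts_sum = [sum(row[j] for row in counts) for j in range(n_samples)]
    if method == "lengthScaledTPM":
        # Weight each transcript by its average length across samples.
        mean_len = [sum(row) / n_samples for row in lengths]
        new = [[a * m for a in row] for row, m in zip(abundance, mean_len)]
    elif method == "scaledTPM":
        new = [list(row) for row in abundance]
    else:
        raise ValueError("method must be 'scaledTPM' or 'lengthScaledTPM'")
    # Rescale every sample so its column sum matches the library size.
    new_sum = [sum(row[j] for row in new) for j in range(n_samples)]
    return [[v * counts_sum[j] / new_sum[j] for j, v in enumerate(row)]
            for row in new]


# Toy data: 2 transcripts x 2 samples (assumed values, not real data).
abundance = [[600000.0, 400000.0], [400000.0, 600000.0]]  # TPM
counts = [[300.0, 400.0], [100.0, 600.0]]                 # estimated counts
lengths = [[1000.0, 1000.0], [2000.0, 2000.0]]            # effective lengths

scaled = counts_from_abundance(counts, abundance, lengths, "scaledTPM")
length_scaled = counts_from_abundance(counts, abundance, lengths,
                                      "lengthScaledTPM")
```

In both cases only a new counts matrix is produced; `abundance` is the same before and after, which is why the TPM matrix used for normalization does not depend on the countsFromAbundance setting.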