Hello,
I am not a bioinformatician, but over the last two years I have done a lot of reading and experimenting with RNA-seq, and I think I have developed a fair understanding of the necessary steps and possible pitfalls. Recently I have come across several interesting datasets on Gene Expression Omnibus (GEO), and after downloading them I realized that the supplied gene expression metric was TPM. This seems to be the case for many GEO datasets. As I understand it, TPM is not a good metric for differential expression analysis, and DESeq2 won't accept TPM as input because the values are not integers. The only truly clean approach I can think of would be to download the raw files from SRA and redo the whole QC, alignment and counting from scratch.
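To illustrate why TPM can't simply be fed to a count-based tool, here is a minimal sketch of the standard TPM calculation. Counts are divided by gene length (in kb), then rescaled so each sample sums to one million. The function name `counts_to_tpm` and the toy numbers are hypothetical, chosen only for illustration. The result is non-integer, and two samples with the same composition but a tenfold difference in sequencing depth end up with identical TPM values. That is exactly the library-size information a count model like DESeq2's relies on.

```python
import numpy as np

def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM (transcripts per million).

    counts: raw read counts per gene (integers)
    lengths_kb: gene/transcript lengths in kilobases
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    # Step 1: length-normalise -> reads per kilobase (RPK)
    rpk = counts / lengths_kb
    # Step 2: rescale so the RPK values of this sample sum to one million
    return rpk / rpk.sum() * 1e6

# Hypothetical samples: same composition, tenfold different sequencing depth
lengths = [1.0, 2.0, 0.5]
shallow = counts_to_tpm([100, 200, 50], lengths)
deep = counts_to_tpm([1000, 2000, 500], lengths)

print(shallow)                    # non-integer values
print(np.allclose(shallow, deep)) # depth information is gone
```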
So my question is: what would be an elegant, simple and clean way of analysing such GEO datasets?
During my search I also came across the suggestion of using log10(TPM + 1). Perhaps this approach could be used to get a first glimpse of the data and, depending on what it shows, decide whether it is worthwhile to redo the analysis from scratch.
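A minimal sketch of what such a "first glimpse" might look like, assuming the TPM table has been loaded as a genes × samples NumPy array per condition. The helper names and the toy matrix are hypothetical. The pseudocount of 1 keeps genes with zero TPM finite, since log10(1) = 0. The per-gene difference of group means is only an exploratory effect size: there is no dispersion model and no p-value, so it can help prioritise a dataset but is not a substitute for a proper differential expression test.

```python
import numpy as np

def log_tpm(tpm):
    """log10(TPM + 1); the +1 pseudocount maps TPM = 0 to 0 instead of -inf."""
    return np.log10(np.asarray(tpm, dtype=float) + 1.0)

def rough_log10_fc(tpm_a, tpm_b):
    """Exploratory per-gene difference of mean log10(TPM + 1) between two groups.

    tpm_a, tpm_b: arrays of shape (genes, samples), one per condition.
    No variance modelling and no significance testing: a first look only.
    """
    return log_tpm(tpm_a).mean(axis=1) - log_tpm(tpm_b).mean(axis=1)

# Hypothetical toy data: 3 genes x 2 replicates per group
group_a = np.array([[99.0, 99.0], [0.0, 0.0], [9.0, 9.0]])
group_b = np.array([[9.0, 9.0], [0.0, 0.0], [9.0, 9.0]])

print(rough_log10_fc(group_a, group_b))
```

In this toy example, the first gene differs by one log10 unit (roughly tenfold on the TPM + 1 scale), while the other two are unchanged.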
Since I'm not a bioinformatician, I don't have access to much computing power, only ordinary office hardware. This is why I am trying to avoid doing the whole alignment, as it takes me about 1-2 h per 10 million reads.
Thanks for your answer!