Hi everyone! I am studying, at the transcriptional level, how our cell line responds to a certain treatment. I am mainly interested in short treatment times, and so far I have found public datasets from experiments at 6h, 18h, 22h, 1d, 1d, 2d, 2d, 2d, 3d, 4d, 5d, and two weeks, all with very similar but not identical growth and treatment protocols and an average of 3 replicates.
I processed all the datasets downloaded from SRA with the same pipeline (nf-core rnaseq). So far I have analyzed each dataset individually (treated samples versus that study's own internal controls) with the differentialabundance pipeline, to obtain both the lists of differentially expressed genes and the enriched and depleted pathways (GSEA and gProfiler).
What I am wondering is how I can now draw robust conclusions from this mass of data about how the response to the treatment changes over time. I have several questions:
- Is my approach correct, or should I instead pool as many samples as possible into the various case/control groups even if they come from different studies (for example, all the day-2 samples)?
- If my approach is valid, what criteria should I use to select the relevant pathways when the results do not all agree at the various time points? And what criteria should I use for individual genes (mean fold change)?
- Are there any software tools for this purpose that take DESeq2 and GSEA tables as input?
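To make the "mean fold change" idea concrete, here is a minimal sketch of one way I could imagine combining a single gene across the datasets at the same time point: an inverse-variance weighted (fixed-effect) mean of the per-study `log2FoldChange` values, using the `lfcSE` column from each DESeq2 table as the standard error. The function name `combine_log2fc` and the numbers are made up for illustration, and a fixed-effect model is only an assumption on my part; with protocols that differ between studies, a random-effects model may well be more appropriate.

```python
import math

def combine_log2fc(estimates):
    """Fixed-effect, inverse-variance weighted mean of log2 fold changes.

    estimates: list of (log2FoldChange, lfcSE) pairs for ONE gene,
               one pair per study (hypothetical input layout).
    Returns (pooled_log2fc, pooled_se, z_score).
    """
    if not estimates:
        raise ValueError("need at least one study")
    if any(se <= 0 for _, se in estimates):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by 1 / SE^2, so precise estimates count more.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)

    pooled = sum(w * lfc for w, (lfc, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, pooled / pooled_se

# Made-up values for one gene in the three day-2 datasets.
day2 = [(1.0, 0.5), (2.0, 0.5), (1.5, 1.0)]
pooled, se, z = combine_log2fc(day2)
print(f"pooled log2FC = {pooled:.2f}, SE = {se:.3f}, z = {z:.2f}")
```

This only covers genes; I am not sure whether something analogous is sensible for GSEA enrichment scores.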
This is the first time I have attempted something resembling a meta-analysis, so thank you in advance for your patience.