Hi all,
I was wondering if anybody is familiar with machine learning techniques and RNAseq data.
I have time-course RNA-seq data from multiple different samples, i.e., organs. What I would like to do is extract the patterns that form in the data going from time 0 up to the last time point. However, I don't want to do this only for time points 0-1 in one sample. Given that I have time-point data for multiple organs, I would like to find relationships in the way genes change over time across all organs, and try to understand which genes change in similar patterns, which don't, what those patterns are, and also whether, when one gene changes, another changes in an interesting manner because of the change in one or more genes. Is there a common trend across all organs along these lines? Are there any special relationships in patterns that form not just in one organ, but in multiple?
So far I have done this all manually using very nice packages in R that do linear regression etc. However, I feel it would be very easy to miss subtle changes in the data, perhaps because something else is happening in other genes, and such changes are very difficult to see by eye, so to speak.
I have been reading about machine learning in general. However, I am unsure exactly which type of machine learning technique to use, how to apply it to RNA-seq data, and how to instruct the algorithm to look explicitly for what I am interested in. As such, any advice from anybody who has done just this, or who has carried out machine learning in general (not necessarily on time-course data), on how to start and on how the data should be set up and input to the algorithms would be very helpful!
Many thanks!
You can take a look at this paper, which gives a good summary of the different methods used for time-series gene expression data: https://www.nature.com/articles/nrg3244
Personally, I've tried using STEM (to cluster time series), which can be easily downloaded from its homepage. Nonetheless, traditional clustering methods like k-means or hierarchical clustering still prove useful in some cases, so I would suggest trying the simpler clustering techniques first and proceeding from there. Another way is to perform differential gene expression analysis (a good example of how to do this can be found here: http://www.bioconductor.org/help/workflows/rnaseqGene/#time), and then run the standard clustering algorithms on the differentially expressed genes.
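To make the "simpler clustering" suggestion a bit more concrete, here is a rough sketch of how the data could be set up for k-means so that patterns are compared across organs rather than within one. Everything in it is an assumption for illustration: the gene names, the two organs, and the expression values are made up, and the small k-means is written by hand in plain Python only to keep the example self-contained (in practice you would more likely use an existing implementation such as kmeans in R or scikit-learn in Python). The idea is to scale each gene's profile within each organ, so that the shape of the change matters rather than the absolute level, and then join the organs into one row per gene.

```python
from statistics import mean, pstdev

# Hypothetical toy data: expression per gene, per organ, at 4 time points.
expr = {
    "geneA": {"liver": [1, 2, 4, 8],     "heart": [2, 3, 5, 9]},
    "geneB": {"liver": [10, 20, 41, 79], "heart": [5, 9, 22, 40]},
    "geneC": {"liver": [8, 4, 2, 1],     "heart": [9, 6, 3, 2]},
    "geneD": {"liver": [80, 39, 21, 10], "heart": [30, 22, 9, 4]},
}
organs = ["liver", "heart"]


def zscore(profile):
    """Scale one time profile to mean 0 and sd 1 (flat profiles become zeros)."""
    m, s = mean(profile), pstdev(profile)
    if s == 0:
        return [0.0] * len(profile)
    return [(x - m) / s for x in profile]


def gene_vector(by_organ):
    """Z-score within each organ, then join the organs into one row per gene."""
    row = []
    for organ in organs:
        row.extend(zscore(by_organ[organ]))
    return row


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next centre: the row farthest from all centres chosen so far.
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


genes = sorted(expr)
rows = [gene_vector(expr[g]) for g in genes]
labels = dict(zip(genes, kmeans(rows, k=2)))
for g in genes:
    print(g, "-> cluster", labels[g])
```

Because each row spans all organs, two genes only end up in the same cluster if they behave similarly in every organ. Clustering each organ separately and comparing the cluster memberships afterwards would be the alternative if you want organ-specific patterns. The number of clusters (k=2 here) is a choice you would need to tune on real data.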