Dear all,
I have an RNA-Seq experiment with 16 samples, and I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable metric (perhaps correlation?).
After normalizing with DESeq2, I computed TPM values; I guess normalized counts could be used as well.
Applying kmeans on ~25000 genes gives:
Kmeans5 <- kmeans(TPMData, centers = 5, iter.max = 1000, nstart = 10000)
Kmeans5$size
1 4 24859 1 90
or
Kmeans10 <- kmeans(TPMData, centers = 10, iter.max = 1000, nstart = 1000)
Kmeans10$size
1 1 22848 13 1 75 1742 270 3 1
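In both cases almost all genes end up in a single cluster. These very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers, for example a few genes with extremely high TPM values.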
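A minimal sketch of one way to probe this, under stated assumptions: log-transform the TPM values and z-score each gene across samples before clustering, so that genes are grouped by the shape of their profile rather than by absolute expression level. Euclidean distance on z-scored rows is closely tied to correlation, so this is effectively correlation-based clustering. The real `TPMData` is not shown, so the matrix below is simulated; the pseudocount of 1, the three planted patterns, and `centers = 3` are illustrative choices, not recommendations.

```r
set.seed(1)

# Simulated stand-in for TPMData (hypothetical): 300 genes x 16 samples,
# three planted expression patterns, very different baseline levels per gene.
n_samples <- 16
up  <- seq(-1, 1, length.out = n_samples)
mid <- c(rep(-1, 4), rep(1, 8), rep(-1, 4))
patterns <- rbind(up, -up, mid)
truth <- rep(1:3, each = 100)
base  <- runif(300, min = 2, max = 12)  # per-gene baseline on the log2 scale
noise <- matrix(rnorm(300 * n_samples, sd = 0.3), nrow = 300)
TPMData <- 2^(base + 2 * patterns[truth, ] + noise)
rownames(TPMData) <- paste0("gene", seq_len(300))

# Log-transform, then drop genes with no variation (scaling them gives NaN).
log_tpm <- log2(TPMData + 1)
keep <- apply(log_tpm, 1, sd) > 0
log_tpm <- log_tpm[keep, , drop = FALSE]

# Z-score each gene across samples; scale() works on columns, hence t().
z <- t(scale(t(log_tpm)))

# k-means on raw TPM (groups mostly by expression level) vs. on z-scores.
km_raw <- kmeans(TPMData, centers = 3, iter.max = 100, nstart = 25)
km_z   <- kmeans(z, centers = 3, iter.max = 100, nstart = 25)
print(km_raw$size)
print(km_z$size)

# Alternative: hierarchical clustering with an explicit correlation distance.
d  <- as.dist(1 - cor(t(log_tpm)))
hc <- hclust(d, method = "average")
hc_clusters <- cutree(hc, k = 3)
print(table(hc_clusters))
```

With ~25000 genes the full gene-by-gene correlation matrix needs several gigabytes of memory, so for the hierarchical variant it would probably make sense to first keep only the most variable or differentially expressed genes.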
Do you have any suggestions, such as other methods, that could help here?
Thank you in advance.
No, I did not know.
I will take a look, thank you.