Gene clusters based on gene expression across samples
@jane-merlevede-5019

Dear all,

Using an RNA-seq experiment with 16 samples, I would like to cluster genes rather than samples. My aim is to identify subsets of genes that behave similarly across the samples, which should be possible with a suitable metric (maybe correlation?).

After normalizing with DESeq2, I computed TPM values (I guess normalized counts could be used as well).
Applying kmeans to ~25,000 genes gives:
Kmeans5 <- kmeans(TPMData, 5, iter.max = 1000, nstart = 10000)
Kmeans5$size
1     4 24859     1    90
or
Kmeans10 <- kmeans(TPMData, 10, iter.max = 1000, nstart = 1000)
Kmeans10$size
1     1 22848    13     1    75  1742   270     3     1

The very unbalanced cluster sizes may be due to the sensitivity of k-means to outliers.
Do you have suggestions, such as other methods, that could help here?
Thank you in advance.
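
As a rough illustration of the outlier issue raised above, here is a minimal sketch on simulated data. The matrix `tpm`, its dimensions, and the choice of `log2(x + 1)` followed by per-gene standardization are all assumptions made for the example, not something taken from the original analysis. The idea is that on the raw scale a few very highly expressed genes can dominate the Euclidean distance, whereas log-transformed and row-scaled values put every gene on a comparable scale.

```r
set.seed(42)
n_genes <- 200
n_samples <- 16

# Hypothetical TPM-like matrix (genes in rows, samples in columns)
tpm <- matrix(rlnorm(n_genes * n_samples, meanlog = 2, sdlog = 1),
              nrow = n_genes)
# Make a few genes extremely highly expressed, as often seen in real data
tpm[1:3, ] <- tpm[1:3, ] * 1000

# Compress the dynamic range, then drop genes that do not vary at all
log_tpm <- log2(tpm + 1)
keep <- apply(log_tpm, 1, sd) > 0

# Standardize each gene (row) to mean 0 and standard deviation 1
z <- t(scale(t(log_tpm[keep, ])))

# k-means on the raw values versus on the standardized log values
km_raw <- kmeans(tpm, centers = 5, iter.max = 100, nstart = 25)
km_z   <- kmeans(z,   centers = 5, iter.max = 100, nstart = 25)

km_raw$size  # cluster sizes on the raw scale
km_z$size    # cluster sizes after log transform and row scaling
```

Comparing the two `size` vectors on real data should show whether the single-gene clusters come from expression magnitude rather than from expression pattern.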

kmeans gene expression clustering • 1.5k views
@james-w-macdonald-5106

Have you looked at WGCNA?


No, I did not know about it.

I will take a look, thank you.

@jane-merlevede-5019

Well, I tried WGCNA and indeed, it seems to fit my needs. Thank you.

I would also like to test some simpler clustering methods, such as those based on partitioning around medoids.

Using kmeans() on 2 distinct datasets, with 5 or 10 clusters, I got several clusters (between 2 and 4) containing a single gene. Have you run into this problem?

kmeans() allows neither a minimum number of genes per cluster nor a distance other than the Euclidean one, such as a correlation-based distance.
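
One possible workaround for the fixed Euclidean distance, sketched here on simulated data: if each gene is standardized across samples (mean 0, standard deviation 1), the squared Euclidean distance between two genes equals 2(n - 1)(1 - r), where n is the number of samples and r is their Pearson correlation. Running kmeans() on the standardized rows therefore groups genes by correlation. The matrix `expr` and the number of clusters below are assumptions for illustration only.

```r
set.seed(1)
n_genes <- 50
n_samples <- 16

# Hypothetical expression matrix (genes in rows, samples in columns)
expr <- matrix(rnorm(n_genes * n_samples, mean = 5, sd = 2),
               nrow = n_genes)

# Standardize each gene across samples
z <- t(scale(t(expr)))

# Squared Euclidean distances between standardized genes
d2 <- as.matrix(dist(z))^2

# Pearson correlations between genes
r <- cor(t(expr))

# Largest deviation from the identity d2 = 2 * (n - 1) * (1 - r);
# it should be zero up to floating-point error
max(abs(d2 - 2 * (n_samples - 1) * (1 - r)))

# k-means on z is then effectively correlation-based
km <- kmeans(z, centers = 4, iter.max = 100, nstart = 25)
km$size
```

Note that standardization discards expression magnitude, so genes with nearly constant expression should be filtered out beforehand, since scaling would otherwise inflate their noise.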

Then I tried skmeans(), which uses cosine dissimilarity between objects. On both datasets I did not get any single-gene clusters, but rather this kind of distribution:

Class sizes: 842, 1152, 1072, 1206, 4990, 12207, 1064, 555, 1102, 765

Here again, I cannot use correlation between genes to find genes that vary similarly across the samples.

Using kcca() from the flexclust package, it should be possible to use correlation, but I have not been successful so far.

Does anyone successfully use "simple" clustering methods on genes to describe gene expression similarity?
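
As an alternative route to correlation-based clustering (not the kcca() approach mentioned above), here is a minimal sketch using pam() from the cluster package on the dissimilarity 1 - r. The simulated matrix `expr`, the two expression patterns, and k = 2 are assumptions chosen for the example. With ~25,000 genes the full dissimilarity matrix becomes very large, so in practice one would probably first restrict the analysis to a few thousand variable genes.

```r
library(cluster)  # recommended package distributed with R

set.seed(1)
n_samples <- 16

# Two hypothetical expression patterns across the 16 samples
pattern_a <- rep(c(0, 1), each = 8)    # low in first half, high in second
pattern_b <- rep(c(1, 0), times = 8)   # alternating high/low

# Simulate 30 noisy genes per pattern (genes in rows)
expr <- rbind(
  t(replicate(30, pattern_a * 2 + rnorm(n_samples, sd = 0.3))),
  t(replicate(30, pattern_b * 2 + rnorm(n_samples, sd = 0.3)))
)
rownames(expr) <- paste0("gene", seq_len(nrow(expr)))

# Correlation-based dissimilarity between genes
cor_diss <- as.dist(1 - cor(t(expr)))

# Partitioning around medoids on that dissimilarity
fit <- pam(cor_diss, k = 2)

table(fit$clustering)  # cluster sizes
fit$medoids            # the genes chosen as cluster representatives
```

Using `1 - abs(r)` instead of `1 - r` would group anti-correlated genes together, which may or may not be desirable depending on the question.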

@jane-merlevede-5019

Any feedback on gene clustering using classical clustering methods?
