Hi everybody, my question is the following.
I have a sample of galaxy radial velocities in a galaxy cluster (unfortunately the sample size is only N=18; I know N<20 is not ideal), and I want to know how many Gaussians best fit the distribution of my data [the number can take the values G=1:3]. After that, I want to know the best-fit Gaussian parameters. I expect G=3 to be the best number of Gaussians, but I need a number (I guess the log-likelihood) that quantifies the significance of this case. Many people use MCLUST (an R package) to model data as a Gaussian finite mixture. I read that it can find the optimal number of components (through a hierarchical clustering approach) and the corresponding classification.
I tried the following approaches:
1) Mclust with only these parameters:
> modClust = Mclust(dataset,G=1:3,modelNames="V")
fitting ...
|==============================================================| 100%
> summary(modClust)
%----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
%----------------------------------------------------
Mclust X (univariate normal) model with 1 component:
log.likelihood n df BIC ICL
-157.1966 18 2 -320.174 -320.174
Clustering table:
1
18
but it returns G=1 as the best value, although I believe it should be 3.
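As a check on the summary above, the reported BIC can be reproduced by hand, which also shows how heavily the penalty weighs at N=18. This is only a sketch: it assumes mclust's convention BIC = 2 log L - df log(n) (larger is better) and that a univariate unequal-variance mixture with G components has 3G - 1 free parameters. The helper names `df_v` and `bic` are made up for illustration.

```r
# Values copied from the summary(modClust) output above
n         <- 18
loglik_g1 <- -157.1966

# Free parameters of a univariate "V" mixture (assumed count):
# G means + G variances + (G - 1) mixing proportions
df_v <- function(G) 3 * G - 1

# BIC in the sign convention mclust reports (larger is better)
bic <- function(loglik, df, n) 2 * loglik - df * log(n)

bic_g1 <- bic(loglik_g1, df_v(1), n)
print(round(bic_g1, 3))

# Log-likelihood gain a 3-component fit needs over G = 1
# just to tie on BIC: half the difference in penalties
min_gain_g3 <- (df_v(3) - df_v(1)) * log(n) / 2
print(round(min_gain_g3, 2))
```

Under these assumptions, a three-component fit has to raise the log-likelihood by roughly 8.7 before BIC prefers it, which is a high bar for 18 points.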
2) I thought an alternative could be to run a for loop over the G values and compare the log-likelihood values.
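The loop in 2) could look roughly like the sketch below. The `velocities` vector is simulated placeholder data, not the real sample, and the helper `pick` is a made-up name. It assumes the mclust package is installed and that `Mclust()` returns `NULL` when a model cannot be estimated. Note that mclust spells the argument `modelNames`.

```r
library(mclust)

# Simulated stand-in for the 18 radial velocities (km/s); NOT the real data
set.seed(42)
velocities <- c(rnorm(6, 9000, 150), rnorm(6, 10000, 150), rnorm(6, 11000, 150))

# Fit one unequal-variance ("V") mixture for each candidate G
fits <- lapply(1:3, function(g) {
  Mclust(velocities, G = g, modelNames = "V", verbose = FALSE)
})

# Extract a field from a fit, or NA if the fit was not returned
pick <- function(fit, field) if (is.null(fit)) NA_real_ else fit[[field]]

comparison <- data.frame(
  G      = 1:3,
  logLik = sapply(fits, pick, field = "loglik"),
  df     = sapply(fits, pick, field = "df"),
  BIC    = sapply(fits, pick, field = "bic")
)
print(comparison)

# Parameters of the 3-component fit (means, variances, mixing proportions)
if (!is.null(fits[[3]])) print(fits[[3]]$parameters)
```

The maximized log-likelihood generally does not decrease as components are added, so on its own it will tend to favor the largest G, which is why the BIC column is kept alongside it. For something closer to a significance statement, mclust also provides `mclustBootstrapLRT()`, which runs a bootstrap likelihood ratio test on the number of components.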
Do you have any advice? What am I doing wrong? What am I not considering?
Thanks in advance for the help,
Andrea
I'm removing the "DESeq2" tag as I can't see any relevance to DESeq2.