May I ask you a question about an error message that I don't understand?
We have a dataset similar to the one described in your manual (data(sacurine)), with no missing values.
When I run a PCA on my matrix (data3.pca <- opls(x)), everything looks very nice.
However, when I try a PLS-DA, I only get an error message:
data3.pls<- opls(x, genderFc)
Error: No model was built because the first predictive component was already not significant; Select a number of predictive components of 1 if you want the algorithm to compute a model despite this.
What could be the reason for this error? Is there any easy way to proceed?
By default, ropls automatically selects the optimal number of predictive (PLS) or orthogonal (OPLS) components. To do this, the algorithm checks whether adding a component improves the predictions. Here the message indicates that even the first predictive component was not significant, meaning that the algorithm could not build a significant PLS model on your dataset. To check this, you can force the algorithm to compute the first two components:
data3.pls<- opls(x, genderFc, predI = 2)
You should then see on the diagnostic plot that the Q2Y value is not significant: when the response values are randomly permuted, the resulting models perform as well as or better than the model built with the true responses, which indicates overfitting.
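To make the logic of this permutation diagnostic concrete, here is a minimal base-R sketch. It is not the ropls implementation: it assumes simulated noise data and uses an ordinary least squares fit on three variables as a stand-in for PLS. Q2 is computed as 1 - PRESS/TSS from leave-one-out cross-validation, then recomputed after permuting the response; the number of permutations (20) and all variable names are illustrative choices.

```r
set.seed(2)

# Hypothetical data (assumption): 30 samples, 3 pure-noise variables, two groups
n <- 30
x <- matrix(rnorm(n * 3), nrow = n)
genderFc <- factor(rep(c("F", "M"), each = n / 2))
y <- as.numeric(genderFc == "M")  # 0/1 coding of the class response

# Q2 = 1 - PRESS / TSS, with PRESS from leave-one-out cross-validation
# of an ordinary least squares fit (a simplified stand-in for PLS)
q2_loocv <- function(x, y) {
  press <- 0
  for (i in seq_along(y)) {
    fit <- lm.fit(cbind(1, x[-i, , drop = FALSE]), y[-i])
    pred <- sum(c(1, x[i, ]) * fit$coefficients)
    press <- press + (y[i] - pred)^2
  }
  1 - press / sum((y - mean(y))^2)
}

q2_true <- q2_loocv(x, y)

# Same model refitted after randomly permuting the response values
q2_perm <- replicate(20, q2_loocv(x, sample(y)))

# Empirical p-value: share of permuted models doing at least as well,
# counting the observed model itself so the value is never exactly 0
p_value <- (sum(q2_perm >= q2_true) + 1) / (length(q2_perm) + 1)

cat("Q2 (true responses):", round(q2_true, 3), "\n")
cat("Permutation p-value:", round(p_value, 3), "\n")
```

A large p-value here means the true responses are not predicted better than random labels, which is the situation the error message reports.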
The likely reason is that the meaningful information in your dataset, if any, is too scarce to build a model (for example, because the number of samples is too low compared with the number of variables). Did you check how many features are significant with univariate tests followed by a multiple testing correction? You can also add a feature selection step on your training dataset before building the model (we have developed the biosigner package on Bioconductor to perform feature selection).
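As an illustration of the univariate check, the following base-R sketch runs a Welch t-test on each variable and applies a Benjamini-Hochberg false discovery rate correction with p.adjust. The simulated matrix and factor are assumptions standing in for x and genderFc; the t-test and the 0.05 threshold are one possible choice (wilcox.test is a common non-parametric alternative).

```r
set.seed(1)

# Hypothetical data (assumption) standing in for x and genderFc:
# 40 samples x 200 variables, two balanced groups
n_samples <- 40
n_vars <- 200
x <- matrix(rnorm(n_samples * n_vars), nrow = n_samples)
genderFc <- factor(rep(c("F", "M"), each = n_samples / 2))

# Welch two-sample t-test of each variable (column) between the two groups
p_raw <- apply(x, 2, function(v) {
  t.test(v[genderFc == "F"], v[genderFc == "M"])$p.value
})

# Benjamini-Hochberg correction controlling the false discovery rate
p_adj <- p.adjust(p_raw, method = "BH")

n_significant <- sum(p_adj < 0.05)
cat("Variables with FDR < 0.05:", n_significant, "\n")
```

If very few or no variables pass the correction, a non-significant PLS-DA model is not surprising. Any selection based on these tests should be done on the training samples only, so that the validation of the final model is not biased.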
Best wishes,
Etienne.
Note: The algorithms in ropls can cope with a (moderate) amount of missing values.