My data consist of 130 patients, time (4 levels for the time of measurement), a response Y which is a binary variable (health status), and 190 metabolite variables. The columns of the original data are therefore: SUBJECT, TIME, Y, METABOLITES. Dimension: 130x193. Research question: how do the metabolites affect Y?
While trying to implement the selection by following the "sacurine" data example, I couldn't figure out how to build my variableMetadata.
I defined the dataMatrix as the metabolites only, so its dimension is 130x190.
sampleMetadata: SUBJECT and TIME, so its dimension is 130x2.
However, I could not figure out the variableMetadata.
Would you please tell me whether the feature selection you explained in your post applies to the structure of my data?
Hi,
In your case, the biosigner approach can help you select the metabolites which significantly contribute to the classification performance of the health status (with either the PLS-DA, Random Forest or SVM algorithms). However, the repeated structure of your data (longitudinal analysis) will not be handled by our current implementation: you may therefore either select a single time point or use all time points (which in this case will be treated as independent observations).
The format of your dataMatrix is correct. Your sampleMetadata should include the health status (which will be the only information used by the algorithm). The variableMetadata contains additional information about your metabolites. If you do not have any, just provide a data frame with the variable names as row names (identical to the column names of the dataMatrix) and create a single column (e.g. by repeating the names).
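As a rough illustration of the three tables described above, here is a minimal R sketch. It runs on simulated values, since I do not have your data: the `rawData` object, the metabolite names `M1` to `M190`, the `healthy`/`diseased` labels and the `name` column are all placeholders of mine. The final `biosign` call is left commented out and assumes the biosigner package is installed; `bootI = 5` is only a small value for a quick trial.

```r
# Simulated stand-in for the real table (130 rows: SUBJECT, TIME, Y, 190 metabolites).
set.seed(1)
nObs <- 130
nMet <- 190
metaboliteNames <- paste0("M", seq_len(nMet))  # placeholder metabolite names
metabolites <- matrix(rnorm(nObs * nMet), nrow = nObs,
                      dimnames = list(NULL, metaboliteNames))
rawData <- data.frame(
  SUBJECT = paste0("S", seq_len(nObs)),
  TIME = rep(1:4, length.out = nObs),
  Y = factor(rep(c("healthy", "diseased"), times = c(65, 65))),
  metabolites
)

# Unique row names shared by dataMatrix and sampleMetadata.
sampleNames <- paste(rawData$SUBJECT, rawData$TIME, sep = "_")

# dataMatrix: numeric matrix, samples in rows, metabolites in columns.
dataMatrix <- as.matrix(rawData[, metaboliteNames])
rownames(dataMatrix) <- sampleNames

# sampleMetadata: must include the health status Y.
sampleMetadata <- rawData[, c("SUBJECT", "TIME", "Y")]
rownames(sampleMetadata) <- sampleNames

# variableMetadata: row names identical to the dataMatrix column names,
# plus a single column obtained by repeating those names.
variableMetadata <- data.frame(name = colnames(dataMatrix),
                               row.names = colnames(dataMatrix),
                               stringsAsFactors = FALSE)

# Consistency checks between the three tables.
stopifnot(identical(rownames(dataMatrix), rownames(sampleMetadata)),
          identical(colnames(dataMatrix), rownames(variableMetadata)))

# Optional: keep a single time point so that observations are independent.
keep <- sampleMetadata$TIME == 1

# Feature selection (uncomment if biosigner is installed):
# library(biosigner)
# selection <- biosign(dataMatrix[keep, ], sampleMetadata$Y[keep], bootI = 5)
```

To use all time points instead, drop the `keep` subsetting, keeping in mind that the rows are then treated as independent observations.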
Best,
Etienne.
Please use my account "Etienne Thevenot" ("Etienne" is deprecated).