
Machine learning methods for feature selection and possible correlation with clinical data



svlachavas

@svlachavas-7225


Germany/Heidelberg/German Cancer Resear…

Dear all,

Following up on one of my previous posts (https://support.bioconductor.org/p/65620/#65640) about integrating transcriptomic and clinical data, I would like to ask a more general question and get some suggestions or ideas about machine learning approaches. I have preprocessed and analyzed two Affymetrix colon cancer datasets, each with matched samples (17 and 13 patients, on hgu133plus2 and hgu133a arrays, respectively). I have also performed functional analysis on each dataset and identified the genes common to both lists of differentially expressed (DE) genes. My second goal is to use a machine learning method to extract features that could classify cancer versus control (adjacent control) samples. I am new to R and have no experience with machine learning. So far I have searched the literature and some tutorials, and my main questions are the following:

1. First, from personal experience or similar work, is there a package or a specific approach that has been used and validated more extensively in cancer studies? I have read about many methods, such as random forests and KNN, and also about a package, cancerclass, aimed specifically at cancer studies. Or should I instead try more than one method to construct a predictor, then compare the error estimates and choose the "best" method in terms of sensitivity and specificity?

2. Second, as I have two different microarray datasets, should I apply the chosen methodology to each one separately, or should I merge them in some way and then construct a prediction model on the merged set?

3. Finally, for each dataset I also have a CSV file with clinical data for each patient (PET data, such as fractal dimension and standardized uptake value, SUV). Is there a methodology to integrate these with the gene expression data and find possible correlations (i.e. possible biomarkers)? I have found a package called MineICA (http://www.bioconductor.org/packages/release/bioc/html/MineICA.html), but I have never used it.
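To make question 1 concrete, here is a minimal sketch of the "try several methods and compare sensitivity and specificity" idea. Python is used only for illustration; the same logic carries over to R. k-nearest neighbours stands in for whichever classifier is chosen, and the expression values, patient IDs and function names are hypothetical. Because the samples are matched, the sketch holds out both samples of a patient at a time. Otherwise a tumor sample could be classified using its own patient's control in the training set.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_patient_out(x, y, patients, k=3):
    """Cross-validation holding out all samples (tumor and matched control) of one patient at a time."""
    predictions = [None] * len(y)
    for pid in sorted(set(patients)):
        train_idx = [i for i, p in enumerate(patients) if p != pid]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i, p in enumerate(patients):
            if p == pid:
                predictions[i] = knn_predict(train_x, train_y, x[i], k)
    return predictions

def sensitivity_specificity(y_true, y_pred, positive="cancer"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: two selected genes, four patients, each with a tumor and a matched control.
x = [[5.0, 1.0], [1.0, 5.0], [5.2, 1.1], [1.1, 4.9],
     [4.8, 0.9], [0.9, 5.1], [5.1, 1.2], [1.2, 4.8]]
y = ["cancer", "control"] * 4
patients = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]

# The same loop could be repeated for each candidate classifier (or setting) being compared.
for k in (1, 3):
    pred = leave_one_patient_out(x, y, patients, k=k)
    sens, spec = sensitivity_specificity(y, pred)
    print(f"k={k}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With so few patients, any feature selection (such as picking DE genes) should be repeated inside each training fold. If it is done on all samples first, the error estimates tend to be optimistic.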
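For question 2, one simple way to merge two platforms is sketched below. It is one option among several, not a recommendation for this data. The sketch keeps only genes measured on both arrays, standardizes each gene within its own dataset, and then concatenates the samples. Dedicated batch-correction methods (for example ComBat in the sva package) are more elaborate alternatives. The dataset contents and gene names are hypothetical, and Python is used only for illustration.

```python
import statistics

def zscore_genes(dataset, genes):
    """Standardize each listed gene to mean 0 and SD 1 across the samples of one dataset."""
    out = {}
    for gene in genes:
        values = dataset[gene]
        mu = statistics.mean(values)
        sd = statistics.stdev(values)
        out[gene] = [(v - mu) / sd for v in values]
    return out

def merge_datasets(ds_a, ds_b):
    """Keep genes present in both datasets with non-zero variance in each,
    standardize within each dataset, then concatenate the samples."""
    common = sorted(
        g for g in set(ds_a) & set(ds_b)
        if statistics.stdev(ds_a[g]) > 0 and statistics.stdev(ds_b[g]) > 0
    )
    za = zscore_genes(ds_a, common)
    zb = zscore_genes(ds_b, common)
    return {g: za[g] + zb[g] for g in common}

# Hypothetical log2 expression values (gene -> samples); gene names are placeholders.
dataset_plus2 = {"GENE1": [8.0, 10.0, 12.0], "GENE2": [6.0, 6.5, 7.0], "GENE3": [9.0, 9.5, 10.0]}
dataset_133a = {"GENE1": [2.0, 4.0, 6.0], "GENE2": [3.0, 3.0, 3.0], "GENE4": [5.0, 5.5, 6.0]}

merged = merge_datasets(dataset_plus2, dataset_133a)
print(merged)  # GENE2 is dropped because it is constant in the second dataset
```

Both datasets consist of matched cancer/control pairs, so the class balance is the same in each. That makes per-dataset centering less likely to remove the cancer-versus-control signal itself. Another option is to keep the datasets separate, train on one and test on the other.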
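For question 3, a basic first step before trying a package such as MineICA could be to rank genes by their rank (Spearman) correlation with a clinical variable such as SUV. This is a minimal sketch, assuming one clinical value per patient, expression from the tumor samples only, and patients in the same order in both tables. The values and gene names are made up for illustration, and Python is used only as an example language.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for this block of ties
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; assumes neither vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def spearman(a, b):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def rank_genes_by_correlation(expression, clinical):
    """Return (gene, rho) pairs sorted by absolute correlation, strongest first."""
    scores = {gene: spearman(values, clinical) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: tumor samples only, in the same patient order as the clinical CSV.
suv = [2.1, 3.4, 5.0, 7.8]
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.5, 2.0, 1.0],
    "GENE_C": [2.0, 1.0, 4.0, 3.0],
}
for gene, rho in rank_genes_by_correlation(expression, suv):
    print(f"{gene}: rho = {rho:+.2f}")
```

With thousands of genes and only 13 to 17 patients, many strong-looking correlations will arise by chance. Any candidate list would need significance testing with multiple-testing correction and, ideally, confirmation in the other dataset.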

Thank you for your patience, and please excuse the long message and any naive questions. Because I have no prior experience with these methodologies, I would like some direction on how to proceed rather than testing packages at random.

machinelearning clinicaldata biomarkers data integration bioconductor

Posted 10.0 years ago by svlachavas