How can machine learning be applied to Genomics?
Can you use machine learning techniques to predict cancer in patients, estimate the success rate of a new drug, or even understand why COVID affects some people more severely than others?
Using machine learning to solve problems in Genomics is considerably more challenging than applying it in most other domains.
This is mainly because acquiring data through clinical trials is expensive and time-consuming. In addition, you end up with a dataset that is very hard to work with:
- Limited rows – a few hundred patients
- Very large number of columns – roughly 21K, each representing a gene expression.
Extracting a working model from such a challenging dataset using typical machine learning algorithms would be nearly impossible without overfitting. Many algorithms require a minimum of 100 data points per class, which makes small data sets impractical to use. This forces you to spend more money on data collection and compute.
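To see why so many columns and so few rows invite overfitting, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Brainome's method: the data is pure random noise, the sizes are scaled down from the figures above so it runs quickly, and the helper names (`stump_accuracy`, `fit_best_stump`) are made up for this example. It searches every column for the single threshold rule that best fits the training labels, then checks that rule on fresh noise.

```python
import random

def stump_accuracy(values, labels, threshold):
    """Accuracy of the rule 'predict class 1 when value > threshold'."""
    hits = sum((v > threshold) == (y == 1) for v, y in zip(values, labels))
    return hits / len(labels)

def fit_best_stump(columns, labels):
    """Search every column and midpoint threshold for the best training accuracy.

    Returns (accuracy, column_index, threshold, flipped), where flipped=True
    means the rule is inverted: predict class 1 when value <= threshold.
    """
    best = (0.0, -1, 0.0, False)
    for j, col in enumerate(columns):
        ordered = sorted(col)
        for lo, hi in zip(ordered, ordered[1:]):
            t = (lo + hi) / 2
            acc = stump_accuracy(col, labels, t)
            flipped = acc < 0.5
            acc = max(acc, 1.0 - acc)
            if acc > best[0]:
                best = (acc, j, t, flipped)
    return best

rng = random.Random(42)
N_TRAIN, N_TEST, N_COLS = 30, 200, 2000  # assumed sizes, scaled down for speed

# Pure noise: no column carries any real information about the label.
train_labels = [i % 2 for i in range(N_TRAIN)]
train_columns = [[rng.gauss(0.0, 1.0) for _ in range(N_TRAIN)]
                 for _ in range(N_COLS)]

train_acc, col_idx, threshold, flipped = fit_best_stump(train_columns, train_labels)

# Fresh noise for the chosen column stands in for unseen patients.
test_labels = [i % 2 for i in range(N_TEST)]
test_values = [rng.gauss(0.0, 1.0) for _ in range(N_TEST)]
test_acc = stump_accuracy(test_values, test_labels, threshold)
if flipped:
    test_acc = 1.0 - test_acc

print(f"best column {col_idx}: train accuracy {train_acc:.2f}, "
      f"test accuracy {test_acc:.2f}")
```

Because there are far more columns than rows, some column will line up with the training labels by chance alone, so the training accuracy looks impressive while the accuracy on new data falls back toward a coin flip.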
This is where the next step in machine learning makes a huge impact: measurements.
Brainome’s model-aware pre-training measurements are able to pinpoint the handful of relevant attributes that accurately differentiate the classes being predicted. These few genes are the only ones needed in your model to maximize accuracy on unseen data. Suddenly, a complex problem with thousands of columns becomes a far more manageable data set, which makes building a general predictor entirely feasible.
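The general idea of narrowing thousands of columns down to a few informative ones can be sketched with a very simple ranking. This is not Brainome's measurement engine, whose internals are not described here. It is a hypothetical stand-in that scores each column by how far apart the two class means are, relative to the spread within each class. The data is synthetic, with one informative column planted at an arbitrarily chosen index, and all names are assumptions made for the example.

```python
import random
import statistics

def separation_score(values, labels):
    """Absolute gap between class means, scaled by the pooled standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    pooled = ((statistics.pvariance(pos) + statistics.pvariance(neg)) / 2) ** 0.5
    if pooled == 0:
        return 0.0
    return abs(statistics.mean(pos) - statistics.mean(neg)) / pooled

rng = random.Random(0)
N_ROWS, N_COLS, INFORMATIVE = 60, 500, 7  # assumed sizes and planted column

labels = [i % 2 for i in range(N_ROWS)]

# Every column starts as noise that is unrelated to the label.
columns = [[rng.gauss(0.0, 1.0) for _ in range(N_ROWS)] for _ in range(N_COLS)]

# Plant one column whose values shift upward for class 1.
columns[INFORMATIVE] = [rng.gauss(3.0 * y, 1.0) for y in labels]

scores = [separation_score(col, labels) for col in columns]
ranked = sorted(range(N_COLS), key=lambda j: scores[j], reverse=True)
top3 = ranked[:3]

for j in top3:
    print(f"column {j}: score {scores[j]:.2f}")
```

In practice any shortlist produced this way still has to be validated on held-out samples, since with this many columns a few noise columns will always score moderately well by chance.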
Brainome has recently been working on a joint ovarian cancer study with Cedars-Sinai. The gene expression data gathered by Cedars-Sinai contained 584 sample cells, each with 21,000 gene expression features and a label indicating whether the cell was healthy or cancerous. Brainome processed this data through its measurement engine, Daimensions™.
In a couple of minutes, Daimensions™ was able to extract the single gene “VWA7”.
Daimensions™ built a model with that single parameter to predict ovarian cancer with 100% accuracy. The team at Cedars-Sinai confirmed Brainome’s finding that VWA7 is instrumental in predicting ovarian cancer.
Based on this study and several others like it, we can say with confidence that measurements can and should be used when applying machine learning to Genomics. Not only can we answer complex questions, we can answer them faster, saving the time and money that would otherwise go to collecting possibly unnecessary data and to excessive computing resources.