Brainome Data Compiler
The Brainome Data Compiler is the world’s first data compiler for solving supervised machine learning problems.
In computer science, a code compiler (such as gcc) takes a program (or, more generally, a function) written in one language (C, C++, etc.) and re-implements the same function in another language (e.g., assembly).

Diagram: a function implemented in C, C++, or another programming language → GCC or another compiler → the same function implemented in assembly language.
In machine learning, instead of writing computer programs, we curate data sets that we believe contain a function (dog vs. cat, good credit risk vs. bad credit risk, etc.). The goal is to use machine learning techniques to identify the function that explains the data set and to encapsulate that function in a model, or predictor. Brainome does exactly this in a three-step process:

1. The Brainome Table Compiler (BTC) takes a labeled data set as input.
2. BTC measures the data set to identify and size the explanatory function.
3. BTC builds a Python predictor function based on the learnability measurements.
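As a deliberately naive analogy for the three steps (this is a toy stand-in, not Brainome's actual measurement or model-building algorithm), the pipeline can be sketched as: take labeled rows, measure simple properties of the data set, then emit a predictor function.

```python
from collections import Counter

# Step 1: a labeled data set -- rows of features plus a target label.
rows = [
    {"weight": 4.0, "bark": 1, "label": "dog"},
    {"weight": 3.5, "bark": 0, "label": "cat"},
    {"weight": 30.0, "bark": 1, "label": "dog"},
    {"weight": 4.2, "bark": 0, "label": "cat"},
]

# Step 2: "measure" the data set (a toy stand-in for BTC's measurements):
# class counts hint at balance; rows vs. features hint at sufficiency.
labels = [r["label"] for r in rows]
measurements = {
    "n_rows": len(rows),
    "n_features": len(rows[0]) - 1,
    "classes": dict(Counter(labels)),
}

# Step 3: build a predictor informed by the data. Here: a one-rule model
# chosen by inspection -- in this toy data, dogs bark and cats do not.
def predict(row):
    return "dog" if row["bark"] == 1 else "cat"

correct = sum(predict(r) == r["label"] for r in rows)
print(measurements, f"training accuracy: {correct}/{len(rows)}")
```

The point of the analogy is the shape of the process, not the model: the real compiler sizes the explanatory function from measurements before building it.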
Features and Requirements:

Input Data Set
- CSV format
- One column must contain the class labels (target column)
- Supported cell values: strings, floats, integers
- No pre-processing necessary
- No limit on the number of rows or columns
- Unlimited number of classes; 100 instances per class recommended for best results
- Support for unbalanced data sets
- Support for sparse data sets
- Support for data sets with missing values
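A toy CSV illustrating the input requirements above: a target column ("risk" here, an invented example), mixed cell types (strings, floats, integers), and a missing value left as-is, since no pre-processing is required.

```python
import csv
import io

# A minimal CSV in the accepted shape. The column names and values are
# invented for illustration; any labeled CSV in this shape qualifies.
raw = """income,employment,age,risk
52000.5,salaried,34,good
18000.0,,22,bad
73500.25,self-employed,45,good
"""

# Read it back the way any CSV consumer would; the empty "employment"
# cell in the second row is a missing value and stays empty.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)
print(len(rows), repr(rows[1]["employment"]))
```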
Model Creation
- Support for decision trees, neural networks, and random forests
- Measurement-driven building process for optimal model size and speed
- Produces very small models, often kilobytes in size, typically 2 to 3 orders of magnitude smaller than models produced by other tools
- Stand-alone Python executable that requires only NumPy
- Written in clear-text Python code (easily committed to your Git repository)
- No GPU required to run the model
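To give a feel for what a small, clear-text, NumPy-only predictor can look like, here is a hypothetical hand-written example in that style. It is not BTC output; the function name, feature layout, and thresholds are all invented for illustration.

```python
import numpy as np

# A hypothetical predictor in the style described above: plain Python
# plus NumPy, a few hundred bytes of readable text, no GPU, no framework.
def predict(X):
    """X: array-like of shape (n_samples, 2) -> array of 0/1 labels."""
    X = np.asarray(X, dtype=float)
    # A tiny two-split decision tree encoded directly as vectorized rules.
    return np.where(X[:, 0] > 3.0, 1, np.where(X[:, 1] > 0.5, 1, 0))

print(predict([[4.0, 0.0], [1.0, 0.9], [1.0, 0.1]]))
```

Because the whole model is ordinary source code, it diffs cleanly in version control and runs anywhere NumPy runs.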
Measurements
- Data sufficiency (Capacity Progression)
- Attribute ranking
- Number of model parameters needed to learn
- Overfit risk
- Expected generalization
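As a rough intuition pump only (these are not Brainome's actual formulas), two of the measurements above can be caricatured with the ratio of training instances to model parameters: a model that explains many rows per parameter generalizes, while one that needs roughly a parameter per row is memorizing.

```python
# Crude stand-ins for "expected generalization" and "overfit risk".
# NOT Brainome's formulas -- just the underlying intuition.
def naive_generalization(n_rows: int, n_params: int) -> float:
    """Training instances explained per model parameter."""
    return n_rows / n_params

def naive_overfit_risk(n_rows: int, n_params: int) -> str:
    # Near one parameter per row means the model can simply memorize.
    return "high" if naive_generalization(n_rows, n_params) < 2.0 else "low"

print(naive_generalization(1000, 50))   # 20 rows per parameter
print(naive_overfit_risk(1000, 900))    # near-memorization regime
```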
Integration
- BTC can be scripted from the command line
- BTC measurements are reported as JSON or plain text
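Because the measurements are machine-readable, a script can gate downstream steps on them. The JSON keys below are invented for illustration; the actual report schema comes from the BTC release you are running.

```python
import json

# A hypothetical measurement report -- field names and values are
# assumptions for this sketch, not BTC's documented schema.
report = json.loads("""{
    "capacity_progression": [62, 71, 88, 96],
    "overfit_risk": 0.12,
    "expected_generalization": 4.8
}""")

# Example gate for a CI pipeline: only ship the predictor when the
# measured overfit risk is acceptably low (threshold chosen arbitrarily).
ship = report["overfit_risk"] < 0.2
print("ship predictor:", ship)
```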