Brainome Data Compiler

The Brainome Data Compiler is the world’s first data compiler for solving supervised machine learning problems.

In computer science, a code compiler (such as GCC) takes a program (or, more generally, a function) written in one language (C, C++, etc.) and re-implements the same function in another language (e.g., assembly).

[Diagram: a function implemented in C, C++, or another programming language passes through GCC (or another compiler), which emits the same function implemented in assembly language.]

In machine learning, instead of writing computer programs, we curate data sets that we believe contain a function (dog vs. cat, good credit risk vs. bad credit risk, etc.). The goal is to use machine learning techniques to identify the function that explains the data set and encapsulate this function within a model or predictor. Brainome does exactly this in a three-step process:

1. The Brainome Table Compiler (BTC) takes a labeled data set as input.

2. BTC measures the data set to identify and size the explanatory function.

3. BTC builds a Python predictor function based on the learnability measurements.

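To make the "measure, then build" idea concrete, here is a toy sketch in Python. This is not Brainome's actual algorithm; it only illustrates the principle of first measuring how large a model must be to explain a labeled data set (here, the number of entries a lookup table needs), then building a predictor of exactly that size.

```python
# Toy illustration of "measure, then build" (NOT Brainome's algorithm).

def table_size_needed(rows):
    """Upper bound on the entries a lookup-table model needs to
    reproduce the labels: one entry per distinct feature vector."""
    table = {}
    for *features, label in rows:
        table[tuple(features)] = label
    return len(table)

def build_predictor(rows, default_label=0):
    """Build a memorizing predictor sized by the measurement above."""
    table = {tuple(f): lbl for *f, lbl in rows}
    return lambda features: table.get(tuple(features), default_label)

# A tiny labeled data set: two feature columns, last column is the label.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # XOR

print(table_size_needed(data))   # 4 distinct feature vectors
predict = build_predictor(data)
print(predict([1, 0]))           # 1
```

The measurement (four distinct feature vectors) tells us up front that no model smaller than four table entries can memorize this particular data set exactly, which is the spirit of sizing the explanatory function before building it.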

Features and Requirements:

Input Data Set
- CSV format
- One column must contain the class labels (target column)
- Supported cell values: strings, floats, integers
- No pre-processing necessary
- No limit on the number of rows or columns
- Unlimited number of classes; at least 100 instances per class recommended for best results
- Support for unbalanced data sets
- Support for sparse data sets
- Support for data sets with missing values

Model Creation
- Support for decision trees, neural networks, and random forests
- Measurement-driven build process for optimal model size and speed
- Produces very small models, often kilobytes in size; two to three orders of magnitude smaller than models produced by other tools
- Stand-alone Python executable that requires only NumPy
- Written in clear-text Python code (easily committed to your Git repository)
- No GPU required to run the model

Measurements
- Data sufficiency (capacity progression)
- Attribute ranking
- Number of model parameters needed to learn
- Overfit risk
- Expected generalization

Integration
- BTC can be scripted from the command line
- BTC measurements are output as JSON or plain text
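The input requirements above (plain CSV, one target column, mixed string/float/integer cells, missing values allowed) can be satisfied with nothing beyond NumPy, mirroring the generated predictor's sole dependency. The sketch below shows such a CSV being loaded; the column names and data are invented for the example.

```python
import io
import numpy as np

# A minimal example of the kind of input BTC accepts: plain CSV with
# a header row, one target column ("risk"), mixed cell types, and
# missing values (the empty fields below).
csv_text = """age,income,region,risk
34,55000.0,north,good
51,,south,bad
28,41000.0,,good
"""

# names=True reads the header; dtype=None infers a type per column.
rows = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     names=True, dtype=None, encoding="utf-8")

labels = rows["risk"]            # the target column
print(list(rows.dtype.names))    # ['age', 'income', 'region', 'risk']
print(labels.tolist())           # ['good', 'bad', 'good']
```

Note that the missing float in `income` becomes `nan` while the missing string in `region` becomes an empty string; a tool that "supports missing-value data sets" has to handle both cases.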
