Brainome Table Compiler

The Brainome Table Compiler (BTC) is the world’s first data compiler for solving supervised machine learning problems.

In computer science, a code compiler (such as gcc) takes a program (or, more generally, a function) written in one language (C, C++, etc.) and re-implements the same function in another language (e.g., assembly).

In machine learning, instead of writing computer programs, we curate data sets that we believe contain a function (dog vs. cat, good credit risk vs. bad credit risk, etc.). The goal is to use machine learning techniques to identify the function that explains the data set and to encapsulate this function within a model or predictor. BTC does exactly this in a three-step process:

1. BTC takes a labeled data set as input.

2. BTC analyzes the data set to determine its learnability (what the explanatory function is).

3. BTC builds a model that creates a predictor in the form of a Python function.
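The "data in, Python function out" idea behind these three steps can be illustrated with a toy sketch. This is not BTC's algorithm or interface: the function name `compile_stump`, the single-threshold rule it searches for, and the sample data are all assumptions made purely for illustration. The sketch takes a labeled data set, searches for the one-feature threshold that best explains the labels, and emits the result as plain Python source.

```python
from collections import Counter


def compile_stump(rows, labels):
    """Toy 'data compiler': fit a one-feature threshold rule and
    return Python source code for the resulting predictor.

    Hypothetical helper for illustration only; not part of BTC.
    """
    best = None  # (errors, feature index, threshold, label_below, label_above)
    n_features = len(rows[0])
    for j in range(n_features):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            below = Counter(y for r, y in zip(rows, labels) if r[j] <= t)
            above = Counter(y for r, y in zip(rows, labels) if r[j] > t)
            label_below = below.most_common(1)[0][0]
            label_above = above.most_common(1)[0][0]
            errors = len(labels) - below[label_below] - above[label_above]
            if best is None or errors < best[0]:
                best = (errors, j, t, label_below, label_above)
    if best is None:
        raise ValueError("need at least two distinct values in some column")
    _, j, t, label_below, label_above = best
    # Emit the predictor as clear-text Python source.
    return (
        "def predict(row):\n"
        f"    return {label_below!r} if row[{j}] <= {t!r} else {label_above!r}\n"
    )


# Step 1: a labeled data set (made-up feature rows plus class labels).
rows = [(1.0, 5.0), (2.0, 4.0), (8.0, 5.5), (9.0, 4.5)]
labels = ["cat", "cat", "dog", "dog"]

# Steps 2 and 3: find the explanatory rule and emit it as a Python function.
source = compile_stump(rows, labels)

# Load the generated source to use the predictor.
namespace = {}
exec(source, namespace)
predict = namespace["predict"]
```

Printing `source` shows the generated function as ordinary text, which is the property that lets a generated predictor be reviewed and version-controlled like any other code.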

Here is a list of current BTC features:

Latest version: 0.990

  • Input data set
    • Format must be .csv
    • One column must contain the class labels (target column)
    • Cell values supported: strings, floats, integers
    • No pre-processing necessary
    • Can be any size (no limit on the number of rows or columns)
    • Can have any number of classes, but we recommend at least 100 instances of each class to maximize learning
    • Can have any class balance
    • Can be sparse
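Before handing a file to BTC, it can be useful to confirm that it meets the requirements above. The following is a minimal sketch using only the Python standard library; the function name `check_dataset`, the column names, and the sample rows are hypothetical and not part of BTC. It checks that the target column exists and reports which classes fall below the recommended 100 instances.

```python
import csv
import io
from collections import Counter


def check_dataset(csv_file, target, min_per_class=100):
    """Count instances per class in a CSV and flag under-represented classes.

    Hypothetical helper for illustration only; not part of BTC.
    """
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise ValueError(f"target column {target!r} not found")
    counts = Counter(row[target] for row in reader)
    # Classes with fewer instances than the recommended minimum.
    small = sorted(label for label, n in counts.items() if n < min_per_class)
    return {"classes": dict(counts), "below_recommended": small}


# A tiny made-up data set: mixed integers, floats, strings, and an empty cell.
sample = io.StringIO(
    "age,income,risk\n"
    "34,52000.5,good\n"
    "51,,bad\n"
    "29,48000,good\n"
)
report = check_dataset(sample, target="risk")
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the in-memory `io.StringIO` object.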

       

  • Model architectures supported
    • Decision Trees
    • Neural Networks
    • Random Forests (coming soon)

       

  • Measurements calculated
    • Point of overfit [Memory Equivalent Capacity] for all supported model architectures
    • Risk of overfitting the training data
    • Data sufficiency [Capacity progression]
       
  • Models generated
    • Often only kilobytes in size; 2 to 3 orders of magnitude smaller than models produced by other methods
    • Stand-alone executable that requires only NumPy (no GPU needed)
    • Written in clear text Python code (easily committed to your git repo)
    • Neural Networks can be exported to ONNX format

       

  • Integration features
    • BTC can be scripted from the command line
    • All BTC output (measurements & models) can be encapsulated in JSON or plain text

       

  • Runtime optimizations
    • Neural Network training leverages the GPU