BTC Quickstart

Brainome’s BTC is the fastest, easiest way to build classification machine learning models from your CSV data.
This quickstart will show you how to build and train models, validate them and use them for predictive modeling.

If you have not already installed BTC on your computer, follow the instructions here.

Data

For this quickstart, we’re going to use the passenger roster from the Titanic disaster to predict which passengers survived.

First, download the following four files from Brainome’s repository:

curl https://download.brainome.ai/data/public/titanic_train.csv -o titanic_train.csv
curl https://download.brainome.ai/data/public/titanic_validate.csv -o titanic_validate.csv
curl https://download.brainome.ai/data/public/titanic_predict.csv -o titanic_predict.csv
curl https://download.brainome.ai/data/public/titanic_prod.py -o titanic_prod.py

The titanic_train.csv is our training data file. It contains a roster of the passengers with information about them. The last column is our target, “Survived”, and it indicates if the person survived or perished:

more titanic_train.csv

PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Survived
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S,died
2,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C,survived
3,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S,survived
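
If you prefer to inspect the file from Python rather than with more, the following is a minimal sketch using only the standard library. It assumes the target column is named “Survived”, as in the header above. The helper name count_targets is ours, not part of BTC, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter

# Three rows copied from the listing above. In practice, replace the
# io.StringIO(SAMPLE) call with open("titanic_train.csv", newline="").
SAMPLE = '''PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Survived
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S,died
2,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C,survived
3,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S,survived
'''


def count_targets(csv_file, target="Survived"):
    """Count how often each class appears in the target column (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)


counts = count_targets(io.StringIO(SAMPLE))
print(counts)
```

The csv module handles the quoted names that contain commas, so each row still yields twelve fields.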

Train

Using the titanic_train.csv file, we’re going to build our first model:

btc titanic_train.csv

Brainome Table Compiler v1.000-399-beta
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                     BTC Free  (Evaluation)
Expiration Date:                 2021-12-12   235 days left
Maximum File Size:               100 MB
Maximum Instances:               20000
Maximum Attributes:              100
Maximum Classes:                 unlimited
Connected to:                    daimensions.brainome.ai  (local execution)

Command:
      btc titanic_train.csv

Start Time:                     04/21/2021, 09:53 PDT


. . . lines deleted . . .

Predictor:                            a.py
        Classifier Type:                  Random Forest
        System Type:                      Binary classifier
        Training / Validation Split:  60% : 40%
        Accuracy:
          Best-guess accuracy:            61.50%
          Training accuracy:             100.00% (479/479 correct)
          Validation Accuracy:            77.88% (250/321 correct)

. . . lines deleted . . .

Congratulations! You’ve created your first predictor using BTC!

BTC outputs a lot of information, including measurement results and model accuracy. The output above has been trimmed down for clarity. Most importantly, you have created an executable Python model (a predictor).
From the information above, we can see that BTC built a random forest model with an accuracy on held-out data of 77.88%. It split the original dataset into a training set (60% – 479 instances) and a validation set (40% – 321 instances).
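
The percentages in the report can be checked against the instance counts it prints. This is a small sketch, assuming each percentage is a plain ratio of the counts shown. The variable names are ours.

```python
# Counts copied from the BTC report above.
train_total = 479               # instances in the training split
val_correct, val_total = 250, 321

total = train_total + val_total                     # size of titanic_train.csv
train_share = round(100 * train_total / total, 1)   # roughly the reported 60%
val_share = round(100 * val_total / total, 1)       # roughly the reported 40%
val_accuracy = round(100 * val_correct / val_total, 2)

print(total, train_share, val_share, val_accuracy)
```

The validation accuracy computed this way matches the 77.88% reported by BTC.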

Validate

BTC created an executable model, a.py. We are going to use the data file titanic_validate.csv to do a separate validation of the predictor:

python3 a.py  titanic_validate.csv  -validate
Classifier Type:                        Random Forest
System Type:                            Binary classifier

Accuracy:
        Best-guess accuracy:                61.25%

        Model accuracy:                     80.00% (64/80 correct)
        Improvement over best guess:        18.75% (of possible 38.75%)

Model capacity (MEC):                   11 bits
Generalization ratio:                   5.60 bits/bit

Confusion Matrix:

      Actual   |   Predicted
      ----------------------------
      died     |       42      7
      survived |        9     22

Accuracy by Class:

      Target   | TP FP TN FN TPR     TNR     PPV     NPV     F1      TS
      -------- | -- -- -- -- ------- ------- ------- ------- ------- -------
      died     | 42  9 22  7  85.71%  70.97%  82.35%  75.86%  84.00%  72.41%
      survived | 22  7 42  9  70.97%  85.71%  75.86%  82.35%  73.33%  57.89%
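
To see how the figures in this report relate to the confusion matrix, here is a minimal sketch. It assumes the column headings carry their usual meanings (true positive rate, true negative rate, positive and negative predictive value, F1 score, and threat score) and uses the textbook formulas. These are not taken from BTC itself, although they do reproduce the numbers above. The function name class_metrics is ours.

```python
def class_metrics(tp, fp, tn, fn):
    """Per-class rates as percentages, using the standard definitions (assumed)."""
    def pct(num, den):
        return round(100.0 * num / den, 2)

    return {
        "TPR": pct(tp, tp + fn),           # true positive rate (recall)
        "TNR": pct(tn, tn + fp),           # true negative rate
        "PPV": pct(tp, tp + fp),           # positive predictive value (precision)
        "NPV": pct(tn, tn + fn),           # negative predictive value
        "F1": pct(2 * tp, 2 * tp + fp + fn),
        "TS": pct(tp, tp + fp + fn),       # threat score
    }


# Counts read from the confusion matrix above.
died = class_metrics(tp=42, fp=9, tn=22, fn=7)
survived = class_metrics(tp=22, fp=7, tn=42, fn=9)

total = 42 + 7 + 9 + 22
model_accuracy = 100 * (42 + 22) / total         # correct predictions on the diagonal
best_guess = 100 * max(42 + 7, 9 + 22) / total   # always guessing the larger class
improvement = model_accuracy - best_guess

print(died)
print(survived)
print(model_accuracy, best_guess, improvement)
```

Under these assumptions, the best-guess accuracy is what you would get by always predicting the most common class, and the improvement is the gap between the model and that baseline.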

Predict

We are now going to use a.py to make predictions on new data. The data file titanic_predict.csv has the same columns as the training file except for the target, “Survived”:

more titanic_predict.csv
PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S

To predict outcomes using our predictor:

python3 a.py titanic_predict.csv > prediction.csv
more prediction.csv
PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Prediction
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S,survived
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S,died
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S,died
884,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S,died
885,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S,died
886,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q,died
887,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S,died
888,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S,survived
889,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,survived
890,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C,died
891,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q,died

Redirecting the predictor’s output created the prediction.csv data file. It contains the original data plus a “Prediction” column holding the prediction for every passenger.
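
If you want to work with the predictions in Python, a short sketch like the following pairs each passenger with their predicted outcome. It assumes the column names shown in the prediction.csv header above. The helper predictions_by_id is ours, and the in-memory sample stands in for the real file.

```python
import csv
import io

# First three rows of prediction.csv as listed above. In practice, replace the
# io.StringIO(SAMPLE) call with open("prediction.csv", newline="").
SAMPLE = '''PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Prediction
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S,survived
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S,died
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S,died
'''


def predictions_by_id(csv_file):
    """Map each PassengerId to its predicted class (hypothetical helper)."""
    reader = csv.DictReader(csv_file)
    return {row["PassengerId"]: row["Prediction"] for row in reader}


predictions = predictions_by_id(io.StringIO(SAMPLE))
print(predictions)
```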

Production

Including and using your predictor in a Python script is very easy. The file titanic_prod.py shows you how to:

  • Import the predictor as a library
  • Use the function “predict_instance” to make predictions in your code.

more titanic_prod.py
#! /usr/bin/env python3

from a import predict_instance

# This line is required
if __name__ == '__main__':

    passenger1 = [881, 2, "Shelley", "Mrs. William (Imanita Parrish Hall)", "female", 25, 0, 1, 230433, 26, "", "S"]
    passenger2 = [882, 3, "Markun", "Mr. Johann", "male", 33, 0, 0, 349257, 7.8958, "", "S"]

    print("Passenger1: ", passenger1)
    print("Passenger2: ", passenger2)
    print("")

    print("Predicting a class: ")
    print("  Passenger1: ", predict_instance(passenger1, prob=False))
    print("  Passenger2: ", predict_instance(passenger2, prob=False))
    print("")

    print("Predicting a probability: ")
    print("  Passenger1: ", predict_instance(passenger1, prob=True))
    print("  Passenger2: ", predict_instance(passenger2, prob=True))

When you run this Python code, you get both the classification output and the probability associated with each outcome.

python3 titanic_prod.py
Passenger1:  [881, 2, 'Shelley', 'Mrs. William (Imanita Parrish Hall)', 'female', 25, 0, 1, 230433, 26, '', 'S']
Passenger2:  [882, 3, 'Markun', 'Mr. Johann', 'male', 33, 0, 0, 349257, 7.8958, '', 'S']

Predicting a class:
  Passenger1:  survived
  Passenger2:  died

Predicting a probability:
  Passenger1:  {'died': 0.07296867035311438, 'survived': 0.9270313296468855}
  Passenger2:  {'died': 0.9432266063450672, 'survived': 0.05677339365493274}
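
The probability output is useful when you want to apply your own decision rule. The sketch below assumes predict_instance(..., prob=True) returns a dictionary shaped like the output above. The two dictionaries are copied from that output so the example runs without a.py. The helper names and the 0.9 threshold are ours, chosen only for illustration.

```python
# Probabilities copied from the titanic_prod.py output above.
passenger1_probs = {'died': 0.07296867035311438, 'survived': 0.9270313296468855}
passenger2_probs = {'died': 0.9432266063450672, 'survived': 0.05677339365493274}


def most_likely(probabilities):
    """Return the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


def is_confident(probabilities, threshold=0.9):
    """True if the top class reaches the threshold (illustrative value)."""
    return max(probabilities.values()) >= threshold


for name, probs in [("Passenger1", passenger1_probs), ("Passenger2", passenger2_probs)]:
    print(name, most_likely(probs), is_confident(probs))
```

Picking the most probable class gives the same labels as the class predictions shown above for these two passengers.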

Improving the predictor

Because we have both a training data set (titanic_train.csv) and a validation data set (titanic_validate.csv), we can probably improve the prediction accuracy by training on 100% of the training data instead of holding out 40% of it. To do so, we use the -nosplit flag:

btc -nosplit titanic_train.csv
WARNING: Could not detect a GPU. Neural Network generation will be slow.

Brainome Table Compiler v1.000-399-beta
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.

… lines deleted …

Command:
        btc -nosplit titanic_train.csv

… lines deleted …

Predictor:                            a.py
        Classifier Type:                  Random Forest
        System Type:                      Binary classifier
        Training / Validation Split:        Unable to split dataset. The predictor was trained and evaluated on the same data.
        Accuracy:
          Best-guess accuracy:            61.50%

          Combined Model Accuracy:   100.00% (800/800 correct)

        Model Capacity (MEC):             13        bits

… lines deleted …

The new model has 100% accuracy, but only on the data it was trained on! The real test is on the validation data:

python3 a.py -validate titanic_validate.csv
Classifier Type:                        Random Forest
System Type:                            Binary classifier

Accuracy:
        Best-guess accuracy:                61.25%
        Model accuracy:                     81.25% (65/80 correct)
        Improvement over best guess:        20.00% (of possible 38.75%)

Model capacity (MEC):                   13 bits
Generalization ratio:                   4.82 bits/bit

Confusion Matrix:

      Actual   |   Predicted
      ----------------------------
      died     |       42      7
      survived |        8     23

Accuracy by Class:
      Target   | TP FP TN FN TPR     TNR     PPV     NPV     F1      TS
      -------- | -- -- -- -- ------- ------- ------- ------- ------- -------
      died     | 42  8 23  7  85.71%  74.19%  84.00%  76.67%  84.85%  73.68%
      survived | 23  7 42  8  74.19%  85.71%  76.67%  84.00%  75.41%  60.53%

As you can see, we have improved the validation accuracy to 81.25% (from 80.00% before).

You will find more information in the Tutorial section of our documentation on the various models you can build, optimization and measurements.