Brainome Quickstart

Brainome is the fastest, easiest way to build classification machine learning models from your CSV data. 

This quickstart will show you how to build models, validate them and use them for predictive modeling.

If you have not already installed Brainome on your computer, follow the installation instructions first.

Data

For this quickstart, we’re going to use the passenger roster from the Titanic disaster to predict which passengers survived.

First, download the following four files from Brainome’s repository:

curl https://download.brainome.ai/data/public/titanic_train.csv -o titanic_train.csv
curl https://download.brainome.ai/data/public/titanic_validate.csv -o titanic_validate.csv
curl https://download.brainome.ai/data/public/titanic_predict.csv -o titanic_predict.csv
curl https://download.brainome.ai/data/public/titanic_prod.py -o titanic_prod.py

The file titanic_train.csv is our training data. It contains the passenger roster with information about each passenger. The last column, “Survived”, is our target and indicates whether the person survived or died:

more titanic_train.csv

PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Survived
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S,died
2,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C,survived
3,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S,survived
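
Before training, it can help to confirm that the file parses cleanly and that the target really is the last column. The following is a minimal sketch using Python's standard csv module. The inline sample copies the header and the three rows shown above, and the helper name summarize is only illustrative; in practice you would open titanic_train.csv instead.

```python
import csv
import io

# Inline copy of the header and rows shown above. In practice, replace
# io.StringIO(SAMPLE) with open("titanic_train.csv", newline="").
SAMPLE = '''PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Survived
1,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S,died
2,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C,survived
3,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S,survived
'''


def summarize(handle):
    """Return (target column name, number of columns, list of target values)."""
    reader = csv.reader(handle)
    header = next(reader)                    # first row holds the column names
    targets = [row[-1] for row in reader]    # last field of each data row
    return header[-1], len(header), targets


target_name, n_columns, targets = summarize(io.StringIO(SAMPLE))
print(f"target column: {target_name}, columns: {n_columns}, values: {targets}")
```

The csv module handles the quoted names that contain commas, which a plain split on "," would break apart.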

Train

Using the titanic_train.csv file, we’re going to build our first model:

brainome titanic_train.csv

Brainome Table Compiler v1.005-7-prod
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.
Licensed to:                     Brainome Free  (Evaluation)
Expiration Date:                 2021-12-12   235 days left
Maximum File Size:               100 MB
Maximum Instances:               20000
Maximum Attributes:              100
Maximum Classes:                 unlimited
Connected to:                    daimensions.brainome.ai  (local execution)

Command:
      brainome titanic_train.csv

Start Time:                     04/21/2021, 09:53 PDT


. . . lines deleted . . .

Predictor:                    a.py
     Classifier Type:              Random Forest
     System Type:                  Binary classifier
     Training / Validation Split:  60% : 40%
     Accuracy:
        Best-guess accuracy:       61.50%
        Training accuracy:         86.84% (416/479 correct)
        Validation Accuracy:       80.99% (260/321 correct)
        Combined Model Accuracy:   84.50% (676/800 correct)

. . . lines deleted . . .

Congratulations! You’ve created your first predictor using Brainome!

Brainome outputs a lot of information, including measurement results and model accuracy. The output above has been trimmed for clarity. Most importantly, you have created an executable Python model (a predictor).

From the information above, we can see that Brainome built a random forest model with a validation accuracy of 80.99% on the held-out portion of the data. It split the original dataset of 800 instances into a training set (60%, 479 instances) and a validation set (40%, 321 instances).
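
The accuracy figures in the report are simple ratios of correct predictions to instances, and the combined figure pools the training and validation counts. A short sketch of that arithmetic, using only the counts printed above (the function name accuracy is illustrative, and the last printed digit may differ slightly from the report depending on how it rounds):

```python
def accuracy(correct, total):
    """Percentage of correct predictions."""
    return 100.0 * correct / total


# (correct, total) pairs copied from the training report above.
train = (416, 479)
valid = (260, 321)

# The combined figure pools both subsets.
combined = (train[0] + valid[0], train[1] + valid[1])

for name, (correct, total) in [("training", train),
                               ("validation", valid),
                               ("combined", combined)]:
    print(f"{name:<10} {correct}/{total} = {accuracy(correct, total):.2f}%")

# Share of the 800 instances that went to each subset.
train_share = 100.0 * train[1] / combined[1]
valid_share = 100.0 * valid[1] / combined[1]
print(f"split: {train_share:.1f}% training, {valid_share:.1f}% validation")
```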

Validate

Brainome created an executable model named a.py. We are going to use the data file titanic_validate.csv to do a separate validation of the predictor:

python3 a.py  titanic_validate.csv  -validate
Classifier Type:                        Random Forest
System Type:                            2-way classifier

Accuracy:
        Best-guess accuracy:            61.25%
        Model accuracy:                 81.25% (65/80 correct)
        Improvement over best guess:    20.00% (of possible 38.75%)

Model capacity (MEC):                   41 bits
Generalization ratio:                   1.52 bits/bit

Confusion Matrix:

      Actual   |   Predicted
     ----------+----------------
      died     |     44      5
      survived |     10     21

Accuracy by Class:

      Target   | TP FP TN FN  TPR     TNR     PPV     NPV     F1      TS
     ----------+------------------------------------------------------------
      died     | 44 10 21  5  89.80%  67.74%  81.48%  80.77%  85.44%  74.58%
      survived | 21  5 44 10  67.74%  89.80%  80.77%  81.48%  73.68%  58.33%

The predictor took the data from the validation file, predicted an outcome for each row and compared it to the actual outcome.

We can see that we got 81.25% accuracy (65 of 80 correct) on the held-out data.
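
The per-class columns in the report are not expanded in the output. Assuming they carry their usual meanings (TPR: true positive rate, TNR: true negative rate, PPV and NPV: positive and negative predictive value, F1: F1 score, TS: threat score), the standard formulas reproduce the figures reported for the class “died”. This is a sketch of those textbook definitions, not Brainome's own code:

```python
def class_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from the four counts."""
    return {
        "TPR": tp / (tp + fn),               # recall / sensitivity
        "TNR": tn / (tn + fp),               # specificity
        "PPV": tp / (tp + fp),               # precision
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "TS": tp / (tp + fp + fn),           # threat score
    }


# Counts for the class "died", copied from the validation report above.
died = class_metrics(tp=44, fp=10, tn=21, fn=5)
for name, value in died.items():
    print(f"{name}: {100 * value:.2f}%")
```

For the other class, swap the roles: the true positives of “survived” are the true negatives of “died”, and so on.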

Predict

The predictor can also be used to predict the outcome for a data file. The data file used for prediction has the same columns as the training data file, except that it does not have the target column “Survived”.

more titanic_predict.csv
PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S

To predict outcomes using our predictor:

python3 a.py titanic_predict.csv > prediction.csv
more prediction.csv

PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Prediction
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S,survived
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S,died
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S,died
884,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S,died
885,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S,died
886,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q,died
887,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S,died
888,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S,survived
889,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S,survived
890,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C,died
891,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q,died

The predictor wrote its output to the file prediction.csv. It added a “Prediction” column to the original data, containing the prediction for every passenger.
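
Once the predictions are in a CSV file, they can be post-processed like any other table. As a minimal sketch, the snippet below tallies the values in the “Prediction” column using Python's standard library. The inline sample copies three rows of the output shown above, and the helper name count_predictions is only illustrative; in practice you would open prediction.csv instead.

```python
import csv
import io
from collections import Counter

# Inline copy of the first rows of prediction.csv shown above. In practice,
# replace io.StringIO(SAMPLE) with open("prediction.csv", newline="").
SAMPLE = '''PassengerId,Cabin_Class,Name,Sex,Age,Sibling_Spouse,Parent_Children,Ticket_Number,Fare,Cabin_Number,Port_of_Embarkation,Prediction
881,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S,survived
882,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S,died
883,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S,died
'''


def count_predictions(handle, column="Prediction"):
    """Count how often each predicted class appears in the given column."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


counts = count_predictions(io.StringIO(SAMPLE))
for label, n in counts.most_common():
    print(f"{label}: {n}")
```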

Production

Including and using your predictor in a Python script is very easy. The file titanic_prod.py shows you how to:

  • Import the predictor as a library
  • Use the function “predict” to make predictions in your code.

more titanic_prod.py
#! /usr/bin/env python3

# This example shows how to import a predictor in a production environment.
# The predictor can be used to create a hard prediction or a soft probability
# (RF and NN only).

from a import predict

# Create 2 new passengers
person1 = ['1', '3', "Braund, Mr. Owen Harris", "male", '22', '1', '0', "A/5 21171", '7.25', "", "S"]
person2 = [2, 1, "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "female", 38, 1, 0, "PC 17599", 71.2833, "C85", "C"]

# This line is required
if __name__ == '__main__':

    print("Titanic predictor.predict() example – single record")
    print("")
    print(f"Person 1:  {person1}")
    print(f"Prediction: {predict([person1])[0]}")
    # Probabilities only work for model types NN and RF.
    print(f"Probabilities: {predict([person1], return_probabilities=True)}")
    print("")
    print(f"Person 2:  {person2}")
    print(f"Prediction: {predict([person2])[0]}")
    # Probabilities only work for model types NN and RF.
    print(f"Probabilities: {predict([person2], return_probabilities=True)}")

When you run this Python code, you get both the classification output and the probability associated with each outcome.

python3 titanic_prod.py
Titanic predictor.predict() example – single record

Person 1:  ['1', '3', 'Braund, Mr. Owen Harris', 'male', '22', '1', '0', 'A/5 21171', '7.25', '', 'S']
Prediction: died
Probabilities: [['died' 'survived']
 ['0.8833638515749265' '0.11663614842507353']]

Person 2:  [2, 1, 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'female', 38, 1, 0, 'PC 17599', 71.2833, 'C85', 'C']
Prediction: survived
Probabilities: [['died' 'survived']
 ['0.10867684991596138' '0.8913231500840386']]
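
In application code it is often more convenient to work with a label and a dictionary of numeric probabilities than with the printed table. The sketch below assumes only what the output above shows for a single record: the first row of the probabilities result holds the class names and the second row holds the probabilities as strings. The helper label_with_probabilities and the stand-in stub_predict are hypothetical and not part of Brainome; in a real script you would pass the predict function imported from a.py.

```python
def label_with_probabilities(predict_fn, instance):
    """Return (label, {class name: probability}) for one instance.

    Assumes predict_fn behaves like the predict function shown above when
    called with a single-instance list.
    """
    label = predict_fn([instance])[0]
    table = predict_fn([instance], return_probabilities=True)
    class_names, values = table[0], table[1]
    probabilities = {str(name): float(value)
                     for name, value in zip(class_names, values)}
    return str(label), probabilities


def stub_predict(instances, return_probabilities=False):
    """Hypothetical stand-in that mimics the output printed for Person 1."""
    if return_probabilities:
        return [["died", "survived"],
                ["0.8833638515749265", "0.11663614842507353"]]
    return ["died" for _ in instances]


person1 = ['1', '3', "Braund, Mr. Owen Harris", "male", '22', '1', '0',
           "A/5 21171", '7.25', "", "S"]

label, probabilities = label_with_probabilities(stub_predict, person1)
print(label, probabilities)
```

Passing the predictor in as an argument keeps the helper testable without a trained model on disk.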

Improving the predictor

Because we have both a training data set (titanic_train.csv) and a validation data set (titanic_validate.csv) we can probably improve the prediction accuracy by training on 100% of the training data. To do so, we use the -nosplit flag.

brainome -nosplit titanic_train.csv
WARNING: Could not detect a GPU. Neural Network generation will be slow.

Brainome Table Compiler v1.000-399-beta
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.

… lines deleted …

Command:
        brainome -nosplit titanic_train.csv

… lines deleted …

Predictor:                            a.py
        Classifier Type:                  Random Forest
        System Type:                      Binary classifier
        Training / Validation Split:      Unable to split dataset. The predictor was trained and evaluated on the same data.
        Accuracy:
             Best-guess accuracy:            61.50%
             Combined Model Accuracy:        99.87% (799/800 correct)
             Model Capacity (MEC):           82 bits

… lines deleted … 

The new model has 99.87% accuracy … but on the training data! The real test is on the validation data:

python3 a.py -validate titanic_validate.csv
Classifier Type:                        Random Forest
System Type:                            2-way classifier

Accuracy:
        Best-guess accuracy:            61.25%
        Model accuracy:                 83.75% (67/80 correct)
        Improvement over best guess:    22.50% (of possible 38.75%)

Model capacity (MEC):                   82 bits
Generalization ratio:                   0.78 bits/bit

Confusion Matrix:

      Actual   |   Predicted
     ----------+----------------
      died     |     43      6
      survived |      7     24

Accuracy by Class:
      Target   | TP FP TN FN  TPR     TNR     PPV     NPV     F1      TS
     ----------+------------------------------------------------------------
      died     | 43  7 24  6  87.76%  77.42%  86.00%  80.00%  86.87%  76.79%
      survived | 24  6 43  7  77.42%  87.76%  80.00%  86.00%  78.69%  64.86%

As you can see, we have improved the validation accuracy to 83.75% (from 81.25% before).
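
The “Improvement over best guess” lines make the two runs easy to compare. Assuming that best-guess accuracy means always predicting the most common class (49 of the 80 validation passengers died, which matches the reported 61.25%), the figures can be reproduced as follows. The function name is illustrative:

```python
def improvement_over_best_guess(correct, total, majority_count):
    """Return (best guess %, model %, improvement, possible improvement)."""
    best_guess = 100.0 * majority_count / total   # always predict the majority class
    model = 100.0 * correct / total
    return best_guess, model, model - best_guess, 100.0 - best_guess


# Validation counts copied from the two reports: 80 rows, 49 of them "died".
with_split = improvement_over_best_guess(correct=65, total=80, majority_count=49)
no_split = improvement_over_best_guess(correct=67, total=80, majority_count=49)

for name, (best, model, gain, possible) in [("60/40 split", with_split),
                                            ("-nosplit", no_split)]:
    print(f"{name:<12} model {model:.2f}%  best guess {best:.2f}%  "
          f"improvement {gain:.2f}% (of possible {possible:.2f}%)")
```

The gain from retraining on all of the training data corresponds to two additional correct predictions out of 80.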

You will find more information in the Tutorial section of our documentation on the various models you can build, optimization and measurements.