
Advanced Options

Brainome has many control options that can be used to get the exact training and model behavior you want when you’re building your predictor. Here are some of the training control options that many users find most helpful:

Control Options

-f
  Force the creation of a specific type of machine learning model (currently DT, NN, or RF). Without this option, Brainome automatically decides which kind of model to build.
  Examples:
    -f DT forces the creation of a decision tree
    -f NN forces the creation of a neural network
    -f RF forces the creation of a random forest

-e
  Tell Brainome to put more effort than usual into training a model (default is 1, maximum is 100).
  Example:
    -e 5 tells Brainome to put 5 times more computation into its training efforts

-nosplit
  Tell Brainome not to split the data into a training and a validation set. This is useful when accuracy is to be maximized at any cost, including the cost of overfitting.
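These training controls combine on a single command line. The sketch below only assembles and prints such a command rather than running it (it assumes a brainome executable on your PATH and the titanic_train.csv training file used later in this walkthrough):

```shell
# Hypothetical combined invocation: force a random forest (-f RF), train with
# 5x effort (-e 5), and keep all rows for training (-nosplit).
cmd="brainome titanic_train.csv -f RF -e 5 -nosplit"
echo "$cmd"
```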

Brainome has other operational options that are useful when you’re building a predictor, including:

Control Options

-v
  Run Brainome in verbose mode.
  Example:
    -v makes Brainome send detailed status information to STDIO as it builds your predictor

-o
  Write the predictor into the specified file.
  Example:
    -o mypredictor.py tells Brainome to put your predictor into a file called “mypredictor.py”

-target
  Use a specific column as the output / target.
  Example:
    -target Decision tells Brainome to use the column whose header is “Decision” as the target / outcome column

-ignorecolumns
  Tell Brainome not to use specific column(s) as part of the training data. (This is typically used for columns containing unique identifiers for rows / data points.)
  Example:
    -ignorecolumns FileName,FileNo tells Brainome to ignore the columns labelled “FileName” and “FileNo”

-rank [n]
  Tell Brainome to select only the most useful attributes (columns) in your training data when building a model. (This is typically used with data sets that have many attributes / columns.) An optional numeric argument dictates how many columns to use.
  Examples:
    -rank tells Brainome to use only the most helpful attributes (columns) without dictating a specific number to select
    -rank 5 forces Brainome to pick and use the 5 most helpful attributes (columns)

-Wall
  Tell Brainome to display all warnings.
  Example:
    -Wall sends any warnings generated by Brainome to the standard output (typically your screen)
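The operational options combine the same way. In the sketch below, the column names (survived, PassengerId) are illustrative assumptions rather than columns taken from this walkthrough's data, and again the command is only assembled and printed:

```shell
# Hypothetical invocation: predict the "survived" column, ignore an identifier
# column, keep only the 5 most useful attributes, write the predictor to a
# named file, and show all warnings.
cmd="brainome titanic_train.csv -target survived -ignorecolumns PassengerId -rank 5 -o mypredictor.py -Wall"
echo "$cmd"
```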

For example, we’re going to ask Brainome to build a neural network with effort 3:

brainome titanic_train.csv -f NN -o experiment_e3NN.py -e 3 > experiment_e3NN.buildnotes

Now let’s run our “three times the effort” experiment predictor (experiment_e3NN.py) on new test data that Brainome has not seen before:

python3 experiment_e3NN.py Experiment_data/experiment.test.csv -validate > experiment_e3.results

Once your predictor has been run on new data, you can look at the output to get detailed measurements and explanations of its performance:

more experiment_e3.results

Classifier Type:                    Neural Network
System Type:                        Binary classifier
Accuracy:
    Best-guess accuracy:            61.50%
    Model accuracy:                 60.75% (486/800 correct)
    Improvement over best guess:    -0.75% (of possible 38.5%)
Model capacity (MEC):               27 bits
Model Capacity Utilized:            1 bits
Generalization ratio:               17.31 bits/bit
Confusion Matrix:
      Actual |   Predicted    
    -------------------------
        died |     481       11
    survived |     303        5

Accuracy by Class:
      target |  TP  FP  TN  FN     TPR     TNR     PPV     NPV      F1      TS
    -------- | --- --- --- --- ------- ------- ------- ------- ------- -------
        died | 481 303   5  11  97.76%   1.62%  61.35%  31.25%  75.39%  60.50%
    survived |   5  11 481 303   1.62%  97.76%  31.25%  61.35%   3.09%   1.57%
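The headline numbers in a .results file can be pulled out with standard shell tools for quick comparison between runs. A minimal sketch, with the two relevant lines recreated in a temporary file so the snippet is self-contained (with a real run you would grep experiment_e3.results directly):

```shell
# Recreate the two lines of interest from the sample output above.
cat > sample.results <<'EOF'
    Model accuracy:                 60.75% (486/800 correct)
Generalization ratio:               17.31 bits/bit
EOF

# Pull out just the numbers.
acc=$(grep 'Model accuracy' sample.results | grep -o '[0-9.]*%' | head -1)
gen=$(grep 'Generalization ratio' sample.results | awk '{print $3}')
echo "accuracy=$acc generalization=$gen bits/bit"
```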

Now we are at a decision point: is this predictor the one I want to deploy into my target workflow, or do I need to go back and make changes?


In our case, neither the Model Accuracy nor the Generalization Ratio is as strong as it should be; in fact, both are catastrophically bad. So let’s run Brainome again with different parameters.

brainome titanic_train.csv -rank -o experiment_rank.py -e 3 > experiment_rank.buildnotes

As before, we run the new predictor on the held-out test data:

python3 experiment_rank.py Experiment_data/experiment.test.csv -validate > experiment_rank.results

This time we see a huge improvement in Model Accuracy (from 60.75% to 80.75%), but more importantly, we also see a huge increase in the Generalization Ratio (from 17.31 to 155.28 bits/bit). These results tell us we’re on the right track:

more experiment_rank.results

Classifier Type:                    Random Forest
System Type:                        Binary classifier

Accuracy:
    Best-guess accuracy:            61.50%
    Model accuracy:                 80.75% (646/800 correct)
    Improvement over best guess:    19.25% (of possible 38.5%)

Model capacity (MEC):               4 bits
Generalization ratio:               155.28 bits/bit

Confusion Matrix:

      Actual |   Predicted    
    -------------------------
        died |     438       54
    survived |     100      208

Accuracy by Class:

      target |  TP  FP  TN  FN     TPR     TNR     PPV     NPV      F1      TS
    -------- | --- --- --- --- ------- ------- ------- ------- ------- -------
        died | 438 100 208  54  89.02%  67.53%  81.41%  79.39%  85.05%  73.99%
    survived | 208  54 438 100  67.53%  89.02%  79.39%  81.41%  72.98%  57.46%

False Negative Rate/Miss Rate:      0.24
Critical Success Index:             0.57
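The per-class columns follow directly from the confusion-matrix counts. As a sanity check, here is the "died" row of the table above recomputed from TP=438, FP=100, TN=208, FN=54 (plain arithmetic, no Brainome required):

```shell
# Recompute recall (TPR), precision (PPV), and F1 for the "died" class.
awk 'BEGIN {
    tp = 438; fp = 100; tn = 208; fn = 54
    tpr = tp / (tp + fn)              # true positive rate (recall)
    ppv = tp / (tp + fp)              # positive predictive value (precision)
    f1  = 2 * ppv * tpr / (ppv + tpr) # harmonic mean of precision and recall
    printf "TPR=%.2f%% PPV=%.2f%% F1=%.2f%%\n", 100 * tpr, 100 * ppv, 100 * f1
}'
# Prints TPR=89.02% PPV=81.41% F1=85.05%, matching the table.
```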

In general, you want to choose the predictor with the highest Accuracy and the highest Generalization. If Brainome does not warn you that the predictor it created overfits, then it does not overfit. Unless you use the -riskoverfit option, Brainome will always create a held-out validation set. You can also use the bias meter to see whether your predictor has any inherent bias.


If you’re not satisfied with the Accuracy or Generalization of your predictor, don’t worry! Here are some situations that you may encounter, together with suggested actions to take:

Situation: My predictor is near best-guess accuracy, or Accuracy is just too low.
Options/actions:
  1. Check the measurements and the warnings for hints (see the section Decide What to Do Next above)
  2. Try -rank to focus only on the features that matter the most
  3. Use the -f parameter to choose a different machine learning model
  4. Use the -e parameter to put more effort into training
  5. Try using -riskoverfit. If the accuracy is still low, then the dataset is too noisy and needs either preprocessing or attribute selection (see Decide What to Do Next)

Situation: I want higher Accuracy at any cost.
Options/actions: Try -riskoverfit.

Situation: My other predictor is 2% better…
Options/actions: Remember that Brainome is an enterprise prediction system. Reliability, in terms of reproducibility and resilience, is just as important a goal as accuracy. Data is always noisy, and it is easy to overfit. Check the number of parameters of your other predictor, and run the Brainome Resilience Meter and the bias meter to compare all factors. This is what it takes to make a quality predictor that will perform well in the field.

If you’re satisfied with the Accuracy and Generalization of your predictor, you can move on to deployment. Deploy the predictor just as you would any other piece of source code. The output predictor contains everything needed to run and validate predictions.