
Bias Tutorial

What is Bias?

Bias is a frequently discussed topic when it comes to societal issues, but it is also very important in science and therefore for machine learning. To narrow down the definition of bias for machine learning, let’s start with a textbook societal example: 5 people are waiting in a room for a job interview. They have the same qualifications. At this particular moment, an unbiased interview would assign each candidate an equal probability of being hired. That is, each of them has a 1/5 chance of being hired. Any other distribution of the probabilities would indicate bias. For example, assume the interviewer is more likely to hire a member of a societal majority or has to obey a policy that alters the chances: any unintended reduction or increase of uncertainty for a candidate (that is, in this example, any change not based solely on qualification) is defined as bias towards or against that candidate, respectively. Brainome therefore defines bias as an undue change of uncertainty: a change of uncertainty caused by factors not intended or known to be part of the experiment.
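
Since the definition hinges on uncertainty, a small illustration may help. The following is plain Python for intuition, not Brainome code: it compares the Shannon entropy (in bits) of the unbiased 1/5 distribution with a hypothetical skewed distribution, and the difference is exactly the "change of uncertainty" described above.

import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

unbiased = [0.2] * 5                    # every candidate equally likely
biased = [0.6, 0.1, 0.1, 0.1, 0.1]      # the interviewer favors one candidate

print(f"Unbiased uncertainty: {entropy(unbiased):.3f} bits")  # 2.322 bits
print(f"Biased uncertainty:   {entropy(biased):.3f} bits")    # about 1.771 bits
# The drop in uncertainty is caused by a factor (the interviewer's preference)
# that was never intended to be part of the experiment, i.e. bias.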

Brainome’s Bias Measurements

In the literature, class imbalances, an uneven train/validation split, or even threshold parameters are often called biased. Brainome’s biasmeter does not measure these obvious imbalances; it measures the bias that may be implicit in the model trained from the data. Class imbalances in the data are shown in the pre-training measurements, and classification imbalances are obvious in the confusion matrices shown at the end of Brainome’s output. The bias contributed by each attribute is shown in the importance ranking. The following output illustrates these measurements:

btc titanic_train.csv -e 5 -split 90

Brainome Table Compiler v1.004-165-prod
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.

Command:
    btc titanic_train.csv -e 5 -split 90

Cleaning…done. 
Splitting into training and validation…done. 
Pre-training measurements…done. 
Pre-training Measurements
Data:
    Input:                      titanic_train.csv
    Target Column:              Survived
    Number of instances:        800
    Number of attributes:        11 out of 11
    Number of classes:            2

Class Balance:                
                            died: 61.50%
                        survived: 38.50%
Learnability:
    Best guess accuracy:          61.50%
    Data Sufficiency:             Maybe enough data to generalize. [yellow]

Capacity Progression:             at [ 5%, 10%, 20%, 40%, 80%, 100% ]
    Ideal Machine Learner:              6,   7,   8,   8,   9,   9

Expected Generalization:
    Decision Tree:                 1.99 bits/bit
    Neural Network:                6.52 bits/bit
    Random Forest:                10.13 bits/bit

Expected Accuracy:              Training            Validation
    Decision Tree:               100.00%                51.62%
    Neural Network:                 ----                  ----
    Random Forest:               100.00%                80.25%

Recommendations:
    Warning: Data has high information density. Using effort 5 and larger ( -e 5 ) can improve results.
    We recommend using Random Forest -f RF.
    If predictor accuracy is insufficient, try using the option -rank to automatically select the important attributes.
    Defaulting to RF model. Model can be forced with -f parameter. 

Building classifier…done.
Training…done. 
done. 
Compiling predictor…done. 
Validating predictor…done. 

Predictor:                        a.py
    Classifier Type:              Random Forest
    System Type:                  Binary classifier
    Training / Validation Split:  90% : 10%
    Accuracy:
      Best-guess accuracy:        61.50%
      Training accuracy:         100.00% (719/719 correct)
      Validation Accuracy:        85.18% (69/81 correct)
      Combined Model Accuracy:    98.50% (788/800 correct)

    Model Capacity (MEC):         13    bits

    Generalization Ratio:         53.18 bits/bit
    Percent of Data Memorized:     3.82%
    Resilience to Noise:          -1.74 dB

    Training Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  442    0 
            survived |    0  277 

    Validation Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |   48    2 
            survived |   10   21 

    Training Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  442    0  277    0  100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
            survived |  277    0  442    0  100.00%  100.00%  100.00%  100.00%  100.00%  100.00%

    Validation Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |   48   10   21    2   96.00%   67.74%   82.76%   91.30%   88.89%   80.00%
            survived |   21    2   48   10   67.74%   96.00%   91.30%   82.76%   77.78%   63.64%

    Attribute Ranking:
                                      Feature | Relative Importance
                                          Sex :   0.4086
                                  Cabin_Class :   0.2060
                                 Cabin_Number :   0.0640
                                          Age :   0.0489
                                         Fare :   0.0468
                              Parent_Children :   0.0464
                               Sibling_Spouse :   0.0440
                                  PassengerId :   0.0386
                                Ticket_Number :   0.0381
                                         Name :   0.0328
                          Port_of_Embarkation :   0.0258
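
The per-class accuracy tables follow directly from the confusion matrices. As a quick sanity check (plain Python, not part of Brainome’s output), the validation row for the died class can be reproduced by hand:

# Reproducing the validation "died" row from the confusion matrix above:
# 48 died predicted died, 2 died predicted survived,
# 21 survived predicted survived, 10 survived predicted died.
TP, FP, TN, FN = 48, 10, 21, 2

TPR = TP / (TP + FN)               # true positive rate (recall)       -> 96.00%
TNR = TN / (TN + FP)               # true negative rate (specificity)  -> 67.74%
PPV = TP / (TP + FP)               # precision                         -> 82.76%
NPV = TN / (TN + FN)               # negative predictive value         -> 91.30%
F1 = 2 * PPV * TPR / (PPV + TPR)   # harmonic mean of PPV and TPR      -> 88.89%
TS = TP / (TP + FP + FN)           # threat score                      -> 80.00%

for name, value in [("TPR", TPR), ("TNR", TNR), ("PPV", PPV),
                    ("NPV", NPV), ("F1", F1), ("TS", TS)]:
    print(f"{name}: {value:.2%}")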

To get at the implicit biases induced by the data, Brainome needs to be invoked with the parameter -biasmeter. Brainome will then synthesize new samples from the data and, taking into account the average generalization, measure how uniformly distributed random samples would be classified by the generated model. With this method, if the model were bias-free, uniform random input would generate uniform random output. No model is ever completely bias-free, so the bias towards a class is expressed as a percentage. See below:

btc titanic_train.csv -e 5 -split 90 -biasmeter

Brainome Table Compiler v1.004-165-prod
Copyright (c) 2019-2021 Brainome, Inc. All Rights Reserved.

Command:
    btc titanic_train.csv -e 5 -split 90 -biasmeter

. . . 

    Attribute Ranking:
                                      Feature | Relative Importance
                                          Sex :   0.4492
                                  Cabin_Class :   0.1751
                                 Cabin_Number :   0.0946
                              Parent_Children :   0.0551
                               Sibling_Spouse :   0.0463
                                          Age :   0.0447
                                Ticket_Number :   0.0338
                                         Fare :   0.0324
                                  PassengerId :   0.0323
                                         Name :   0.0256
                          Port_of_Embarkation :   0.0109         

Measuring bias…done. 

Model bias:    1.18% towards class died away from class survived

The bias measured above indicates a pretty well-balanced model.
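
For intuition only, here is a rough sketch of the idea described above: classify uniformly random inputs and see how far the predicted class frequencies drift from a uniform split. The predict function, feature ranges, and toy threshold below are hypothetical stand-ins, not Brainome’s actual biasmeter implementation.

import random
from collections import Counter

def measure_class_bias(predict, feature_ranges, n_samples=10000, seed=0):
    """Classify uniformly random inputs and return each class's deviation
    from a perfectly uniform share of the predictions."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        row = [rng.uniform(lo, hi) for lo, hi in feature_ranges]
        counts[predict(row)] += 1
    return {cls: counts[cls] / n_samples - 1.0 / len(counts) for cls in counts}

# Toy stand-in predictor: thresholds the first feature.
def toy_predict(row):
    return "died" if row[0] < 0.52 else "survived"

print(measure_class_bias(toy_predict, [(0.0, 1.0)] * 3))
# A bias-free model would give deviations near zero for every class;
# here the toy model leans about 2% towards "died".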

Note that Brainome’s bias meter is an approximation and cannot be taken as absolute truth, since measuring bias is inherently difficult. One should be as suspicious of models that appear bias-free as of models that have a large amount of bias. Model bias can be reduced by making sure that classes are balanced, that the sample size is high, and that validation and training accuracy are about equal. However, if there is inherent bias in the training data, then this bias should be expected to be part of the model. If the bias is unwanted, the training data or the process generating it needs to be corrected.
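
If the class imbalance itself is the unwanted part, one common remedy (independent of Brainome) is to rebalance the training file before compiling a new model, for example by upsampling the minority class. A minimal sketch with pandas, assuming the Survived target column shown above:

import pandas as pd

df = pd.read_csv("titanic_train.csv")
counts = df["Survived"].value_counts()
minority_label = counts.idxmin()

# Duplicate random rows of the minority class until both classes are equal.
minority = df[df["Survived"] == minority_label]
extra = minority.sample(n=counts.max() - counts.min(), replace=True, random_state=0)

balanced = pd.concat([df, extra]).sample(frac=1, random_state=0)  # shuffle rows
balanced.to_csv("titanic_train_balanced.csv", index=False)
print(balanced["Survived"].value_counts())

The rebalanced file can then be passed to btc in place of the original, keeping in mind the caveat above: if the imbalance reflects a genuine property of the data-generating process, rebalancing alone will not remove the underlying bias.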