Command Line Options
Brainome has many control options that can be used to get the exact training and model behavior you want when you’re building your predictor. This is the detailed list of building and training control options:
Displays a help message that lists all the command line options and a few examples.
Displays the revision number of the brainome compiler you are using.
> brainome -version
Here we are using the production version 1.006 release 19 of brainome. Version numbers are essential for reproducibility. The version number used to create predictors is also indicated in the comment section of the predictor.
Brainome expects a dataset with a header row that names each column, such as the one at the top of the Titanic training dataset.
When your dataset is missing a header and the first line is a row of data, use the -headerless option:
> brainome -headerless mydata_noheader.csv
Note: with a headerless data file, the individual columns are referred to by their positions, starting at 0. Hence, the first column is “0”, the second one is “1” and so forth.
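If you need a headerless copy of a dataset (for example, to produce the titanic_train_headerless.csv used in the examples below), the header row can be stripped with a few lines of Python. This is a sketch using only the standard library; the file names are illustrative:

```python
import csv

def strip_header(src, dst):
    """Copy a CSV file, dropping its first (header) row."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        next(reader, None)  # skip the header row
        for row in reader:
            writer.writerow(row)

# e.g. strip_header("titanic_train.csv", "titanic_train_headerless.csv")
```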
By default, brainome uses the last column as the target of the classification problem.
If your target column is not the last column, you can specify the TARGET column name if your dataset has a header OR the TARGET column number for headerless datasets.
Specifying the target by column name in Titanic:
> brainome -target Survived titanic_train.csv
Specifying the target by column number after removing the header of Titanic (Survived is column 11 when counting from 0):
> brainome -headerless -target 11 titanic_train_headerless.csv
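When converting a command from the header to the headerless form, the 0-based position to pass as TARGET can be looked up in the original header row. A minimal sketch (the header list below is illustrative, not the actual Titanic header):

```python
def column_index(header, name):
    """Return the 0-based position of a column name in a header row."""
    return header.index(name)

header = ["PassengerId", "Name", "Sex", "Survived"]  # illustrative header
print(column_index(header, "Survived"))  # prints 3, the value to pass to -target
```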
Ignoring columns that contain unique identifiers or duplicates of the target column can lead to better models.
When the dataset has a header, IGNORECOLS is the list of column names to ignore, separated by commas.
For headerless datasets, IGNORECOLS is the list of column numbers to ignore (starting at 0), separated by commas.
For instance, ignoring the passenger ID and name columns in Titanic is done as follows:
> brainome -ignorecolumns PassengerId,Name titanic_train.csv
Ignoring the same columns after removing the header of Titanic:
> brainome -headerless -ignorecolumns 0,2 titanic_train_headerless.csv
Before training, the -rank option selects the most useful attributes (columns) in your training data to build a model. This is typically used with data sets that have many columns.
By default, it only pulls the columns that matter the most, but you can override the number of columns to use by specifying an optional RANKN.
Doing a pre-training on Titanic looks like this:
> brainome titanic_train.csv -rank
Forcing brainome to use the 5 most important features:
> brainome titanic_train.csv -rank 5
This option will only produce the pre-training measurements. This is useful when doing feature engineering, to quickly assess the impact of new or more convoluted features on the MEC and the capacity progression without having to go through the entire model building and training process.
You can override brainome’s automatic model type selection and force the creation of a specific type of machine learning model using the -f option. FORCEMODEL can be one of DT, RF or NN.
To create a Decision Tree model:
> brainome -f DT mydata.csv
To create a Random Forest model:
> brainome -f RF mydata.csv
To create a Neural Network Model:
> brainome -f NN mydata.csv
When you have separate training and validation datasets, use this option to prevent brainome from splitting the training data to automatically create a validation set.
brainome automatically splits your dataset into a training set and a validation set based on its measurements. You can override the automatic split by using this option and specifying the percentage AMOUNT of data points to use for training.
For instance, you can force the training for Titanic to use 80% of the dataset as follows:
> brainome -split 80 titanic_train.csv
Note that -split 100 is equivalent to -nosplit.
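The -split percentage boils down to a row count. A sketch of how an 80/20 split could be reproduced by hand on a plain list of rows (the seeding is an assumption for reproducibility, not how brainome itself splits):

```python
import random

def split_rows(rows, percent, seed=0):
    """Shuffle rows and split them into train/validation by percentage."""
    rows = rows[:]                      # keep the caller's list intact
    random.Random(seed).shuffle(rows)   # seeded for reproducibility
    cut = len(rows) * percent // 100
    return rows[:cut], rows[cut:]

train, valid = split_rows(list(range(100)), 80)
print(len(train), len(valid))  # prints: 80 20
```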
When a dataset has a lot of samples and only a percentage of the data is required for training, the -nsamples option allows you to select COUNT rows of data for training. Note that no attempt is made to keep the classes balanced as the samples are selected randomly.
This example selects 10,000 data points for training from a large dataset:
> brainome -nsamples 10000 bigdatafile.csv
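Equivalent random subsampling can also be done as a preprocessing step. A sketch with Python's csv and random modules; like -nsamples, it makes no attempt to keep the classes balanced (file names and the seed are illustrative):

```python
import csv
import random

def sample_rows(src, dst, count, seed=0):
    """Write `count` randomly chosen data rows (header kept) to a new CSV."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    picked = random.Random(seed).sample(rows, count)  # no class balancing
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(picked)

# e.g. sample_rows("bigdatafile.csv", "bigdatafile_10k.csv", 10000)
```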
This option is used to automatically remove data points whose target class belongs to any of the classes specified in CLASSLIST. This is often used to remove classes with very few instances.
If you have a dataset whose target output has classes Red, Blue, Green and Yellow,
> brainome -ignoreclasses Red,Blue colors.csv
will remove any datapoint whose output is Red or Blue and build a binary predictor for the classes Green and Yellow.
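The same filtering can be done ahead of time if you prefer a cleaned file. A sketch for the colors example above; the target column name "Color" is a hypothetical, since colors.csv's header is not shown here:

```python
import csv

def drop_classes(src, dst, classes, target="Color"):
    """Copy a CSV, dropping rows whose target value is in `classes`."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[target] not in classes:
                writer.writerow(row)

# e.g. drop_classes("colors.csv", "colors_binary.csv", {"Red", "Blue"})
```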
You can use this option to build a predictor using ONLY the list of columns specified in COLUMNLIST.
For instance, you can build a model for Titanic using only the gender, the cabin class and the number of children as follows:
> brainome -usecolumns Sex,Cabin_Class,Parent_Children titanic_train.csv
Using the same columns after removing the header of Titanic:
> brainome -headerless -usecolumns 1,3,6 titanic_train_headerless.csv
The default output predictor filename is a.py. Using the -o option, you can specify a different predictor filename OUTPUT.
Specify an output filename “titanic.py” for the Titanic training dataset:
> brainome -o titanic.py titanic_train.csv
The verbose option displays additional information during brainome's measurement, model building, and training phases that might prove useful to the user:
> brainome -v titanic_train.csv
The quiet option forces brainome to run without displaying any information. This is useful in conjunction with the -y option to automate model building.
> brainome -q titanic_train.csv
The -y option lets the user automatically answer Yes to all interactive questions. This is useful to automate model building:
> brainome -y titanic_train.csv
will automatically overwrite any existing a.py output.
When automating model building and training with brainome, it is useful to output all the measurement and training results to a JSON file for automated processing. JSONFNAME is the JSON filename.
For instance, using the Titanic dataset, you can quietly build an RF model a.py and place all the measurement and training results in the file titanic.json:
> brainome -y titanic_train.csv -json titanic.json -q -f RF
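Downstream automation can then load that file with any JSON parser. A minimal Python sketch; the exact keys in the file depend on your brainome version, so inspect them before hard-coding any:

```python
import json

def load_results(path):
    """Load a brainome JSON results file into a dict for automated checks."""
    with open(path) as f:
        return json.load(f)

# Inspect the keys first, since the schema is version-dependent:
# results = load_results("titanic.json")
# print(sorted(results))
```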
Specifying an EFFORT value increases the computation time that brainome spends training the model. The default value is 1. For Neural Networks, an effort of 2 to 20 can improve accuracy. For Random Forest models, an effort of 2 to 100 can help. It has no effect on Decision Trees.
The EFFORT value is also ignored when 100% of the dataset is used for training (-nosplit or -split 100).
Using this option will skip the validation process after building a predictor. As a result, statistics on the training data and validation data will be omitted.
The nofun option forces brainome to stop if any warning is encountered. In that case, the compiler will not produce a predictor.
This option limits the measurements done by brainome to the ones required for building the model selected. This can speed up the overall compilation process when measurements are not required.