300 times faster than Amazon, Google and Microsoft: Brainome fundamentally changes the game for Auto ML

Auto ML (AML) enables no-code predictive analysis for datasets of all sizes. Existing AML approaches available from Amazon, Microsoft and Google use brute-force (lots of compute resources) to try as many combinations of models + hyperparameters as the user is willing to pay for. This works, but it takes too much time, even for small datasets, and gets expensive if you have to generate lots of models.

At Brainome, we have an entirely different approach to the AML task. We measure the learnability of datasets and use these measurements to drive a reproducible, zero-code solution that (1) accelerates model generation up to 40 times, (2) predicts up to 300 times faster and (3) creates extremely compact models while achieving equivalent test accuracy and F1 when compared to Amazon, Microsoft and Google.

While our approach works well for datasets of all sizes, it is a revelation for customers who have very large datasets and / or need to generate many models. Think genomic analysis, disease discovery and / or financial markets. Analysis that previously took too long or was too expensive can now be done very cost efficiently in seconds or minutes.

This article describes the methodology used for comparing Brainome against Amazon SageMaker, Microsoft Azure Machine Learning and Google Cloud AutoML. We report the performance results and outline the differences between the various platforms.

Methodology

We selected from OpenML. They are a representative subset of the 100 binary classification datasets originally selected by Capital One and the University of Illinois Urbana-Champaign in their 2019 paper

(“Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools“). This paper attempted to compare the various AutoML platforms available at the time.

Each dataset was split into a training dataset (70%) used exclusively for training of the model and a held back test dataset (30%), used to compute the accuracy of the model.

Our benchmark recorded five key performance metrics: test accuracy, F1 score, training speed, prediction (or inference) speed and model size.

The four AutoML systems benchmarked are:

There are no “version” numbers for GCML, SageMaker and AzureML but all tests were conducted between December 2021 and January 2022.

For consistency, all tests were automated via scripting using the respective API of each system.
We can provide the scripts if you’d like to repeat the experiment – email us at contact@brainome.ai

Brainome, SageMaker and AzureML were run on similar hardware platforms equivalent to an EC2 m5.2xlarge general purpose instance (the suggested default for SageMaker). We did not have control of GCML’s hardware.

Results

The following table is a summary of the five key metrics used for comparison:

Metric

Brainome

Amazon

Microsoft

Google*

Average Test Accuracy

91.14%

91.15%

91.58%

91.33%

Average F1

0.83

0.86

0.83 Average training time (seconds)

108 2,453

3,247

4,410

Aggregate training time (seconds)

2,262

51,503

68,195

79,389

Average prediction time (seconds)

1.13

371

13.8

268 Aggregate prediction time (seconds)

23.8 7,793

289 4,818

Average predictor size (KB)

172 n/a

* Google could not produce results for 3 of the datasets

These two graphs show the accuracy and F1 score for each dataset for each system:

As we can see from the graphs, all four systems create models that perform very similarly in terms of accuracy and F1 score.

However there is a significant difference in training time and prediction time. Brainome builds and trains models 22 to 40 times faster than its competitors. Additionally, the very compact models produced by Brainome predict (infer) 12 to 300 times faster.

Observations and Limitations

Dataset Selection

GCML requires at least in any dataset used for training, while SageMaker requires at least

Because our methodology used 70% of each dataset for training and 30% as the test set, this meant that GCML could not produce results for 3 datasets which had less than 1,428 total data points.

There were also limitations on the number of features (columns) a dataset could have:

GCML rejects any dataset that has more than 1,000 features
SageMaker rejects any dataset that has more than 100,000 characters in a single line. This error message was displayed when trying to upload a data set with 72K features: “ClientError: One or more lines in the data exceeded the max allowed line length of 100000. Please reduce the length of the input and retry.”

Brainome has no such restrictions and can handle datasets of any size and shape (e.g., datasets like Iris that have very few data points and / or very wide genomic datasets that have a large number of features (100,000+)).

Preprocessing

We used Brainome to clean the raw data files downloaded from OpenML to convert all non-numeric data into numbers and fill in any missing values before submitting to all 4 AutoML systems. This was necessary to ensure that all platforms were using the same exact starting point for training.

Prediction Speed

Brainome has the fastest inference times (by orders of magnitude) thanks to its very compact, bespoke models. Both SageMaker and GCML are very slow to infer. This appears to be the result of combining model deployment with the actual inference. We were not able to separate these two steps for SageMaker and GCML. AzureML appears to keep the deployed model alive and the measured time is clearly just the inference time.

Model Size and Types

It was not possible to extract accurate model sizes and model types from GCML, SageMaker or AzureML. Brainome’s average model size is 172 Kilobytes. The smallest model was 25 Kilobytes and the largest one was 1,561 Kilobytes. The models (aka “predictors”) are entirely self-contained within a single Python file (a.py). It is worth noting that the a.py file contains a fair amount of code beyond just the inference model, including:

comments to make the a.py human readable
all data pre-processing (so raw data can be passed directly to the a.py for inference)
code that enables multi-core inference (if multi-core is available)
code to generate statistics, confusion matrices and other utilities

Brainome’s predictors can run on very small generic hardware, such as a standalone AWS Lambda or an IOT device, because they are typically less than 500 lines of Python code and only require the Numpy library.

Usability

It took ~6 engineering work days to set up the automation pipeline for each of the GCML, SageMaker and AzureML systems. Brainome’s Python PIP installation only takes a few minutes to set up locally or in a cloud environment and is ready for interactive use immediately. Integration with an automation pipeline takes ~2 hours.

Reproducibility

Brainome’s measurements driven approach ensures 100% reproducible model building, test accuracy and F1 scores. AzureML explicitly states in their that one should not expect reproducible results when using their platform. Neither SageMaker nor GCML say anything about reproducibility but running the same dataset multiple times on each platform (with the same exact parameters) yielded different test accuracy results each time.

Conclusion

These results demonstrate the advantages of Brainome’s unique measurements-based approach compared to standard AutoML systems.

To try Brainome with your own data, please follow our installation instructions. Additional documentation is found here. Note that when using Brainome your data will always stay private and never leave your computer. For large datasets, you can request a demo license key by emailing contact@brainome.ai

A PDF with the full results for all 21 data sets is available for email distribution:

300 times faster than Amazon, Google and Microsoft: Brainome fundamentally changes the game for Auto ML

Methodology

The four AutoML systems benchmarked are:

Results

Metric

Brainome

Amazon

Microsoft

Google*

Average Test Accuracy

91.14%

91.15%

91.58%

91.33%

Average F1

0.83

0.83

0.86

0.83

Average training time (seconds)

108

2,453

3,247

4,410

Aggregate training time (seconds)

2,262

51,503

68,195

79,389

Average prediction time (seconds)

1.13

371

13.8

268

Aggregate prediction time (seconds)

23.8

7,793

289

4,818

Average predictor size (KB)

172

n/a

n/a

n/a

* Google could not produce results for 3 of the datasets

Observations and Limitations

Dataset Selection

Preprocessing

Prediction Speed

Model Size and Types

Usability

Reproducibility

Conclusion

Product

Resources

Company

Legal

Share it