AutoML Module (API Reference)
Entry point for the full AutoML training pipeline, with support for classification and regression.
Supports training both classical machine learning models and neural networks. Beyond training individual models, it can combine the trained models into an ensemble, a more robust model that reduces both variance and bias.
High-level steps:
Load training and test data from files or in-memory objects.
Process the data with a feature engineering step.
Train machine learning models on the processed data.
Train neural network models on the processed data.
Combine the trained models into an ensemble and compare it with the individual models to check whether it performs better.
Dump the trained models to disk at a user-defined path.
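The high-level steps can be illustrated with a minimal, self-contained sketch in plain Python. It is not the library's implementation. The toy data, the stand-in models `MeanModel` and `LinearModel`, the averaging ensemble, the mean-squared-error comparison, and the JSON dump format are all hypothetical choices. The neural network step is left out.

```python
import json
import os
import statistics
import tempfile

# Hypothetical sketch of the documented steps; not the library's actual code.


class MeanModel:
    """Baseline that always predicts the mean of the training labels."""

    def fit(self, x, y):
        self.mean_ = statistics.mean(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


class LinearModel:
    """Ordinary least squares on a single feature."""

    def fit(self, x, y):
        mx, my = statistics.mean(x), statistics.mean(y)
        sxx = sum((a - mx) ** 2 for a in x)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, x):
        return [self.intercept_ + self.slope_ * a for a in x]


def mse(y_true, y_pred):
    return statistics.mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def run_pipeline(x_train, y_train, x_val, y_val, model_dir):
    # Feature engineering: standardize using training statistics only.
    mu, sd = statistics.mean(x_train), statistics.pstdev(x_train)
    x_train = [(v - mu) / sd for v in x_train]
    x_val = [(v - mu) / sd for v in x_val]

    # Model training: fit every candidate model.
    models = [MeanModel().fit(x_train, y_train), LinearModel().fit(x_train, y_train)]

    # Ensemble: average predictions, and keep the ensemble only if it beats
    # the best single model on the validation data.
    preds = [m.predict(x_val) for m in models]
    ensemble_error = mse(y_val, [statistics.mean(p) for p in zip(*preds)])
    errors = [mse(y_val, p) for p in preds]
    best = min(range(len(models)), key=errors.__getitem__)
    kept = models if ensemble_error < errors[best] else [models[best]]

    # Dump: write each kept model's fitted attributes to the user-defined folder.
    os.makedirs(model_dir, exist_ok=True)
    for m in kept:
        path = os.path.join(model_dir, type(m).__name__ + ".json")
        with open(path, "w") as f:
            json.dump(vars(m), f)
    return [type(m).__name__ for m in kept]


with tempfile.TemporaryDirectory() as tmp:
    kept = run_pipeline(
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
        [7.0, 8.0],
        [14.1, 15.9],
        model_dir=os.path.join(tmp, "models"),
    )
    saved = sorted(os.listdir(os.path.join(tmp, "models")))
print(kept, saved)
```

On this toy data the averaged ensemble is worse than the linear model alone, so only the linear model is kept and dumped.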
author: Guangqiang.lu
- class automl.estimator.AutoML(models_path=None, time_left_for_this_task=3600, n_ensemble=10, n_best_model=5, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, keep_models=True, model_dir=None, precision=32, delete_models=True)
Bases: sklearn.base.BaseEstimator
Parent class for both the classification and regression AutoML training classes.
All configuration is set when this class is instantiated, such as which algorithms to use and how many models to select.
- Parameters
backend – backend object used to save and load models.
time_left_for_this_task – time budget for training the models.
n_ensemble – number of models to select for the ensemble.
n_best_model – number of best models to keep during training.
include_estimators – algorithms to include.
exclude_estimators – algorithms to exclude.
include_preprocessors – preprocessing steps to include.
exclude_preprocessors – preprocessing steps to exclude.
keep_models – whether to keep the trained models.
model_dir – folder in which to keep the models; if None, the backend creates one.
precision – numeric precision of the data, used to save memory.
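Here is a rough sketch of how `include_estimators` and `exclude_estimators` might narrow the set of candidate algorithms. The registry `AVAILABLE_ESTIMATORS`, the helper `resolve_estimators`, and its error rules are assumptions for illustration only. The library's actual estimator names and validation are not documented here.

```python
# Hypothetical registry; the real estimator names used by the library may differ.
AVAILABLE_ESTIMATORS = ("LogisticRegression", "RandomForest", "SVC", "GradientBoosting", "KNN")


def resolve_estimators(include=None, exclude=None, available=AVAILABLE_ESTIMATORS):
    """Return candidate estimator names after applying include/exclude filters."""
    include = list(include) if include is not None else None
    exclude = set(exclude or ())
    unknown = (set(include or ()) | exclude) - set(available)
    if unknown:
        raise ValueError(f"Unknown estimators: {sorted(unknown)}")
    conflict = set(include or ()) & exclude
    if conflict:
        raise ValueError(f"Estimators both included and excluded: {sorted(conflict)}")
    # Start from the explicit include list, or from everything available.
    candidates = include if include is not None else list(available)
    return [name for name in candidates if name not in exclude]


print(resolve_estimators(exclude=["SVC"]))
```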
- fit(x=None, y=None, file_load=None, xval=None, yval=None, val_split=0.2, n_jobs=None, use_neural_network=True, *args, **kwargs)
Main training entry point; accepts either in-memory data or a FileLoad object.
The full training process runs here, including the preprocessing pipeline and the training pipeline. Various input data types are supported and converted into plain arrays for the training algorithms. A training pipeline is instantiated for each selected algorithm, grid search is used to find the best hyperparameters, and each trained model is stored together with its algorithm name and validation score.
- Parameters
x (array, optional) – Training data. Defaults to None.
y (array, optional) – Training labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
xval (array, optional) – Validation data. Defaults to None.
yval (array, optional) – Validation labels. Defaults to None.
val_split (float, optional) – Fraction of the data held out for validation if xval and yval are not provided. Defaults to 0.2.
n_jobs (int, optional) – Number of CPU cores to use. Defaults to None.
use_neural_network (bool, optional) – Whether to use neural networks. Defaults to True.
- Returns
The trained estimator.
- Return type
self
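When `xval` and `yval` are not passed, `fit` holds out a `val_split` fraction of the training data for validation. The following sketch shows one way such a split could work. The helper `split_validation`, the seeded shuffle, and the rounding rule are assumptions, not the library's documented behavior.

```python
import random


def split_validation(x, y, xval=None, yval=None, val_split=0.2, seed=0):
    """Return (x_train, y_train, x_val, y_val)."""
    if (xval is None) != (yval is None):
        raise ValueError("xval and yval must be provided together")
    if xval is not None:
        # Explicit validation data takes precedence over val_split.
        return list(x), list(y), list(xval), list(yval)
    if not 0.0 < val_split < 1.0:
        raise ValueError("val_split must be between 0 and 1")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_val = max(1, round(len(x) * val_split))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [x[i] for i in train_idx],
        [y[i] for i in train_idx],
        [x[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


x = list(range(10))
y = [v % 2 for v in x]
x_tr, y_tr, x_va, y_va = split_validation(x, y)
print(len(x_tr), len(x_va))
```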
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- get_sorted_models_scores(xtest=None, ytest=None, file_load=None, reverse=True, **kwargs)
Get the test-data scores of the best trained models, in sorted order.
The sorted list of scores can later be displayed in a front end.
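This is a minimal sketch of producing a sorted list of model scores, as `get_sorted_models_scores` describes. It assumes each model is a plain prediction callable keyed by name, that accuracy is the score, and that `reverse=True` means best score first. The helper `sorted_model_scores` is hypothetical.

```python
def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def sorted_model_scores(models, xtest, ytest, score_fn=accuracy, reverse=True):
    """Score each named model on the test data and sort by score."""
    scores = [(name, score_fn(ytest, predict(xtest))) for name, predict in models.items()]
    # sorted() is stable (also with reverse=True), so ties keep insertion order.
    return sorted(scores, key=lambda item: item[1], reverse=reverse)


models = {
    "always_one": lambda xs: [1] * len(xs),
    "echo": lambda xs: list(xs),
    "always_zero": lambda xs: [0] * len(xs),
}
ranking = sorted_model_scores(models, [0, 1, 1, 0], [0, 1, 1, 0])
print(ranking)
```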
- predict(x=None, file_load=None, **kwargs)
Predict with the best trained models, using either in-memory data or a FileLoad object.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Predictions.
- Return type
array
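The text does not say how the best trained models are combined at prediction time. Majority voting is one common option for classification, and the sketch below assumes it. `majority_vote` is a hypothetical helper, not part of the library.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample."""
    combined = []
    for sample_votes in zip(*predictions):
        # most_common orders equal counts by first occurrence, so ties go to
        # the label that appears first in model order.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


per_model = [[0, 1, 1], [0, 0, 1], [1, 1, 1]]
print(majority_vote(per_model))
```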
- predict_proba(x=None, file_load=None, **kwargs)
Predict probabilities using the best trained model.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object. Defaults to None.
- Raises
NotImplementedError – Raised if the model does not support predict_proba.
- Returns
Probabilities for the test data.
- Return type
array
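One plausible way to get probabilities from several models is to average their per-class probabilities. This sketch does that and raises `NotImplementedError` when any model lacks `predict_proba`, which matches the documented exception. `ProbModel`, `LabelOnlyModel`, and `ensemble_predict_proba` are hypothetical names used only here.

```python
class ProbModel:
    """Stand-in model with fixed probability outputs (hypothetical)."""

    def __init__(self, probs):
        self.probs = probs

    def predict_proba(self, x):
        return self.probs


class LabelOnlyModel:
    """Stand-in model that cannot produce probabilities."""

    def predict(self, x):
        return [0 for _ in x]


def ensemble_predict_proba(models, x):
    for m in models:
        if not hasattr(m, "predict_proba"):
            raise NotImplementedError(f"{type(m).__name__} does not support predict_proba")
    outputs = [m.predict_proba(x) for m in models]
    # Average class probabilities sample by sample, class by class.
    return [
        [sum(class_probs) / len(models) for class_probs in zip(*sample_rows)]
        for sample_rows in zip(*outputs)
    ]


a = ProbModel([[0.2, 0.8], [0.6, 0.4]])
b = ProbModel([[0.4, 0.6], [0.8, 0.2]])
proba = ensemble_predict_proba([a, b], x=[None, None])
print(proba)
```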
- classmethod reconstruct(models_path=None, *args, **kwargs)
Used by the RESTful API to reconstruct an object from trained models.
- Parameters
models_path (str, optional) – Path where the trained models are stored. Defaults to None.
- Returns
A reconstructed object for the API use case.
- Return type
- score(x=None, y=None, file_load=None, **kwargs)
Compute the score on test data and labels.
Classification uses accuracy; regression uses the R² score.
- Parameters
x (array, optional) – Test data. Defaults to None.
y (array, optional) – Test labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Evaluation score.
- Return type
float
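The two metrics named above can be written out directly. This sketch computes accuracy and the R² score (R² = 1 - SS_res / SS_tot) in plain Python. The dispatching `score` helper and its `task` argument are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly right."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    if ss_tot == 0:
        # Constant targets: one common convention is 1.0 for a perfect fit, else 0.0.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot


def score(task, y_true, y_pred):
    """Pick the metric by task: accuracy for classification, R² for regression."""
    if task == "classification":
        return accuracy(y_true, y_pred)
    if task == "regression":
        return r2_score(y_true, y_pred)
    raise ValueError(f"Unknown task: {task!r}")


print(score("classification", [1, 0, 1, 1], [1, 0, 0, 1]))
```

Predicting the mean of the targets for every sample yields an R² of 0, and a perfect fit yields 1.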
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
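The `<component>__<parameter>` naming convention can be illustrated with a small helper that separates an estimator's own parameters from those routed to nested components. `split_nested_params` is hypothetical and only mirrors the idea. It is not scikit-learn's implementation, which also validates parameter names.

```python
def split_nested_params(params):
    """Split set_params-style keys into own parameters and per-component dicts."""
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # "clf__C" -> component "clf", sub-parameter "C"; deeper keys such as
            # "step__inner__alpha" keep "inner__alpha" for the component to split again.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_ensemble": 5, "clf__C": 1.0, "clf__kernel": "rbf", "step__inner__alpha": 0.1}
)
print(own, nested)
```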
- class automl.estimator.ClassificationAutoML(models_path=None, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, **kwargs)
Bases: automl.estimator.AutoML
AutoML estimator for classification tasks. Adds algorithm selection and preprocessor selection, and can be extended with other options if needed.
- Parameters
models_path (str, optional) – Where to store the models. Defaults to None.
- fit(x=None, y=None, file_load=None, xval=None, yval=None, val_split=0.2, n_jobs=None, use_neural_network=True, *args, **kwargs)
Main training entry point; accepts either in-memory data or a FileLoad object.
The full training process runs here, including the preprocessing pipeline and the training pipeline. Various input data types are supported and converted into plain arrays for the training algorithms. A training pipeline is instantiated for each selected algorithm, grid search is used to find the best hyperparameters, and each trained model is stored together with its algorithm name and validation score.
- Parameters
x (array, optional) – Training data. Defaults to None.
y (array, optional) – Training labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
xval (array, optional) – Validation data. Defaults to None.
yval (array, optional) – Validation labels. Defaults to None.
val_split (float, optional) – Fraction of the data held out for validation if xval and yval are not provided. Defaults to 0.2.
n_jobs (int, optional) – Number of CPU cores to use. Defaults to None.
use_neural_network (bool, optional) – Whether to use neural networks. Defaults to True.
- Returns
The trained estimator.
- Return type
self
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- get_sorted_models_scores(xtest=None, ytest=None, file_load=None, reverse=True, **kwargs)
Get the test-data scores of the best trained models, in sorted order.
The sorted list of scores can later be displayed in a front end.
- predict(x=None, file_load=None, **kwargs)
Predict with the best trained models, using either in-memory data or a FileLoad object.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Predictions.
- Return type
array
- predict_proba(x=None, file_load=None, **kwargs)
Predict probabilities using the best trained model.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object. Defaults to None.
- Raises
NotImplementedError – Raised if the model does not support predict_proba.
- Returns
Probabilities for the test data.
- Return type
array
- classmethod reconstruct(models_path=None, *args, **kwargs)
Used by the RESTful API to reconstruct an object from trained models.
- Parameters
models_path (str, optional) – Path where the trained models are stored. Defaults to None.
- Returns
A reconstructed object for the API use case.
- Return type
- score(x=None, y=None, file_load=None, **kwargs)
Compute the score on test data and labels.
Classification uses accuracy; regression uses the R² score.
- Parameters
x (array, optional) – Test data. Defaults to None.
y (array, optional) – Test labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Evaluation score.
- Return type
float
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- class automl.estimator.FileLoad(file_name, file_path=None, file_sep=',', label_name='label', use_for_pred=False, service_account_file_name=None, service_account_file_path=None, except_columns=None)
Bases: object
Loads data from a file; supports both local files and Google Cloud Storage (GCS).
Serves as the main container for a file-based dataset for later use.
- Parameters
file_name (str) – Name of the file.
label_name (str, optional) – Name of the label column. Defaults to ‘label’.
file_path (str, optional) – Location of the file. Defaults to None.
file_sep (str, optional) – Field separator used in the file. Defaults to ‘,’.
use_for_pred (bool, optional) – Whether the file is used for prediction. Note: if the file does not contain a label column, this must be set to True. Defaults to False.
service_account_file_name (str, optional) – Service account (SA) file name. Defaults to None.
service_account_file_path (str, optional) – Service account (SA) file path. Defaults to None.
except_columns (list, optional) – Columns that need to be used. Defaults to None.
- Raises
ValueError
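This is a minimal sketch of the local-file part of what `FileLoad` describes: reading a delimited file, separating the label column, and honoring `use_for_pred` for unlabeled files. The function `load_delimited`, the float conversion, and the condition under which `ValueError` is raised are assumptions. GCS access, service account handling, and `except_columns` are not covered.

```python
import csv
import io


def load_delimited(file_obj, file_sep=",", label_name="label", use_for_pred=False):
    """Return (features, labels); labels is None when use_for_pred is True."""
    reader = csv.DictReader(file_obj, delimiter=file_sep)
    rows = list(reader)
    columns = reader.fieldnames or []
    if label_name not in columns and not use_for_pred:
        # Assumed failure mode: a training file without its label column.
        raise ValueError(f"Label column {label_name!r} not found; set use_for_pred=True")
    feature_cols = [c for c in columns if c != label_name]
    features = [[float(row[c]) for c in feature_cols] for row in rows]
    labels = None if use_for_pred else [float(row[label_name]) for row in rows]
    return features, labels


# io.StringIO stands in for a local file opened with open(path).
train_x, train_y = load_delimited(io.StringIO("a;b;label\n1;2;0\n3;4;1\n"), file_sep=";")
pred_x, pred_y = load_delimited(io.StringIO("a;b\n5;6\n"), file_sep=";", use_for_pred=True)
print(train_x, train_y, pred_x, pred_y)
```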
- class automl.estimator.RegressionAutoML(models_path=None, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, **kwargs)
Bases: automl.estimator.AutoML
AutoML estimator for regression tasks. Adds algorithm selection and preprocessor selection, and can be extended with other options if needed.
- Parameters
models_path (str, optional) – Where to store the models. Defaults to None.
- fit(x=None, y=None, file_load=None, xval=None, yval=None, val_split=0.2, n_jobs=None, use_neural_network=True, *args, **kwargs)
Main training entry point; accepts either in-memory data or a FileLoad object.
The full training process runs here, including the preprocessing pipeline and the training pipeline. Various input data types are supported and converted into plain arrays for the training algorithms. A training pipeline is instantiated for each selected algorithm, grid search is used to find the best hyperparameters, and each trained model is stored together with its algorithm name and validation score.
- Parameters
x (array, optional) – Training data. Defaults to None.
y (array, optional) – Training labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
xval (array, optional) – Validation data. Defaults to None.
yval (array, optional) – Validation labels. Defaults to None.
val_split (float, optional) – Fraction of the data held out for validation if xval and yval are not provided. Defaults to 0.2.
n_jobs (int, optional) – Number of CPU cores to use. Defaults to None.
use_neural_network (bool, optional) – Whether to use neural networks. Defaults to True.
- Returns
The trained estimator.
- Return type
self
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- get_sorted_models_scores(xtest=None, ytest=None, file_load=None, reverse=True, **kwargs)
Get the test-data scores of the best trained models, in sorted order.
The sorted list of scores can later be displayed in a front end.
- predict(x=None, file_load=None, **kwargs)
Predict with the best trained models, using either in-memory data or a FileLoad object.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Predictions.
- Return type
array
- predict_proba(x=None, file_load=None, **kwargs)
Predict probabilities using the best trained model.
- Parameters
x (array, optional) – Test data. Defaults to None.
file_load (FileLoad, optional) – FileLoad object. Defaults to None.
- Raises
NotImplementedError – Raised if the model does not support predict_proba.
- Returns
Probabilities for the test data.
- Return type
array
- classmethod reconstruct(models_path=None, *args, **kwargs)
Used by the RESTful API to reconstruct an object from trained models.
- Parameters
models_path (str, optional) – Path where the trained models are stored. Defaults to None.
- Returns
A reconstructed object for the API use case.
- Return type
- score(x=None, y=None, file_load=None, **kwargs)
Compute the score on test data and labels.
Classification uses accuracy; regression uses the R² score.
- Parameters
x (array, optional) – Test data. Defaults to None.
y (array, optional) – Test labels. Defaults to None.
file_load (FileLoad, optional) – FileLoad object containing data and labels. Defaults to None.
- Returns
Evaluation score.
- Return type
float
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance