Model Selection Module (API Reference)
Main cross-validation class for adding estimators and scoring them with CV.
This module holds the model selection logic, such as grid search: it builds the search space and searches over whole models.
The actual classifier training happens here.
@author: Guangqiang.lu
- class automl.model_selection.GridSearchModel(backend, n_best_model=None, use_neural_network=True, task_type='classification')
Bases: object
Holds a list of estimators together with their parameter lists and performs the actual training.
self.estimator_list looks like [GridSearchCV(lr, params), …], and self.score_dict looks like {'LogisticRegression': (lr, 0.9877)}.
- add_estimator(estimator, estimator_params=None)
Add a single estimator. The existing logic trains a list of estimators in parallel, so every model is kept. estimator_params is optional because the package's own estimators do not always need it; it exists so that other sklearn estimators can be added.
- Parameters
estimator – estimator object
estimator_params – native sklearn parameters to search
- add_estimators(estimator_param_pairs)
Add several estimators at once so that they can be trained in parallel with different parameters.
- Parameters
estimator_param_pairs – a list of (estimator, params) pairs, for example [(lr, {"C": [1, 2]})]
- Returns
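The pair format above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `GridSearchCV`; the `DecisionTreeClassifier` entry and the way the list is built are assumptions for the example, not the package's own implementation.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# (estimator, param grid) pairs in the shape add_estimators expects
estimator_param_pairs = [
    (LogisticRegression(), {"C": [1, 2]}),
    (DecisionTreeClassifier(), {"max_depth": [2, 4]}),  # second pair is illustrative
]

# Hypothetical equivalent of self.estimator_list: one GridSearchCV per pair
estimator_list = [GridSearchCV(est, params) for est, params in estimator_param_pairs]

# Class names, as used for the keys of score_dict
names = [type(gs.estimator).__name__ for gs in estimator_list]
```

Wrapping each pair separately keeps every algorithm's search independent, which is what makes training them in parallel straightforward.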
- fit(x, y, n_jobs=None)
Fit and cross-validate each estimator with the supported scoring.
After CV, each estimator is re-fit on the full data; training_score and testing_score are both taken from the CV results. Progress is reported with tqdm.
- By default:
classification scoring is accuracy
regression scoring is mean_squared_error
- Parameters
x – training data
y – training label
n_jobs – number of CPU cores to use
- Returns
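The cross-validate-then-refit behaviour described for fit can be sketched with plain scikit-learn. This is an assumption-laden illustration rather than the method's actual code: the synthetic data, `cv=5`, and `max_iter=1000` are chosen only so the sketch runs. Note that current scikit-learn spells the regression scorer `"neg_mean_squared_error"`, so a regression variant would pass that string.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the sketch runs end to end
x, y = make_classification(n_samples=200, n_features=5, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [1, 2]},
    scoring="accuracy",       # the default the text names for classification
    cv=5,                     # assumed fold count
    refit=True,               # re-fit the best parameters on the full data
    return_train_score=True,  # keep CV training scores alongside test scores
    n_jobs=None,
)
search.fit(x, y)

# Both scores come from the CV results, not from a separate hold-out set
best = search.best_index_
train_score = search.cv_results_["mean_train_score"][best]
test_score = search.cv_results_["mean_test_score"][best]
```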
- load_best_model_list(model_extension='pkl')
Load the previously saved best models into a list of trained instances.
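Loading by extension might look like the sketch below. The helper `load_models`, the directory layout, and the stand-in objects are all hypothetical; the real method presumably resolves its directory through the backend object, which is not documented here.

```python
import pickle
import tempfile
from pathlib import Path


def load_models(model_dir, model_extension="pkl"):
    """Unpickle every file in model_dir with the given extension, sorted by name."""
    models = []
    for path in sorted(Path(model_dir).glob(f"*.{model_extension}")):
        with path.open("rb") as f:
            models.append(pickle.load(f))
    return models


# Plain dicts stand in for trained estimators so the sketch needs no training
with tempfile.TemporaryDirectory() as tmp:
    for name, obj in [("A_9000.pkl", {"model": "a"}), ("B_9500.pkl", {"model": "b"})]:
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump(obj, f)
    # Files with other extensions are skipped
    (Path(tmp) / "notes.txt").write_text("ignored")
    loaded = load_models(tmp)
```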
- load_bestest_model()
Load the best trained model from disk.
- predict(x)
Get predictions from the best fitted model.
- Parameters
x – data to predict on
- predict_proba(x)
Get class probabilities from the best estimator.
- Parameters
x – data to predict on
- print_estimators()
List all grid model instances so that they can be inspected.
- save_best_model_list()
Save the best fitted model for each algorithm, chosen over that algorithm's own parameters, so that the saved models can later be used for ensembling.
Each best-parameter trained model is written to disk with a file name of the form classname_score.pkl, for example LogisticRegression_9813.pkl.
Note: this function should only be called after training.
- Parameters
n_best_model – how many best models to save (a constructor argument of GridSearchModel; this method takes no arguments)
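The classname_score.pkl naming can be sketched as below. The helpers `model_file_name` and `save_best_models` are hypothetical, and the scaling (score multiplied by 10000 and rounded, so 0.9813 becomes 9813) is an assumption inferred from the example file name, not confirmed by the source.

```python
import pickle
import tempfile
from pathlib import Path


def model_file_name(class_name, score, extension="pkl"):
    """Build '<ClassName>_<score>.<ext>'; 0.9813 becomes '9813' (assumed scaling)."""
    return f"{class_name}_{round(score * 10000)}.{extension}"


def save_best_models(score_dict, model_dir, n_best_model=None):
    """Pickle the n_best_model highest-scoring estimators; return their file names."""
    ranked = sorted(score_dict.items(), key=lambda item: item[1][1], reverse=True)
    saved = []
    # A slice bound of None keeps every model
    for class_name, (estimator, score) in ranked[:n_best_model]:
        name = model_file_name(class_name, score)
        with open(Path(model_dir) / name, "wb") as f:
            pickle.dump(estimator, f)
        saved.append(name)
    return saved


# Plain dicts stand in for trained estimators; the scores are illustrative
score_dict = {
    "LogisticRegression": ({"stand_in": "lr"}, 0.9813),
    "DecisionTreeClassifier": ({"stand_in": "tree"}, 0.9500),
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_best_models(score_dict, tmp, n_best_model=1)
```

Putting the score in the file name means a later run does not silently overwrite an earlier model.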
- save_bestest_model()
Dump the best trained model to disk. The file name should include the score rather than being fixed.
- save_trained_estimator(estimator, estimator_name)
Save a trained model to disk under the given file name.
- Parameters
estimator – trained estimator to save
estimator_name – file name to save under
- score(x, y)
Evaluate the best fitted model on test data.
- Parameters
x – test data
y – test labels
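How predict, predict_proba, and score might delegate to the best entry of score_dict is sketched below with plain scikit-learn estimators. The hand-built `score_dict`, the synthetic data, and ranking by training-set accuracy are assumptions for the example; the class itself ranks models by their CV scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only so the sketch runs end to end
x, y = make_classification(n_samples=200, n_features=5, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Hypothetical score_dict: class name -> (fitted estimator, score)
score_dict = {}
for est in (LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(max_depth=2, random_state=0)):
    est.fit(x_train, y_train)
    score_dict[type(est).__name__] = (est, est.score(x_train, y_train))

# Pick the estimator with the highest recorded score
best_estimator, _ = max(score_dict.values(), key=lambda pair: pair[1])

pred = best_estimator.predict(x_test)         # predicted labels
proba = best_estimator.predict_proba(x_test)  # one probability column per class
test_accuracy = best_estimator.score(x_test, y_test)
```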