15. Hyperparameter Tuning

Scikit-learn has many approaches to optimizing or tuning the hyperparameters of models. Let’s take a look at how we can use GridSearchCV to search over a space of possible hyperparameter combinations.

15.1. Create data

Let’s create a dummy binary classification dataset.

[1]:
import numpy as np
from sklearn.datasets import make_classification

np.random.seed(37)

X, y = make_classification(**{
    'n_samples': 2000,
    'n_features': 20,
    'n_informative': 2,
    'n_redundant': 2,
    'n_repeated': 0,
    'n_classes': 2,
    'n_clusters_per_class': 2,
    'random_state': 37
})

print(f'X shape = {X.shape}, y shape {y.shape}')
X shape = (2000, 20), y shape (2000,)
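
Since we will stratify the cross-validation folds later, it is worth confirming the class balance of the generated data. A quick check, using the numpy imported above:

print(np.bincount(y))  # number of samples per class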

15.2. Tuning Logistic Regression

Let’s try to tune a logistic regression model. The logistic regression model will be referred to as the estimator; it is this estimator’s hyperparameters that we want to optimize. When tuning hyperparameters, we also need a way to split the data, and here we will use StratifiedKFold. Another important input to the grid search is the param_grid argument, which is a dictionary specifying the search space of each hyperparameter. Here, our search space is simple: it is over the regularization strength C. Lastly, we need an optimization criterion, which we specify through the scoring argument; since we pass multiple scorers, the refit argument names the one used to select the final model.

[2]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

p = {
    'solver': 'sag',
    'penalty': 'l2',
    'random_state': 37,
    'max_iter': 100
}
estimator = LogisticRegression(**p)

p = {
    'n_splits': 5,
    'shuffle': True,
    'random_state': 37
}
cv = StratifiedKFold(**p)

p = {
    'estimator': estimator,
    'cv': cv,
    'param_grid': {
        'C': [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    },
    'scoring': {
        'auc': 'roc_auc',
        'apr': 'average_precision'
    },
    'verbose': 5,
    'refit': 'auc',
    'error_score': np.nan,
    'n_jobs': -1
}
model = GridSearchCV(**p)

model.fit(X, y)
Fitting 5 folds for each of 11 candidates, totalling 55 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of  55 | elapsed:    1.0s remaining:   12.5s
[Parallel(n_jobs=-1)]: Done  16 out of  55 | elapsed:    1.0s remaining:    2.5s
[Parallel(n_jobs=-1)]: Done  28 out of  55 | elapsed:    1.1s remaining:    1.0s
[Parallel(n_jobs=-1)]: Done  40 out of  55 | elapsed:    1.1s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  52 out of  55 | elapsed:    1.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done  55 out of  55 | elapsed:    1.2s finished
[2]:
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=37, shuffle=True),
             estimator=LogisticRegression(random_state=37, solver='sag'),
             n_jobs=-1,
             param_grid={'C': [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
                               0.9, 1.0]},
             refit='auc',
             scoring={'apr': 'average_precision', 'auc': 'roc_auc'}, verbose=5)

The best_params_ attribute gives the best combination of hyperparameters found by the search.

[3]:
model.best_params_
[3]:
{'C': 0.4}

The best_score_ attribute gives the mean cross-validated score of the best estimator, computed with the scorer named by refit (here, the AUC).

[4]:
model.best_score_
[4]:
0.9644498503712592

To retrieve the best estimator induced by the search and scoring criteria (refit on the full dataset, since refit is set), access best_estimator_.

[5]:
model.best_estimator_
[5]:
LogisticRegression(C=0.4, random_state=37, solver='sag')
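
Beyond the best_* attributes, the full search results live in cv_results_, with one entry per candidate and per scorer. A small sketch of inspecting it, assuming the fitted model from above:

import pandas as pd

# one row per candidate; multimetric scoring yields mean_test_<scorer> columns
res = pd.DataFrame(model.cv_results_)
print(res[['param_C', 'mean_test_auc', 'std_test_auc', 'mean_test_apr']]
      .sort_values('mean_test_auc', ascending=False)
      .head())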

15.3. Tuning Random Forest

Here, we tune a RandomForestClassifier over the number of trees and the split criterion.

[6]:
from sklearn.ensemble import RandomForestClassifier

p = {
    'random_state': 37
}
estimator = RandomForestClassifier(**p)

p = {
    'n_splits': 5,
    'shuffle': True,
    'random_state': 37
}
cv = StratifiedKFold(**p)

p = {
    'estimator': estimator,
    'cv': cv,
    'param_grid': {
        'n_estimators': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        'criterion': ['gini', 'entropy']
    },
    'scoring': {
        'auc': 'roc_auc',
        'apr': 'average_precision'
    },
    'verbose': 5,
    'refit': 'auc',
    'error_score': np.nan,
    'n_jobs': -1
}
model = GridSearchCV(**p)

model.fit(X, y)
Fitting 5 folds for each of 20 candidates, totalling 100 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done  58 out of 100 | elapsed:    0.6s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  79 out of 100 | elapsed:    0.9s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    1.2s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    1.2s finished
[6]:
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=37, shuffle=True),
             estimator=RandomForestClassifier(random_state=37), n_jobs=-1,
             param_grid={'criterion': ['gini', 'entropy'],
                         'n_estimators': [10, 20, 30, 40, 50, 60, 70, 80, 90,
                                          100]},
             refit='auc',
             scoring={'apr': 'average_precision', 'auc': 'roc_auc'}, verbose=5)
[7]:
model.best_params_
[7]:
{'criterion': 'entropy', 'n_estimators': 50}
[8]:
model.best_score_
[8]:
0.9763199132478311
[9]:
model.best_estimator_
[9]:
RandomForestClassifier(criterion='entropy', n_estimators=50, random_state=37)
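
Since this grid spans two hyperparameters, pivoting cv_results_ into a table makes the interaction easier to read. A small sketch, assuming the fitted model above:

import pandas as pd

# mean AUC for every (n_estimators, criterion) cell of the grid
res = pd.DataFrame(model.cv_results_)
print(res.pivot_table(index='param_n_estimators',
                      columns='param_criterion',
                      values='mean_test_auc'))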

15.4. Tuning with a pipeline

Our estimator can also be a pipeline. For each step in the pipeline, we can specify its portion of the parameter grid using the step__parameter naming convention (for example, pca__n_components).
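
To discover the exact names the grid may reference, list the pipeline’s parameters. A minimal sketch with a throwaway pipeline (its components mirror the cell below):

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier

demo = Pipeline(steps=[('scaler', MinMaxScaler()), ('pca', PCA()),
                       ('rf', RandomForestClassifier())])

# every tunable key, in step__parameter form
print([k for k in demo.get_params() if '__' in k][:5])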

[10]:
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
pca = PCA()
rf = RandomForestClassifier(**{
    'random_state': 37
})
pipeline = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('rf', rf)])

cv = StratifiedKFold(**{
    'n_splits': 5,
    'shuffle': True,
    'random_state': 37
})

model = GridSearchCV(**{
    'estimator': pipeline,
    'cv': cv,
    'param_grid': {
        'scaler__feature_range': [(0, 1), (0, 2)],
        'pca__n_components': [2, 3, 4, 5, 10, 11, 12, 15],
        'rf__n_estimators': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        'rf__criterion': ['gini', 'entropy']
    },
    'scoring': {
        'auc': 'roc_auc',
        'apr': 'average_precision'
    },
    'verbose': 5,
    'refit': 'auc',
    'error_score': np.nan,
    'n_jobs': -1
})

model.fit(X, y)
Fitting 5 folds for each of 320 candidates, totalling 1600 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done 132 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done 384 tasks      | elapsed:    2.1s
[Parallel(n_jobs=-1)]: Done 708 tasks      | elapsed:    4.2s
[Parallel(n_jobs=-1)]: Done 1104 tasks      | elapsed:    7.4s
[Parallel(n_jobs=-1)]: Done 1600 out of 1600 | elapsed:   12.3s finished
[10]:
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=37, shuffle=True),
             estimator=Pipeline(steps=[('scaler', MinMaxScaler()),
                                       ('pca', PCA()),
                                       ('rf',
                                        RandomForestClassifier(random_state=37))]),
             n_jobs=-1,
             param_grid={'pca__n_components': [2, 3, 4, 5, 10, 11, 12, 15],
                         'rf__criterion': ['gini', 'entropy'],
                         'rf__n_estimators': [10, 20, 30, 40, 50, 60, 70, 80,
                                              90, 100],
                         'scaler__feature_range': [(0, 1), (0, 2)]},
             refit='auc',
             scoring={'apr': 'average_precision', 'auc': 'roc_auc'}, verbose=5)
[11]:
model.best_params_
[11]:
{'pca__n_components': 3,
 'rf__criterion': 'entropy',
 'rf__n_estimators': 70,
 'scaler__feature_range': (0, 1)}
[12]:
model.best_score_
[12]:
0.9710898858096453
[13]:
model.best_estimator_
[13]:
Pipeline(steps=[('scaler', MinMaxScaler()), ('pca', PCA(n_components=3)),
                ('rf',
                 RandomForestClassifier(criterion='entropy', n_estimators=70,
                                        random_state=37))])
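
The refit best_estimator_ is itself a pipeline, so its fitted steps can be inspected by name. A small sketch, assuming the fitted model above:

# e.g. the variance captured by the 3 retained principal components
best_pipeline = model.best_estimator_
print(best_pipeline.named_steps['pca'].explained_variance_ratio_)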

15.5. Validation with tuning

In some cases, you might want to validate the hyperparameter tuning itself as part of your learning process. In this example, the grid search is nested inside an outer cross-validation loop. Here are some things to note.

  • The data generated will be multiclass.

  • We will implement custom scorers. The average precision score does not natively handle multiclass labels, so we have to transform the ground-truth labels into a one-hot encoded matrix (see the short sketch after this list).
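
A minimal sketch of that one-hot transform, using label_binarize as an equivalent alternative to the OneHotEncoder used below:

import numpy as np
from sklearn.preprocessing import label_binarize

# average_precision_score expects binary indicator targets; a one-hot view
# of the multiclass labels makes the micro/macro averages well defined
y_true = np.array([0, 2, 1, 2])
print(label_binarize(y_true, classes=[0, 1, 2]))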

Now let’s generate some data.

[14]:
X, y = make_classification(**{
    'n_samples': 1000,
    'n_features': 10,
    'n_clusters_per_class': 1,
    'n_classes': 3,
    'random_state': 37
})

print(f'X shape = {X.shape}, y shape {y.shape}')
X shape = (1000, 10), y shape (1000,)

Below, we create a model that is a grid search over a pipeline ending in a random forest. Note how we use the make_scorer() function to create custom scorers.

[15]:
from sklearn.metrics import roc_auc_score, average_precision_score, make_scorer
from sklearn.preprocessing import OneHotEncoder

def apr_score(y_true, y_pred, average='micro'):
    # average_precision_score expects binary indicator targets, so one-hot
    # encode the multiclass ground-truth labels before scoring
    encoder = OneHotEncoder()
    Y = encoder.fit_transform(y_true.reshape(-1, 1)).todense()

    return average_precision_score(Y, y_pred, average=average)

def get_model():
    scaler = MinMaxScaler()
    pca = PCA()
    rf = RandomForestClassifier(**{
        'random_state': 37
    })
    pipeline = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('rf', rf)])

    cv = StratifiedKFold(**{
        'n_splits': 5,
        'shuffle': True,
        'random_state': 37
    })

    auc_scorer = make_scorer(
        roc_auc_score,
        greater_is_better=True,
        needs_proba=True,
        multi_class='ovo')
    apr_scorer_macro = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='macro')
    apr_scorer_micro = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='micro')
    apr_scorer_weighted = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='weighted')

    model = GridSearchCV(**{
        'estimator': pipeline,
        'cv': cv,
        'param_grid': {
            'scaler__feature_range': [(0, 1), (0, 2)],
            'pca__n_components': [2, 3, 4, 5, 10, 11, 12, 15],
            'rf__n_estimators': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
            'rf__criterion': ['gini', 'entropy']
        },
        'scoring': {
            'auc': auc_scorer,
            'apr_scorer_macro': apr_scorer_macro,
            'apr_scorer_micro': apr_scorer_micro,
            'apr_scorer_weighted': apr_scorer_weighted
        },
        'verbose': 5,
        'refit': 'apr_scorer_micro',
        'error_score': np.nan,
        'n_jobs': -1
    })
    return model

Now we can perform stratified, k-fold cross-validation while incorporating hyperparameter tuning as part of the validation process: the grid search is fit inside each outer training fold, and scoring is done on the corresponding held-out fold.

[16]:
import warnings
import pandas as pd

warnings.filterwarnings('ignore')

results = []

for tr, te in StratifiedKFold(random_state=37, shuffle=True, n_splits=10).split(X, y):
    X_tr, X_te = X[tr], X[te]
    y_tr, y_te = y[tr], y[te]

    model = get_model()
    model.fit(X_tr, y_tr)

    y_pred = model.predict_proba(X_te)

    auc_ovr = roc_auc_score(y_te, y_pred, multi_class='ovr')
    auc_ovo = roc_auc_score(y_te, y_pred, multi_class='ovo')
    apr_macro = apr_score(y_te, y_pred, average='macro')
    apr_micro = apr_score(y_te, y_pred, average='micro')
    apr_weighted = apr_score(y_te, y_pred, average='weighted')

    results.append({
        'auc_ovr': auc_ovr,
        'auc_ovo': auc_ovo,
        'apr_macro': apr_macro,
        'apr_micro': apr_micro,
        'apr_weighted': apr_weighted
    })

rdf = pd.DataFrame(results)
Fitting 5 folds for each of 448 candidates, totalling 2240 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 32 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done 132 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 704 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 2240 out of 2240 | elapsed:    2.0s finished
... (the same fitting log repeats for each of the 10 outer folds)
[17]:
rdf.mean()
[17]:
auc_ovr         0.998931
auc_ovo         0.998932
apr_macro       0.997529
apr_micro       0.997535
apr_weighted    0.997533
dtype: float64
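
The mean alone hides fold-to-fold variation. A quick check of the spread, assuming the rdf computed above:

# mean and standard deviation across the 10 outer folds
print(rdf.agg(['mean', 'std']).T)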

15.6. tune-sklearn

tune-sklearn is a drop-in replacement for scikit-learn’s hyperparameter search classes, built on Ray Tune. It promises to find good hyperparameters faster through smarter search strategies and early stopping of unpromising candidates.

[18]:
from tune_sklearn import TuneGridSearchCV

def get_model():
    scaler = MinMaxScaler()
    pca = PCA()
    rf = RandomForestClassifier(**{
        'random_state': 37
    })
    pipeline = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('rf', rf)])

    cv = StratifiedKFold(**{
        'n_splits': 5,
        'shuffle': True,
        'random_state': 37
    })

    auc_scorer = make_scorer(
        roc_auc_score,
        greater_is_better=True,
        needs_proba=True,
        multi_class='ovo')
    apr_scorer_macro = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='macro')
    apr_scorer_micro = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='micro')
    apr_scorer_weighted = make_scorer(
        apr_score,
        greater_is_better=True,
        needs_proba=True,
        average='weighted')

    model = TuneGridSearchCV(**{
        'estimator': pipeline,
        'cv': cv,
        'param_grid': {
            'scaler__feature_range': [(0, 1)],
            'pca__n_components': [2, 3, 4, 5],
            'rf__criterion': ['gini', 'entropy']
        },
        'scoring': {
            'auc': auc_scorer,
            'apr_scorer_macro': apr_scorer_macro,
            'apr_scorer_micro': apr_scorer_micro,
            'apr_scorer_weighted': apr_scorer_weighted
        },
        'verbose': 1,
        'refit': 'apr_scorer_micro',
        'error_score': np.nan,
        'n_jobs': -1,
        'early_stopping': 'MedianStoppingRule',
        'max_iters': 10
    })
    return model
[19]:
results = []

for tr, te in StratifiedKFold(random_state=37, shuffle=True, n_splits=5).split(X, y):
    X_tr, X_te = X[tr], X[te]
    y_tr, y_te = y[tr], y[te]

    model = get_model()
    model.fit(X_tr, y_tr)

    y_pred = model.predict_proba(X_te)

    auc_ovr = roc_auc_score(y_te, y_pred, multi_class='ovr')
    auc_ovo = roc_auc_score(y_te, y_pred, multi_class='ovo')
    apr_macro = apr_score(y_te, y_pred, average='macro')
    apr_micro = apr_score(y_te, y_pred, average='micro')
    apr_weighted = apr_score(y_te, y_pred, average='weighted')

    results.append({
        'auc_ovr': auc_ovr,
        'auc_ovo': auc_ovo,
        'apr_macro': apr_macro,
        'apr_micro': apr_micro,
        'apr_weighted': apr_weighted
    })

rdf = pd.DataFrame(results)
== Status ==
Memory usage on this node: 5.3/50.1 GiB
Using MedianStoppingRule: num_stopped=0.
Resources requested: 0/32 CPUs, 0/0 GPUs, 0.0/28.61 GiB heap, 0.0/9.86 GiB objects
Result logdir: /root/ray_results/_PipelineTrainable_2022-05-07_13-29-16
Number of trials: 8/8 (8 TERMINATED)

[20]:
rdf.mean()
[20]:
auc_ovr         0.998309
auc_ovo         0.998308
apr_macro       0.996224
apr_micro       0.996087
apr_weighted    0.996239
dtype: float64
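
tune-sklearn also provides TuneSearchCV, which samples candidates rather than exhaustively enumerating a grid. The sketch below is hypothetical and assumes tune-sklearn’s documented interface, where a (low, high) tuple defines a sampling range, n_trials caps the number of candidates, and search_optimization selects the search algorithm (Bayesian search requires scikit-optimize to be installed):

from tune_sklearn import TuneSearchCV
from sklearn.ensemble import RandomForestClassifier

# hypothetical sketch: Bayesian search over a sampled range instead of a grid
model = TuneSearchCV(
    estimator=RandomForestClassifier(random_state=37),
    param_distributions={
        'n_estimators': (10, 100)  # (low, high) sampling range
    },
    n_trials=20,                   # number of sampled candidates
    search_optimization='bayesian',
    scoring='accuracy',            # a multiclass-safe scorer
    cv=5,
    random_state=37
)
model.fit(X, y)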