15. optuna
Optuna is a hyperparameter optimization framework that goes beyond Scikit-Learn’s grid and random search, offering smarter samplers (such as the TPE sampler used below) and pruning of unpromising trials. Let’s take a look at how we can use Optuna to tune a model.
15.1. Load data
We will use the diabetes data.
[1]:
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True, as_frame=True)
X.shape, y.shape
[1]:
((442, 10), (442,))
[2]:
from sklearn.model_selection import train_test_split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=37)
X_tr.shape, X_te.shape, y_tr.shape, y_te.shape
[2]:
((397, 10), (45, 10), (397,), (45,))
15.2. Tuning
Now, the tuning begins. Optuna requires an objective function that takes in a trial object and returns a scalar or a tuple of scalars; when a tuple is returned, the tuning is called multi-objective optimization. In this example, we have only one objective, which is to minimize the mean absolute error (MAE).
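As an aside, a multi-objective study differs only in that the objective returns a tuple and the study is created with a directions list instead of a single direction. The toy sketch below (not part of this notebook; the parameter x and the two quadratic objectives are made up purely for illustration) shows the shape of such a study.

import optuna

# Hypothetical toy example: two competing objectives over one suggested parameter.
def multi_objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return x ** 2, (x - 2) ** 2  # returning a tuple makes this trial multi-objective

multi_study = optuna.create_study(directions=['minimize', 'minimize'])
multi_study.optimize(multi_objective, n_trials=50)
print(len(multi_study.best_trials))  # Pareto-optimal trials; best_trial is undefined for multi-objective studies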
[3]:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import numpy as np
import optuna
np.random.seed(37)
optuna.logging.set_verbosity(optuna.logging.WARNING)
def get_model(imputer_params={}, regressor_params={}):
    # Build an imputer + random forest pipeline from the given parameter dictionaries.
    model = Pipeline([
        ('imputer', SimpleImputer(**imputer_params)),
        ('regressor', RandomForestRegressor(**regressor_params))
    ])
    return model

def objective(trial):
    # Search space for the imputer and the random forest.
    i_params = {
        'strategy': trial.suggest_categorical('strategy', ['mean', 'median', 'most_frequent'])
    }
    r_params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 200),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 10),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'criterion': trial.suggest_categorical('criterion', ['squared_error', 'absolute_error', 'poisson']),
        'max_features': trial.suggest_categorical('max_features', ['auto', 'sqrt', 'log2']),
        'bootstrap': False,
        'oob_score': False,
        'warm_start': trial.suggest_categorical('warm_start', [True, False]),
        'ccp_alpha': trial.suggest_float('ccp_alpha', 0, 1),
        'max_depth': trial.suggest_int('max_depth', 1, 100),
        'random_state': 37
    }

    model = get_model(i_params, r_params)
    model.fit(X_tr, y_tr)

    # Score on the hold-out set and stash the extra metrics on the trial.
    y_pred = model.predict(X_te)

    mae = mean_absolute_error(y_te, y_pred)
    rmse = mean_squared_error(y_te, y_pred, squared=False)
    r2s = r2_score(y_te, y_pred)

    trial.set_user_attr('mae', mae)
    trial.set_user_attr('rmse', rmse)
    trial.set_user_attr('r2s', r2s)

    return mae
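Note that this objective returns only a final score, so the MedianPruner configured in the study below never actually prunes anything; pruners only act when the objective reports intermediate values. A hypothetical variant (a sketch, not part of the original notebook) could report a running cross-validation MAE fold by fold and let Optuna stop unpromising trials early. It reuses get_model, X_tr, y_tr, np, optuna, and mean_absolute_error from the cells above.

from sklearn.model_selection import KFold

# Hypothetical objective that reports intermediate values so a pruner can act.
def objective_with_pruning(trial):
    i_params = {'strategy': trial.suggest_categorical('strategy', ['mean', 'median', 'most_frequent'])}
    r_params = {'n_estimators': trial.suggest_int('n_estimators', 100, 200), 'random_state': 37}

    maes = []
    for step, (tr_idx, va_idx) in enumerate(KFold(n_splits=5, shuffle=True, random_state=37).split(X_tr)):
        model = get_model(i_params, r_params)
        model.fit(X_tr.iloc[tr_idx], y_tr.iloc[tr_idx])
        maes.append(mean_absolute_error(y_tr.iloc[va_idx], model.predict(X_tr.iloc[va_idx])))

        trial.report(np.mean(maes), step)  # intermediate value for the pruner
        if trial.should_prune():
            raise optuna.TrialPruned()

    return np.mean(maes)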
After we create an objective function, we can create a study and perform optimization.
[5]:
study = optuna.create_study(**{
    'study_name': 'dummy-study',
    'storage': 'sqlite:///_temp/dummy-study.db',
    'load_if_exists': True,
    'direction': 'minimize',
    'sampler': optuna.samplers.TPESampler(seed=37),
    'pruner': optuna.pruners.MedianPruner(n_warmup_steps=10)
})

study.optimize(**{
    'func': objective,
    'n_trials': 100,
    'n_jobs': 1,
    'show_progress_bar': False
})
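If we do not know how many trials we can afford, optimize() also accepts a timeout in seconds, in which case tuning stops at whichever limit is hit first. The one-liner below is just an illustration with a hypothetical 10-minute budget.

# Hypothetical budget: stop after 1,000 trials or 10 minutes, whichever comes first.
study.optimize(objective, n_trials=1000, timeout=600)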
Now we may look at the best hyperparameters, the best value (the quantity we are optimizing for), and the best trial.
[6]:
study.best_params
[6]:
{'ccp_alpha': 0.6016097921882426,
'criterion': 'absolute_error',
'max_depth': 6,
'max_features': 'auto',
'min_samples_leaf': 2,
'min_samples_split': 9,
'n_estimators': 193,
'strategy': 'most_frequent',
'warm_start': False}
[7]:
study.best_value
[7]:
40.65555555555556
[8]:
study.best_trial
[8]:
FrozenTrial(number=62, values=[40.65555555555556], datetime_start=datetime.datetime(2023, 6, 1, 22, 34, 59, 583419), datetime_complete=datetime.datetime(2023, 6, 1, 22, 35, 1, 525495), params={'ccp_alpha': 0.6016097921882426, 'criterion': 'absolute_error', 'max_depth': 6, 'max_features': 'auto', 'min_samples_leaf': 2, 'min_samples_split': 9, 'n_estimators': 193, 'strategy': 'most_frequent', 'warm_start': False}, distributions={'ccp_alpha': FloatDistribution(high=1.0, log=False, low=0.0, step=None), 'criterion': CategoricalDistribution(choices=('squared_error', 'absolute_error', 'poisson')), 'max_depth': IntDistribution(high=100, log=False, low=1, step=1), 'max_features': CategoricalDistribution(choices=('auto', 'sqrt', 'log2')), 'min_samples_leaf': IntDistribution(high=10, log=False, low=1, step=1), 'min_samples_split': IntDistribution(high=10, log=False, low=2, step=1), 'n_estimators': IntDistribution(high=200, log=False, low=100, step=1), 'strategy': CategoricalDistribution(choices=('mean', 'median', 'most_frequent')), 'warm_start': CategoricalDistribution(choices=(True, False))}, user_attrs={'mae': 40.65555555555556, 'r2s': 0.5859187159569563, 'rmse': 53.93324062620792}, system_attrs={}, intermediate_values={}, trial_id=63, state=TrialState.COMPLETE, value=None)
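With the best hyperparameters in hand, one way to refit a final model is to split study.best_params back into imputer and regressor parameters and pass them to the get_model() helper defined earlier. The sketch below is not part of the original notebook; note that the fixed parameters (bootstrap, oob_score, random_state) are not stored in best_params and must be re-supplied.

# Rebuild the winning pipeline from study.best_params (a sketch, reusing get_model and the data above).
best = dict(study.best_params)
i_params = {'strategy': best.pop('strategy')}
r_params = {**best, 'bootstrap': False, 'oob_score': False, 'random_state': 37}

final_model = get_model(i_params, r_params)
final_model.fit(X_tr, y_tr)
print(mean_absolute_error(y_te, final_model.predict(X_te)))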
The trial outputs may be retrieved from the study’s trials_dataframe() method.
[9]:
study.trials_dataframe().dtypes
[9]:
number int64
value float64
datetime_start datetime64[ns]
datetime_complete datetime64[ns]
duration timedelta64[ns]
params_ccp_alpha float64
params_criterion object
params_max_depth int64
params_max_features object
params_min_samples_leaf int64
params_min_samples_split int64
params_n_estimators int64
params_strategy object
params_warm_start bool
user_attrs_mae float64
user_attrs_r2s float64
user_attrs_rmse float64
state object
dtype: object
[10]:
study.trials_dataframe()[['number', 'params_criterion', 'params_min_samples_leaf', 'value', 'user_attrs_rmse', 'user_attrs_r2s']] \
.sort_values(['value', 'user_attrs_rmse', 'user_attrs_r2s']) \
.head()
[10]:
|    | number | params_criterion | params_min_samples_leaf | value     | user_attrs_rmse | user_attrs_r2s |
|----|--------|------------------|-------------------------|-----------|-----------------|----------------|
| 62 | 62     | absolute_error   | 2                       | 40.655556 | 53.933241       | 0.585919       |
| 92 | 92     | absolute_error   | 1                       | 40.655556 | 53.933241       | 0.585919       |
| 52 | 52     | absolute_error   | 2                       | 41.144444 | 54.060820       | 0.583957       |
| 53 | 53     | absolute_error   | 2                       | 41.144444 | 54.060820       | 0.583957       |
| 54 | 54     | absolute_error   | 3                       | 41.144444 | 54.060820       | 0.583957       |
Because the study is persisted to SQLite storage, we can continue the hyperparameter tuning at a later time by recreating the study with load_if_exists set to True.
[11]:
_study = optuna.create_study(**{
    'study_name': 'dummy-study',
    'storage': 'sqlite:///_temp/dummy-study.db',
    'load_if_exists': True,
    'direction': 'minimize',
    'sampler': optuna.samplers.TPESampler(seed=37),
    'pruner': optuna.pruners.MedianPruner(n_warmup_steps=10)
})

_study.optimize(**{
    'func': objective,
    'n_trials': 10,
    'n_jobs': 5,
    'show_progress_bar': False
})
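Since the reloaded study is backed by the same SQLite database, it picks up the earlier trials and appends the new ones. A quick sanity check (a sketch; the exact count depends on how many runs have written to the database):

# The reloaded study contains both the earlier and the newly added trials.
print(len(_study.trials), _study.best_value)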
15.3. Plotting
There are several plots you may use to understand the hyperparameter optimization results.
[13]:
from optuna.visualization import plot_optimization_history

plot_optimization_history(**{
    'study': _study
})
[14]:
from optuna.visualization import plot_parallel_coordinate

plot_parallel_coordinate(**{
    'study': _study
})
[15]:
from optuna.visualization import plot_param_importances

plot_param_importances(**{
    'study': _study
})
[16]:
from optuna.visualization import plot_slice

plot_slice(**{
    'study': _study,
    'params': ['criterion', 'ccp_alpha', 'warm_start']
})
[17]:
from optuna.visualization import plot_contour

plot_contour(**{
    'study': _study,
    'params': ['criterion', 'min_samples_leaf', 'max_depth']
})
[18]:
from optuna.visualization import plot_edf

plot_edf(_study)
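These plotting functions return Plotly figure objects, so they can be written out as standalone, interactive HTML files for sharing outside the notebook. A minimal sketch is shown below; the output path is just a placeholder.

# Save an interactive plot to a standalone HTML file (hypothetical path).
fig = plot_optimization_history(_study)
fig.write_html('_temp/optimization-history.html')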