A Step-by-Step Example of Hyperparameter Optimization for Deep Learning with Grid Search
This article works through hyperparameter tuning with grid search on the Kaggle Pima Indians Diabetes dataset (https://www.kaggle.com/uciml/pima-indians-diabetes-database).
First, import the Python packages:
import pandas as pd
import numpy as np
import keras
Read the dataset:
df = pd.read_csv('/kaggle/input/pima-indians-diabetes-database/diabetes.csv')
Inspect the dataframe:
df.shape
The shape is (768, 9): there are 768 samples and 9 columns.
df.columns
The columns are: ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'].
df.describe()
Every column has a count of 768, which shows there are no missing values. The mean of 'Outcome' is 0.35, which indicates that the dataset contains more samples with 'Outcome' = 0 than with 'Outcome' = 1.
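To verify the class balance directly rather than inferring it from the mean, a quick check (not part of the original walkthrough) is:

df['Outcome'].value_counts()
# For this dataset: 500 samples of class 0 and 268 of class 1 (268/768 ≈ 0.35).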
Convert the dataframe 'df' to a numpy array 'dataset':
dataset = df.values
Split 'dataset' into X and y:
X = dataset[:, 0:8]
y = dataset[:, 8].astype('int')
Standardization
As we saw above, the column means differ widely, so we standardize the dataset to ensure that no feature is given undue weight.
from sklearn.preprocessing import StandardScaler

a = StandardScaler()
a.fit(X)
X_standardized = a.transform(X)
Now let us look at the mean and standard deviation of 'X_standardized':
pd.DataFrame(X_standardized).describe()
All columns now have a mean of about 0 and a standard deviation of about 1, so the data has been standardized.
Tuning the hyperparameters: batch size and epochs
from sklearn.model_selection import GridSearchCV, KFold
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import Adam
Define the network architecture and the optimization algorithm. The neural network consists of one input layer, two hidden layers with the ReLU activation function, and one output layer with the sigmoid activation function. Adam is chosen as the optimizer.
We run a grid search over two hyperparameters: 'batch_size' and 'epochs'. The cross-validation technique is k-fold, using KFold() with scikit-learn's default number of splits (3 in older versions, which this article assumes; 5 since scikit-learn 0.22). The accuracy score is computed for each combination.
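The grid-search code below passes a 'create_model' function to KerasClassifier, but its definition was omitted from this section. A minimal sketch, consistent with the architecture described above and with the later code in this article (8 inputs, ReLU hidden layers of 8 and 4 units, a sigmoid output, Adam), would be:

def create_model():
    model = Sequential()
    model.add(Dense(8, input_dim = 8, kernel_initializer = 'normal', activation = 'relu'))
    model.add(Dense(4, kernel_initializer = 'normal', activation = 'relu'))
    model.add(Dense(1, activation = 'sigmoid'))
    # The original section does not specify a learning rate here, so we keep Adam's default.
    adam = Adam()
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model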
# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0)

# Define the grid search parameters
batch_size = [10, 20, 40]
epochs = [10, 50, 100]

# Make a dictionary of the grid search parameters
param_grid = dict(batch_size = batch_size, epochs = epochs)

# Build and fit the GridSearchCV
grid = GridSearchCV(estimator = model, param_grid = param_grid, cv = KFold(), verbose = 10)
grid_result = grid.fit(X_standardized, y)
Print the best accuracy score and the corresponding hyperparameter values:
# Summarize the results
print('Best : {}, using {}'.format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{},{} with: {}'.format(mean, stdev, param))
The best accuracy score, 0.7604, is obtained with 'batch_size' = 40 and 'epochs' = 10, so we fix 'batch_size' = 40 and 'epochs' = 10 while tuning the remaining hyperparameters.
Tuning the hyperparameters: learning rate and dropout rate
The learning rate plays an important role in the optimization algorithm. If it is too large, the algorithm may overshoot and fail to find a good local optimum; if it is too small, the algorithm may need many iterations to converge, which drives up computation time. We therefore want a learning rate small enough for the algorithm to converge, yet large enough to speed convergence up. A related regularization technique is early stopping, in which training on the training set continues only as long as accuracy on a held-out set keeps improving.
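Early stopping is not used in the grid searches in this article, but Keras supports it through the built-in EarlyStopping callback. A minimal sketch follows; monitoring validation loss with a patience of 5 epochs is our assumption, not a choice from the original article:

from keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 5 consecutive epochs.
early_stop = EarlyStopping(monitor = 'val_loss', patience = 5)
model = create_model()
model.fit(X_standardized, y, validation_split = 0.2, epochs = 100, batch_size = 40,
          callbacks = [early_stop])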
Dropout is a regularization technique that reduces the complexity of the model and thereby prevents overfitting to the training data. The dropout rate takes values between 0 and 1: 0 means no units are dropped, and 1 means all units are dropped.
from keras.layers import Dropout

# Defining the model
def create_model(learning_rate, dropout_rate):
    model = Sequential()
    model.add(Dense(8, input_dim = 8, kernel_initializer = 'normal', activation = 'relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(4, input_dim = 8, kernel_initializer = 'normal', activation = 'relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation = 'sigmoid'))
    adam = Adam(lr = learning_rate)
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0, batch_size = 40, epochs = 10)

# Define the grid search parameters
learning_rate = [0.001, 0.01, 0.1]
dropout_rate = [0.0, 0.1, 0.2]

# Make a dictionary of the grid search parameters
param_grids = dict(learning_rate = learning_rate, dropout_rate = dropout_rate)

# Build and fit the GridSearchCV
grid = GridSearchCV(estimator = model, param_grid = param_grids, cv = KFold(), verbose = 10)
grid_result = grid.fit(X_standardized, y)

# Summarize the results
print('Best : {}, using {}'.format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{},{} with: {}'.format(mean, stdev, param))
The best accuracy score, 0.7695, is obtained with 'dropout_rate' = 0.1 and 'learning_rate' = 0.001, so we fix these values while tuning the remaining hyperparameters.
Tuning the hyperparameters: activation function and kernel initializer
Activation functions introduce non-linearity into a neural network, which lets it build complex non-linear mappings between inputs and outputs. Without an activation function, the output would be nothing more than a linear function of the input.
A neural network has to start from some initial weights and then iteratively update them to better values. The kernel initializer specifies the statistical distribution or function used to initialize those weights.
# Defining the model
def create_model(activation_function, init):
    model = Sequential()
    model.add(Dense(8, input_dim = 8, kernel_initializer = init, activation = activation_function))
    model.add(Dropout(0.1))
    model.add(Dense(4, input_dim = 8, kernel_initializer = init, activation = activation_function))
    model.add(Dropout(0.1))
    model.add(Dense(1, activation = 'sigmoid'))
    adam = Adam(lr = 0.001)
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0, batch_size = 40, epochs = 10)

# Define the grid search parameters
activation_function = ['softmax', 'relu', 'tanh', 'linear']
init = ['uniform', 'normal', 'zero']

# Make a dictionary of the grid search parameters
param_grids = dict(activation_function = activation_function, init = init)

# Build and fit the GridSearchCV
grid = GridSearchCV(estimator = model, param_grid = param_grids, cv = KFold(), verbose = 10)
grid_result = grid.fit(X_standardized, y)

# Summarize the results
print('Best : {}, using {}'.format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{},{} with: {}'.format(mean, stdev, param))
The best accuracy score, 0.7591, is obtained with 'activation_function' = 'tanh' and 'init' (the kernel initializer) = 'uniform', so we fix these values while tuning the remaining hyperparameters.
Tuning the hyperparameters: number of neurons in the hidden layers
The complexity of the model must match the complexity of the data. The number of neurons in the hidden layers determines the model's capacity: the more neurons, the more complex the non-linear mapping between inputs and outputs the network can learn.
# Defining the model
def create_model(neuron1, neuron2):
    model = Sequential()
    model.add(Dense(neuron1, input_dim = 8, kernel_initializer = 'uniform', activation = 'tanh'))
    model.add(Dropout(0.1))
    model.add(Dense(neuron2, input_dim = neuron1, kernel_initializer = 'uniform', activation = 'tanh'))
    model.add(Dropout(0.1))
    model.add(Dense(1, activation = 'sigmoid'))
    adam = Adam(lr = 0.001)
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0, batch_size = 40, epochs = 10)

# Define the grid search parameters
neuron1 = [4, 8, 16]
neuron2 = [2, 4, 8]

# Make a dictionary of the grid search parameters
param_grids = dict(neuron1 = neuron1, neuron2 = neuron2)

# Build and fit the GridSearchCV
grid = GridSearchCV(estimator = model, param_grid = param_grids, cv = KFold(), verbose = 10)
grid_result = grid.fit(X_standardized, y)

# Summarize the results
print('Best : {}, using {}'.format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{},{} with: {}'.format(mean, stdev, param))
The best accuracy score, 0.7591, is obtained with 16 neurons in the first hidden layer and 4 neurons in the second.
The best hyperparameter values are therefore:
Batch size = 40
Epochs = 10
Dropout rate = 0.1
Learning rate = 0.001
Activation function = tanh
Kernel Initializer = uniform
No. of neurons in layer 1 = 16
No. of neurons in layer 2 = 4
Training the model with the best hyperparameter values
We now train the deep learning model using the best hyperparameter values found in the previous sections.
from sklearn.metrics import classification_report, accuracy_score

# Defining the model
def create_model():
    model = Sequential()
    model.add(Dense(16, input_dim = 8, kernel_initializer = 'uniform', activation = 'tanh'))
    model.add(Dropout(0.1))
    model.add(Dense(4, input_dim = 16, kernel_initializer = 'uniform', activation = 'tanh'))
    model.add(Dropout(0.1))
    model.add(Dense(1, activation = 'sigmoid'))
    adam = Adam(lr = 0.001)
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0, batch_size = 40, epochs = 10)

# Fitting the model
model.fit(X_standardized, y)

# Predicting using trained model
y_predict = model.predict(X_standardized)

# Printing the metrics
print(accuracy_score(y, y_predict))
print(classification_report(y, y_predict))
The accuracy is 77.6%, and the per-class F1-scores are 0.84 and 0.65. (Note that these metrics are computed on the same data the model was trained on.)
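Because evaluating on the training data tends to overstate performance, a held-out split gives a less biased estimate. A minimal sketch, where the split ratio and seed are our choices and not part of the original article:

from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X_standardized, y, test_size = 0.2, random_state = 42)
model = KerasClassifier(build_fn = create_model, verbose = 0, batch_size = 40, epochs = 10)
model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)
print(accuracy_score(y_test, y_test_pred))
print(classification_report(y_test, y_test_pred))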
Performance could be improved further by searching for the best values of all the hyperparameters at once with the Python snippet below. Note: this procedure is computationally very expensive.
def create_model(learning_rate, dropout_rate, activation_function, init, neuron1, neuron2):
    model = Sequential()
    model.add(Dense(neuron1, input_dim = 8, kernel_initializer = init, activation = activation_function))
    model.add(Dropout(dropout_rate))
    model.add(Dense(neuron2, input_dim = neuron1, kernel_initializer = init, activation = activation_function))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation = 'sigmoid'))
    adam = Adam(lr = learning_rate)
    model.compile(loss = 'binary_crossentropy', optimizer = adam, metrics = ['accuracy'])
    return model

# Create the model
model = KerasClassifier(build_fn = create_model, verbose = 0)

# Define the grid search parameters
batch_size = [10, 20, 40]
epochs = [10, 50, 100]
learning_rate = [0.001, 0.01, 0.1]
dropout_rate = [0.0, 0.1, 0.2]
activation_function = ['softmax', 'relu', 'tanh', 'linear']
init = ['uniform', 'normal', 'zero']
neuron1 = [4, 8, 16]
neuron2 = [2, 4, 8]

# Make a dictionary of the grid search parameters
param_grids = dict(batch_size = batch_size, epochs = epochs, learning_rate = learning_rate,
                   dropout_rate = dropout_rate, activation_function = activation_function,
                   init = init, neuron1 = neuron1, neuron2 = neuron2)

# Build and fit the GridSearchCV
grid = GridSearchCV(estimator = model, param_grid = param_grids, cv = KFold(), verbose = 10)
grid_result = grid.fit(X_standardized, y)

# Summarize the results
print('Best : {}, using {}'.format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print('{},{} with: {}'.format(mean, stdev, param))
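The grid above contains 3 × 3 × 3 × 3 × 4 × 3 × 3 × 3 = 8,748 parameter combinations, each trained once per cross-validation fold. If a full grid search is too expensive, scikit-learn's RandomizedSearchCV samples a fixed number of combinations from the same grid instead of trying them all. This is an alternative we suggest here, not something used in the original article:

from sklearn.model_selection import RandomizedSearchCV

# Sample 50 of the 8,748 combinations at random instead of evaluating every one.
random_search = RandomizedSearchCV(estimator = model, param_distributions = param_grids,
                                   n_iter = 50, cv = KFold(), verbose = 10, random_state = 42)
random_search_result = random_search.fit(X_standardized, y)
print('Best : {}, using {}'.format(random_search_result.best_score_, random_search_result.best_params_))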