Handwritten Digit Recognition with a Neural Network
"The MNIST database of handwritten digits, available from http://yann.lecun.com/exdb/mnist/, has a training set of 60,000 examples and a test set of 10,000 examples." Each example is a 28x28 image of a handwritten digit.
This example brings together the neural networks, stochastic gradient descent, and backpropagation introduced earlier.
First, build a three-layer network: an input layer of 784 neurons, a hidden layer of 20 neurons, and an output layer of 10 neurons, and give the weights and biases their initial values, as follows.
import numpy as np

# build the network
#                 w1/b1       w2/b2
# 784 (inputs) ---------> 20 --------> 10 (outputs)
#       x          z1     a1     z2     y
self.weight1 = np.random.normal(0, 1, [self.num_nodes_in_layers[0],
                                       self.num_nodes_in_layers[1]])
self.bias1 = np.zeros((1, self.num_nodes_in_layers[1]))
self.weight2 = np.random.normal(0, 1, [self.num_nodes_in_layers[1],
                                       self.num_nodes_in_layers[2]])
self.bias2 = np.zeros((1, self.num_nodes_in_layers[2]))
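A side note on the initialization: drawing weights from a standard normal with 784 inputs makes the first layer's pre-activations very large at the start of training, which likely explains the large initial loss (around 150) reported below. A common alternative for ReLU layers is He initialization, which scales the standard deviation to sqrt(2 / fan_in); a sketch in the same notation (an assumption, not what this example uses):

# He initialization (alternative sketch, not the source's choice):
# scale each layer's weight std by sqrt(2 / fan_in) for ReLU units
self.weight1 = np.random.normal(0, np.sqrt(2.0 / self.num_nodes_in_layers[0]),
                                [self.num_nodes_in_layers[0], self.num_nodes_in_layers[1]])
self.weight2 = np.random.normal(0, np.sqrt(2.0 / self.num_nodes_in_layers[1]),
                                [self.num_nodes_in_layers[1], self.num_nodes_in_layers[2]])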
Next, feed a batch of inputs through the network to compute z1, a1, z2, and y (the forward pass):
# forward pass
z1 = np.dot(inputs_batch, self.weight1) + self.bias1   # hidden-layer pre-activations
a1 = function.relu(z1)                                  # hidden-layer activations
z2 = np.dot(a1, self.weight2) + self.bias2              # output-layer pre-activations
y = function.softmax(z2)                                # predicted class probabilities
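The forward pass calls function.relu and function.softmax. The source does not show that helper module, but minimal implementations might look like this (the numerically stable softmax is an assumption):

import numpy as np

def relu(x):
    # element-wise max(0, x)
    return np.maximum(0, x)

def softmax(x):
    # subtract the per-row max before exponentiating for numerical stability
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)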
Before gradient descent can be applied, backpropagation is used to compute the gradients of the weights and biases.
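The first line of the backward pass relies on a standard identity: for a softmax output trained with the mean cross-entropy loss, the gradient with respect to the pre-softmax activations collapses to the prediction error. With t the one-hot labels and N the batch size:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k} t_{ik}\,\log y_{ik}, \qquad \frac{\partial L}{\partial z_2} = \frac{1}{N}\,(y - t)$$

This identity is exactly the delta_y = (y - labels_batch) / y.shape[0] computed below.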
# backward pass
delta_y = (y - labels_batch) / y.shape[0]        # dL/dz2 for softmax + cross-entropy
delta_hidden_layer = np.dot(delta_y, self.weight2.T)
delta_hidden_layer[a1 <= 0] = 0                  # derivative of relu
# backpropagation
weight2_gradient = np.dot(a1.T, delta_y)         # forward activations * backward deltas
bias2_gradient = np.sum(delta_y, axis=0, keepdims=True)
weight1_gradient = np.dot(inputs_batch.T, delta_hidden_layer)
bias1_gradient = np.sum(delta_hidden_layer, axis=0, keepdims=True)
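Hand-derived gradients like these are easy to get subtly wrong, so a finite-difference check is worth running once. The following standalone sketch compares the analytic weight1 gradient against a numerical estimate on random data (all names, sizes, and the epsilon are illustrative assumptions; small weights keep the softmax unsaturated so the check is numerically stable):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 784))                      # tiny random input batch
labels = np.eye(10)[rng.integers(0, 10, size=4)]   # random one-hot labels
w1 = rng.normal(0, 0.01, (784, 20))
b1 = np.zeros((1, 20))
w2 = rng.normal(0, 0.01, (20, 10))
b2 = np.zeros((1, 10))

def forward_loss(w1):
    # same forward pass as above, returning the mean cross-entropy loss
    z1 = x @ w1 + b1
    a1 = np.maximum(0, z1)
    z2 = a1 @ w2 + b2
    exps = np.exp(z2 - z2.max(axis=1, keepdims=True))
    y = exps / exps.sum(axis=1, keepdims=True)
    loss = -np.sum(labels * np.log(y + 1e-12)) / x.shape[0]
    return loss, a1, y

loss, a1, y = forward_loss(w1)
delta_y = (y - labels) / y.shape[0]                # analytic backward pass
delta_h = np.dot(delta_y, w2.T)
delta_h[a1 <= 0] = 0
w1_grad = np.dot(x.T, delta_h)

eps = 1e-5
i, j = 3, 7                                        # arbitrary weight entry
w1_plus = w1.copy()
w1_plus[i, j] += eps
numeric = (forward_loss(w1_plus)[0] - loss) / eps  # one-sided finite difference
print(w1_grad[i, j], numeric)                      # the two should agree closely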
With the gradients of the weights and biases in hand, stochastic gradient descent can repeatedly update them until the network's loss is acceptably low.
# stochastic gradient descent
self.weight1 -= self.learning_rate * weight1_gradient
self.bias1 -= self.learning_rate * bias1_gradient
self.weight2 -= self.learning_rate * weight2_gradient
self.bias2 -= self.learning_rate * bias2_gradient
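For context, these forward, backward, and update steps would normally sit inside a mini-batch loop over the training set; a hypothetical sketch (train_step and the argument names are assumptions, not the source's API):

# hypothetical outer training loop (method and argument names assumed)
def train(self, train_images, train_labels, num_epochs, batch_size):
    for epoch in range(num_epochs):
        for i in range(0, train_images.shape[0], batch_size):
            inputs_batch = train_images[i:i + batch_size]
            labels_batch = train_labels[i:i + batch_size]
            # forward pass, backward pass, and SGD update as shown above
            self.train_step(inputs_batch, labels_batch)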
Running the training produces output like the following:
...
=== Epoch: 5/5 Iteration:59992 Loss: 0.55 ===
=== Epoch: 5/5 Iteration:59993 Loss: 0.73 ===
=== Epoch: 5/5 Iteration:59994 Loss: 0.54 ===
=== Epoch: 5/5 Iteration:59995 Loss: 0.60 ===
=== Epoch: 5/5 Iteration:59996 Loss: 0.43 ===
=== Epoch: 5/5 Iteration:59997 Loss: 0.41 ===
=== Epoch: 5/5 Iteration:59998 Loss: 0.49 ===
=== Epoch: 5/5 Iteration:59999 Loss: 0.73 ===
=== Epoch: 5/5 Iteration:60000 Loss: 0.88 ===
Testing...
Test accuracy: 91.540%
After 5 epochs of 60,000 iterations each, the final test accuracy is 91.54%.
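The reported accuracy is presumably the fraction of test images whose argmax prediction matches the label; a minimal sketch (function and array names are assumptions):

import numpy as np

def accuracy(y_test, test_labels):
    # y_test: softmax outputs on the test set; test_labels: one-hot ground truth
    predictions = np.argmax(y_test, axis=1)
    targets = np.argmax(test_labels, axis=1)
    return np.mean(predictions == targets)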
The loss over training, shown in the figure below, falls from about 150 to around 0.6.
Test results: