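Late-night crunch: a CNN for color image recognition, built to test a sky-high-priced nuke
Going further and further down the image-recognition road ✌( •̀ ω •́ )y
1. A quick explanation
It's late and my head isn't very clear; most of the code is adapted from GitHub...
This CNN image-recognition network will later be used to evaluate the performance of an NVIDIA DGX server, so I have deliberately made training take as long as possible.
The server carries 8 NVIDIA Tesla V100 cards, currently the top-end deep learning accelerator. A single card sells for 1.02 million RMB and the whole machine for close to 10 million. A sky-high-priced nuke; it must be nice to be rich. According to information online, this server can finish in 8 hours what a Titan X needs 8 days for, or what a top consumer CPU needs several months for.
The network is based on an image-recognition project on GitHub and uses the DenseNet model, with ImageDataGenerator added to augment the dataset. The plan is to run it on each platform later by changing the value of the constant epochs.
Because of the late-night rush, the GPU setup isn't finished yet, so I set epochs to 1 and run it on the CPU first, then use past experience to estimate how long it would take on a GTX 1080.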
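2. About the dataset
Training uses the CIFAR-10 dataset: 60000 color images of 32x32 pixels, divided into 10 classes, as shown in the figure:
For details, see the University of Toronto site: http://www.cs.toronto.edu/~kr...
The goal of the network is to classify each image into its own class as accurately as possible.
After downloading the dataset, rename it and drop it straight into the .keras/datasets folder under your user directory:
After extracting it, you can see the dataset is split into 6 batches: 5 for training and 1 for testing: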
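For anyone who wants to peek inside a batch file: in the Python version of CIFAR-10, each image is stored as one flat row of 3072 values (1024 red, then 1024 green, then 1024 blue, each plane in row-major order). Below is a minimal sketch of turning one such row into 32x32 RGB pixels. The helper name `row_to_pixels` is made up for illustration and is not part of Keras, and the demo row is synthetic.

```python
def row_to_pixels(row, size=32):
    """Convert one flat CIFAR-10 row (3 * size * size values) into a grid of (r, g, b)."""
    plane = size * size  # 1024 values per color plane for 32x32 images
    if len(row) != 3 * plane:
        raise ValueError("expected %d values, got %d" % (3 * plane, len(row)))
    pixels = []
    for r in range(size):
        line = []
        for c in range(size):
            i = r * size + c  # row-major index inside one color plane
            line.append((row[i], row[plane + i], row[2 * plane + i]))
        pixels.append(line)
    return pixels


# Synthetic row for demonstration only (real rows hold values in 0..255).
demo = row_to_pixels(list(range(3072)))
print(demo[0][0], demo[31][31])
```

In practice `cifar10.load_data()` does this conversion for you; the sketch only shows what the raw layout looks like.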
3. Late-night rush, so straight to the code:
Import the libraries (numpy, keras, math):
import numpy as np
import keras
import math
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers import Conv2D, Dense, Input, add, Activation, AveragePooling2D, GlobalAveragePooling2D
from keras.layers import Lambda, concatenate
from keras.initializers import he_normal
from keras.layers.merge import Concatenate
from keras.callbacks import LearningRateScheduler, TensorBoard, ModelCheckpoint
from keras.models import Model
from keras import optimizers
from keras import regularizers
from keras.utils.vis_utils import plot_model as plot
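Set the constants:
growth_rate = 12
depth = 100
compression = 0.5
img_rows, img_cols = 32, 32   # image size
img_channels = 3              # number of color channels (RGB)
num_classes = 10              # number of classes in the dataset
batch_size = 64               # examples per training batch; only 64 or 32
epochs = 1                    # passes over the full dataset; just one here, on the CPU
                              # change epochs to suit the card under test and your own needs
                              # recognition is fairly good at 250 epochs, but accuracy is not the point here
iterations = 782              # steps per epoch: ceil(50000 / 64)
weight_decay = 0.0001
mean = [125.307, 122.95, 113.865]
std = [62.9932, 62.0887, 66.7048]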
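The value 782 for `iterations` is tied to `batch_size`: it is the number of batches needed to cover the 50000 training images once. A small sketch of that arithmetic, in case you switch to a batch size of 32 (the helper name `steps_per_epoch` is made up here):

```python
import math


def steps_per_epoch(num_train_images, batch_size):
    """Number of batches needed to see every training image once."""
    return math.ceil(num_train_images / batch_size)


print(steps_per_epoch(50000, 64))  # 782, the value used for iterations
print(steps_per_epoch(50000, 32))  # 1563
```

So if `batch_size` is changed to 32, `iterations` should change with it, otherwise one "epoch" covers only about half of the training set.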
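The scheduler sets the learning rate according to the epoch number: the later the epoch, the smaller the value, so the update steps, and with them the randomness in training, shrink gradually: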
def scheduler(epoch):
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005
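To see what this step schedule amounts to over a full run, here is a quick sketch that counts how many epochs are spent at each learning rate. It assumes a 250-epoch run and that Keras passes the epoch index starting from 0, which is how `LearningRateScheduler` calls the function.

```python
from collections import Counter


def scheduler(epoch):
    # Same step schedule as above: 0.1, then 0.01, then 0.0005.
    if epoch <= 100:
        return 0.1
    if epoch <= 180:
        return 0.01
    return 0.0005


# Assumed run length of 250 epochs, indexed 0..249.
epochs_per_lr = Counter(scheduler(e) for e in range(250))
for lr, count in epochs_per_lr.items():
    print(lr, count)
```

With `epochs = 1`, as in this post, only the first rate (0.1) is ever used.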
Define a DenseNet model (GitHub porter reporting for duty!):
def densenet(img_input, classes_num):

    def bn_relu(x):
        # BatchNorm followed by ReLU, applied before each convolution
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return x

    def bottleneck(x):
        # 1x1 conv to 4 * growth_rate channels, then 3x3 conv to growth_rate channels
        channels = growth_rate * 4
        x = bn_relu(x)
        x = Conv2D(channels, kernel_size=(1,1), strides=(1,1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        x = bn_relu(x)
        x = Conv2D(growth_rate, kernel_size=(3,3), strides=(1,1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        return x

    def single(x):
        # plain 3x3 layer without the bottleneck (not used below)
        x = bn_relu(x)
        x = Conv2D(growth_rate, kernel_size=(3,3), strides=(1,1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        return x

    def transition(x, inchannels):
        # 1x1 conv that compresses the channels, then 2x2 average pooling
        x = bn_relu(x)
        x = Conv2D(int(inchannels * compression), kernel_size=(1,1), strides=(1,1), padding='same',
                   kernel_initializer=he_normal(),
                   kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(x)
        x = AveragePooling2D((2,2), strides=(2, 2))(x)
        return x

    def dense_block(x, blocks, nchannels):
        # each layer sees the concatenation of all earlier feature maps
        concat = x
        for i in range(blocks):
            x = bottleneck(concat)
            concat = concatenate([x, concat], axis=-1)
            nchannels += growth_rate
        return concat, nchannels

    def dense_layer(x):
        return Dense(classes_num, activation='softmax', kernel_initializer=he_normal(),
                     kernel_regularizer=regularizers.l2(weight_decay))(x)

    # nblocks = (depth - 4) // 3
    nblocks = (depth - 4) // 6
    nchannels = growth_rate * 2

    x = Conv2D(nchannels, kernel_size=(3,3), strides=(1,1), padding='same',
               kernel_initializer=he_normal(),
               kernel_regularizer=regularizers.l2(weight_decay), use_bias=False)(img_input)

    # note: nchannels is not reduced after a transition, so the second
    # transition compresses from the uncompressed running count
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = transition(x, nchannels)
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = transition(x, nchannels)
    x, nchannels = dense_block(x, nblocks, nchannels)
    x = bn_relu(x)
    x = GlobalAveragePooling2D()(x)
    x = dense_layer(x)
    return x
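To get a feel for how wide the network becomes, the channel bookkeeping in `densenet` can be traced without Keras. The sketch below mirrors the code as written: `tracked` plays the role of the `nchannels` variable and `actual` is the channel count really present in the tensor. The function name `trace_channels` and the `update_after_transition` flag are hypothetical additions for comparison only; setting the flag shows what the counts would be if `nchannels` were reset to the compressed value after each transition, as in a standard DenseNet-BC.

```python
def trace_channels(depth=100, growth_rate=12, compression=0.5,
                   update_after_transition=False):
    """Return (nblocks, channel counts after the stem, each dense block and each transition)."""
    nblocks = (depth - 4) // 6       # bottleneck layers per dense block
    tracked = growth_rate * 2        # the nchannels variable in densenet()
    actual = tracked                 # channels really present in the tensor
    trace = [actual]
    for block in range(3):
        tracked += nblocks * growth_rate
        actual += nblocks * growth_rate
        trace.append(actual)
        if block < 2:                # a transition follows the first two blocks only
            actual = int(tracked * compression)
            if update_after_transition:
                tracked = actual     # hypothetical variant, not what the post's code does
            trace.append(actual)
    return nblocks, trace


print(trace_channels())
print(trace_channels(update_after_transition=True))
```

With depth = 100 this gives 16 bottleneck layers per block: 3 x 16 x 2 convolutions, plus the first convolution, the two transition convolutions and the final dense layer, add up to 100 layers.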
Load the dataset, one-hot encode the labels, and convert the image data to float32:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
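What `to_categorical` does to the labels can be shown with a tiny pure-Python stand-in. This is only an illustration of one-hot encoding, not the Keras implementation, and the helper name `one_hot` is made up:

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at the index of the class."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


print(one_hot(3))
```

Each integer label becomes a vector of length 10, which is the shape the softmax output and the cross-entropy loss expect.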
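Normalize the dataset channel by channel (subtract the mean, divide by the standard deviation) to make training easier: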
for i in range(3):
    x_train[:,:,:,i] = (x_train[:,:,:,i] - mean[i]) / std[i]
    x_test[:,:,:,i] = (x_test[:,:,:,i] - mean[i]) / std[i]
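The same per-channel standardization, applied to a single pixel so the effect is easy to see. This is a sketch for illustration only; `normalize_pixel` is a made-up helper, while the `mean` and `std` values are the ones from the constants above.

```python
mean = [125.307, 122.95, 113.865]
std = [62.9932, 62.0887, 66.7048]


def normalize_pixel(rgb):
    """Standardize one (r, g, b) pixel with the per-channel mean and std."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


print(normalize_pixel([0, 0, 0]))        # black: all values negative
print(normalize_pixel([255, 255, 255]))  # white: all values positive
print(normalize_pixel(mean))             # the mean color maps to zeros
```

After this step the inputs are centered around zero with roughly unit spread per channel, instead of ranging over 0 to 255.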
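Define the model and print a summary. The summary printed in the shell is ridiculously long, so I won't paste it; if you want to see it, just print the summary in the shell yourself:
img_input = Input(shape=(img_rows, img_cols, img_channels))
output = densenet(img_input, num_classes)
model = Model(img_input, output)
# model.load_weights('ckpt.h5')
model.summary()  # summary() prints by itself and returns None
plot(model, to_file='cnn_model.png', show_shapes=True)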
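The model's parameter counts are shown in the figure below. That is the trouble with image recognition: far too many parameters, and a mountain of gradients to compute. No wonder the sky-high-priced nuke costs so much and still sells so well:
At heart this is still a classification problem, so cross-entropy is used as the loss function to measure how good the output is: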
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
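For a single example, categorical cross-entropy is just the negative log of the probability the network assigns to the true class. A minimal sketch, assuming a one-hot label and a softmax output; the `eps` clipping is a simplification of what a framework does to avoid log(0), and this is not the Keras implementation:

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


label = [1.0] + [0.0] * 9                 # true class is class 0
confident = [0.9] + [0.1 / 9] * 9         # 90% on the right class
uniform = [0.1] * 10                      # an untrained, clueless guess

print(categorical_crossentropy(label, confident))
print(categorical_crossentropy(label, uniform))
```

The uniform guess gives a loss of ln(10), about 2.30, which is roughly what to expect at the very start of training on 10 classes.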
Set up the callbacks:
tb_cb = TensorBoard(log_dir='./densenet/', histogram_freq=0)
change_lr = LearningRateScheduler(scheduler)
ckpt = ModelCheckpoint('./ckpt.h5', save_best_only=False, mode='auto', period=10)
cbks = [change_lr, tb_cb, ckpt]
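Add data augmentation, applying random transformations to the images; here that means horizontal flips and horizontal and vertical shifts of up to 12.5% of the image size: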
print('Using real-time data augmentation.')
datagen = ImageDataGenerator(horizontal_flip=True, width_shift_range=0.125, height_shift_range=0.125,
                             fill_mode='constant', cval=0.)
datagen.fit(x_train)
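To make the two transformations concrete, here is a toy version on a single-channel image stored as a list of rows. It is a simplified sketch, not what `ImageDataGenerator` does internally: it uses whole-pixel shifts along one axis only, and the helper names `hflip`, `shift_columns` and `augment` are made up. With 32-pixel-wide images, a range of 0.125 corresponds to shifts of up to 4 pixels.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift_columns(img, dx, fill=0):
    """Shift every row by dx pixels (negative = left), padding with a constant."""
    width = len(img[0])
    out = []
    for row in img:
        new_row = [fill] * width
        for c, value in enumerate(row):
            if 0 <= c + dx < width:
                new_row[c + dx] = value
        out.append(new_row)
    return out


def augment(img, max_frac=0.125, rng=random):
    """Randomly flip, then shift horizontally by up to max_frac of the width."""
    if rng.random() < 0.5:
        img = hflip(img)
    max_dx = int(max_frac * len(img[0]))
    return shift_columns(img, rng.randint(-max_dx, max_dx))


print(hflip([[1, 2, 3]]))
print(shift_columns([[1, 2, 3, 4]], 1))
```

The constant padding mirrors `fill_mode='constant', cval=0.`: pixels shifted in from outside the image are filled with zeros.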
Train the model:
model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=iterations,
                    epochs=epochs,
                    callbacks=cbks,
                    validation_data=(x_test, y_test))
model.save('densenet.h5')
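During training the CPU (i7-7820HK) runs at full load:
One epoch on the CPU takes nearly 10000 seconds:
Going by my earlier handwritten-digit recognition model (12 seconds on the CPU versus only 0.47 seconds on a GTX 1080, which put the GPU at about 25.72 times the CPU's performance), one epoch of this program should take roughly 10000 / 25.72 ≈ 389 seconds on a GTX 1080. With epochs raised to 2500, the GTX 1080 would therefore need about 270 hours.
So what will it look like on the V100 sky-high-priced nuke? I'll go try it tomorrow!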
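The back-of-the-envelope estimate above can be written out as a few lines of arithmetic. This is only a sketch of the same reasoning: it assumes the CPU-to-GPU speedup measured on the earlier handwritten-digit model carries over to this network, which is a rough guess rather than a measurement. The function name `estimate_gpu_hours` is made up.

```python
def estimate_gpu_hours(cpu_seconds_per_epoch, speedup, epochs):
    """Scale the CPU time per epoch by an assumed speedup and convert to hours."""
    gpu_seconds_per_epoch = cpu_seconds_per_epoch / speedup
    return gpu_seconds_per_epoch * epochs / 3600


# Figures from the post: ~10000 s per epoch on CPU, 25.72x speedup, 2500 epochs.
print(round(estimate_gpu_hours(10000, 25.72, 2500)))  # 270
```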