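Implementing MnasNet in Keras
Let's explore MnasNet, a platform-aware neural architecture search approach for mobile devices. MnasNet was developed by the Google Brain team. Here we review the paper's main contributions and the method it applies, and finish with a quick Keras implementation of the final model. Before that, let's look at the motivation behind this model.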
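Motivation
Designing, training, and evaluating convolutional neural networks for large datasets is a daunting task: it is time-consuming and requires extensive domain knowledge. To address the problem of designing convolutional neural network (CNN) models, the Google Brain team developed NasNet (Neural Architecture Search Network), which searches a space of possible convolutions, pooling operations, and blocks with variable strides, kernel sizes, and so on. However, that search was not aimed at finding models efficient enough to run on mobile platforms. MnasNet was developed to fill this gap.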
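Main Contributions
The authors incorporate latency information when evaluating candidate models, which penalizes large models built from expensive operations. This leads to a good trade-off between accuracy and latency.
On the ImageNet classification task, MnasNet achieves 74.0% top-1 accuracy with 76 ms latency on a Pixel phone.
On the COCO object detection task, MnasNet achieves both higher mAP and lower latency than MobileNets.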
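The latency-aware evaluation in the first point can be sketched as a single scalar reward. The function below assumes a soft-constraint reward of the form ACC x (LAT / T)^w, the general shape described in the MnasNet paper. The function name `latency_aware_reward`, the 75 ms target, and the exponent of -0.07 are illustrative assumptions and should be checked against the paper before use.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms=75.0, w=-0.07):
    """Scale accuracy by a soft latency penalty (illustrative sketch).

    target_ms and w are assumed values, not taken from this article.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    return accuracy * (latency_ms / target_ms) ** w


# A model exactly on target keeps its accuracy as the reward;
# a slower model with the same accuracy scores lower.
on_target = latency_aware_reward(0.74, 75.0)
slower = latency_aware_reward(0.74, 150.0)
print(on_target, slower)
```

Because the exponent is negative, the penalty grows smoothly with latency rather than rejecting every model above the target outright, which is what lets the search trade a little accuracy for speed.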
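Factorized Hierarchical Search Space
The authors use a factorized hierarchical search space. This means they group layers into predefined blocks and then search over the variable hyperparameters of each block. A visualization of this can be found in the original paper.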
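As a rough illustration of searching per block rather than per layer, the sketch below samples one set of hyperparameters for each block and counts how many architectures such a space contains. The option lists in `BLOCK_CHOICES` and both function names are hypothetical; the actual choices used by the authors are listed in the paper.

```python
import random

# Hypothetical per-block choices; the real option lists are in the paper.
BLOCK_CHOICES = {
    "kernel_size": [3, 5],
    "expansion": [3, 6],
    "num_layers": [1, 2, 3, 4],
    "use_skip": [True, False],
}


def sample_architecture(num_blocks, rng):
    """Pick one value per hyperparameter for each block independently."""
    return [
        {name: rng.choice(options) for name, options in BLOCK_CHOICES.items()}
        for _ in range(num_blocks)
    ]


def search_space_size(num_blocks):
    """Number of distinct architectures under these choices."""
    per_block = 1
    for options in BLOCK_CHOICES.values():
        per_block *= len(options)
    return per_block ** num_blocks


arch = sample_architecture(num_blocks=5, rng=random.Random(0))
print(arch[0], search_space_size(5))
```

Every layer inside a block shares that block's settings, so the space grows with the number of blocks rather than the number of layers, while different blocks are still free to differ.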
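Model Architecture
Let's jump straight to the model architecture found with their method. The architecture is as follows:
Figure 1: MnasNet architecture. (a) is the main model; (b)-(f) are the corresponding blocks.
All blocks except one share the same structure, which is as follows: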
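Conv2D(1x1) -> BatchNormalization -> ReLU6 -> DepthwiseConv2D -> BatchNormalization -> ReLU6 -> Conv2D(1x1) -> BatchNormalization -> ReLU
Depending on its configuration, a block may or may not have a skip connection from its input to the output of its last layer. Note that the inverted residual block implemented later in this article applies no activation after the final BatchNormalization. The SepConv block consists only of DepthwiseConv2D, Conv2D(1x1), BatchNormalization, and a final ReLU6 activation.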
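To get a feel for the cost of one such block, the sketch below counts its convolution weights and states the shape condition under which a skip connection is possible. The function names are hypothetical, biases are assumed to be disabled, and BatchNormalization parameters are left out of the count.

```python
def inverted_block_weights(in_channels, out_channels, expansion, kernel):
    """Count convolution weights in one block (no biases, BatchNorm excluded)."""
    expanded = in_channels * expansion
    expand_1x1 = in_channels * expanded      # Conv2D(1x1) expansion
    depthwise = kernel * kernel * expanded   # one k x k filter per channel
    project_1x1 = expanded * out_channels    # Conv2D(1x1) projection
    return expand_1x1 + depthwise + project_1x1


def has_skip(in_channels, out_channels, stride):
    """A residual add needs matching input and output shapes."""
    return stride == 1 and in_channels == out_channels


weights = inverted_block_weights(24, 24, expansion=3, kernel=3)
print(weights, has_skip(24, 24, stride=1), has_skip(24, 40, stride=2))
```

The two 1x1 convolutions account for most of the weights; the depthwise step is comparatively cheap even with a larger kernel.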
We first define the initial Conv3x3 block. The Python code is as follows:
# Assumed imports (the original listing omits them); with standalone Keras,
# use `from keras import ...` instead.
from tensorflow.keras import datasets, layers, models, utils


def _conv_block(inputs, strides, filters, kernel=3):
    """
    Adds an initial convolution layer (with batch normalization and relu6).
    """
    x = layers.Conv2D(filters, kernel, padding='same', use_bias=False, strides=strides, name='Conv1')(inputs)
    x = layers.BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv1_bn')(x)
    print(x.name, inputs.shape, x.shape)
    return layers.ReLU(6., name='Conv1_relu')(x)
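The `layers.ReLU(6.)` call above passes 6 as the layer's maximum value, which is what the article calls ReLU6. A plain-Python sketch of the same element-wise function, with the hypothetical name `relu6`:

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


values = [-2.0, 0.5, 6.0, 9.0]
clipped = [relu6(v) for v in values]
print(clipped)
```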
Next, we can define the SepConv3x3 block as:
def _sep_conv_block(inputs, filters, alpha, pointwise_conv_filters, depth_multiplier=1, strides=(1, 1)):
    """
    Adds a separable convolution block.
    A separable convolution block consists of a depthwise conv,
    and a pointwise convolution.
    """
    # Note: `filters` is accepted but unused; the output width comes from
    # `pointwise_conv_filters`, scaled by the width multiplier `alpha`.
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)
    x = layers.DepthwiseConv2D((3, 3),
                               padding='same',
                               depth_multiplier=depth_multiplier,
                               strides=strides,
                               use_bias=False,
                               name='Dw_conv_sep')(inputs)
    # The stride is applied once, in the depthwise conv above; the 1x1
    # pointwise conv keeps stride 1 so the block does not downsample twice.
    x = layers.Conv2D(pointwise_conv_filters, (1, 1), padding='valid', use_bias=False,
                      strides=(1, 1), name='Conv_sep')(x)
    x = layers.BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_sep_bn')(x)
    print(x.name, inputs.shape, x.shape)
    return layers.ReLU(6., name='Conv_sep_relu')(x)
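The reason for splitting a convolution into a depthwise and a pointwise step is the saving in weights. A small sketch comparing the two, assuming no bias terms and using hypothetical function names; the 32-in, 16-out sizes mirror the first two blocks of the model built later in this article with `alpha=1.0`:

```python
def standard_conv_weights(in_channels, out_channels, kernel):
    """Weights in a regular k x k convolution without bias."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_weights(in_channels, out_channels, kernel, depth_multiplier=1):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * out_channels
    return depthwise + pointwise


standard = standard_conv_weights(32, 16, 3)
separable = separable_conv_weights(32, 16, 3)
print(standard, separable)
```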
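Finally, we define the depthwise convolution block, also known as the inverted residual block (from the MobileNet literature). The Python code is as follows: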
def _inverted_res_block(inputs, kernel, expansion, alpha, filters, block_id, stride=1):
    # Read the channel count from the static shape; this replaces the
    # private `_keras_shape` attribute of older standalone Keras.
    in_channels = int(inputs.shape[-1])
    pointwise_conv_filters = int(filters * alpha)
    # _make_divisible is not part of the original listing; it is assumed to
    # be the channel-rounding helper used by the Keras MobileNetV2 code.
    pointwise_filters = _make_divisible(pointwise_conv_filters, 8)
    x = inputs
    prefix = 'block_{}_'.format(block_id)
    if block_id:
        # Expand: 1x1 conv widens the input by the expansion factor.
        x = layers.Conv2D(expansion * in_channels,
                          kernel_size=1,
                          padding='same',
                          use_bias=False,
                          activation=None,
                          name=prefix + 'expand')(x)
        x = layers.BatchNormalization(epsilon=1e-3,
                                      momentum=0.999,
                                      name=prefix + 'expand_bn')(x)
        x = layers.ReLU(6., name=prefix + 'expand_relu')(x)
    else:
        prefix = 'expanded_conv_'
    # Depthwise: spatial filtering, one filter per channel.
    x = layers.DepthwiseConv2D(kernel_size=kernel,
                               strides=stride,
                               activation=None,
                               use_bias=False,
                               padding='same',
                               name=prefix + 'depthwise')(x)
    x = layers.BatchNormalization(epsilon=1e-3,
                                  momentum=0.999,
                                  name=prefix + 'depthwise_bn')(x)
    x = layers.ReLU(6., name=prefix + 'depthwise_relu')(x)
    # Project: 1x1 conv back down to the output width, with no activation.
    x = layers.Conv2D(pointwise_filters,
                      kernel_size=1,
                      padding='same',
                      use_bias=False,
                      activation=None,
                      name=prefix + 'project')(x)
    x = layers.BatchNormalization(
        epsilon=1e-3, momentum=0.999, name=prefix + 'project_bn')(x)
    print(x.name, inputs.shape, x.shape)
    # Add the skip connection only when input and output shapes match.
    if in_channels == pointwise_filters and stride == 1:
        print("Adding %s" % x.name)
        return layers.Add(name=prefix + 'add')([inputs, x])
    return x
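Now we can build the architecture according to the paper's specification by setting the kernel size, stride, number of filters, whether a skip connection is used, and so on: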
def MnasNet(input_shape=None, alpha=1.0, depth_multiplier=1, pooling=None, nb_classes=10):
    img_input = layers.Input(shape=input_shape)
    first_block_filters = _make_divisible(32 * alpha, 8)
    x = _conv_block(img_input, strides=2, filters=first_block_filters)
    x = _sep_conv_block(x, filters=16, alpha=alpha,
                        pointwise_conv_filters=16, depth_multiplier=depth_multiplier)
    x = _inverted_res_block(x, kernel=3, expansion=3, stride=2, alpha=alpha, filters=24, block_id=1)
    x = _inverted_res_block(x, kernel=3, expansion=3, stride=1, alpha=alpha, filters=24, block_id=2)
    x = _inverted_res_block(x, kernel=3, expansion=3, stride=1, alpha=alpha, filters=24, block_id=3)
    x = _inverted_res_block(x, kernel=5, expansion=3, stride=2, alpha=alpha, filters=40, block_id=4)
    x = _inverted_res_block(x, kernel=5, expansion=3, stride=1, alpha=alpha, filters=40, block_id=5)
    x = _inverted_res_block(x, kernel=5, expansion=3, stride=1, alpha=alpha, filters=40, block_id=6)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=2, alpha=alpha, filters=80, block_id=7)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=1, alpha=alpha, filters=80, block_id=8)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=1, alpha=alpha, filters=80, block_id=9)
    x = _inverted_res_block(x, kernel=3, expansion=6, stride=1, alpha=alpha, filters=96, block_id=10)
    x = _inverted_res_block(x, kernel=3, expansion=6, stride=1, alpha=alpha, filters=96, block_id=11)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=2, alpha=alpha, filters=192, block_id=12)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=1, alpha=alpha, filters=192, block_id=13)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=1, alpha=alpha, filters=192, block_id=14)
    x = _inverted_res_block(x, kernel=5, expansion=6, stride=1, alpha=alpha, filters=192, block_id=15)
    x = _inverted_res_block(x, kernel=3, expansion=6, stride=1, alpha=alpha, filters=320, block_id=16)
    if pooling == 'avg':
        x = layers.GlobalAveragePooling2D()(x)
    else:
        x = layers.GlobalMaxPooling2D()(x)
    x = layers.Dense(nb_classes, activation='softmax', use_bias=True, name='proba')(x)
    inputs = img_input
    model = models.Model(inputs, x, name='mnasnet')
    return model
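It can help to check how far the model shrinks the input before the global pooling layer. The sketch below replays the strides used in `MnasNet` above, assuming that a strided layer with `'same'` padding produces an output of size ceil(input / stride). The names `STRIDES` and `final_feature_size` are hypothetical.

```python
import math

# Strides in call order: Conv1, SepConv, then inverted residual blocks 1-16.
STRIDES = [2, 1,
           2, 1, 1,
           2, 1, 1,
           2, 1, 1,
           1, 1,
           2, 1, 1, 1,
           1]


def final_feature_size(input_size, strides=STRIDES):
    """Spatial size (one side) of the feature map fed to global pooling."""
    size = input_size
    for stride in strides:
        size = math.ceil(size / stride)
    return size


print(final_feature_size(32), final_feature_size(224))
```

The five stride-2 layers reduce each side by a factor of 32, so a 32x32 input is already down to a single spatial position before pooling.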
Training and Validation
Instead of the full ImageNet dataset, we can try our model on the CIFAR-10 dataset:
input_shape = (32, 32)
batch_size = 2048
nb_classes = 10
epochs = 100
# Build the model for 32x32 RGB images and compile it for classification.
model = MnasNet(input_shape=input_shape+(3,), pooling='avg', nb_classes=nb_classes)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
# One-hot encode the integer labels to match categorical_crossentropy.
y_train = utils.to_categorical(y_train, nb_classes)
y_test = utils.to_categorical(y_test, nb_classes)
# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)
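With a batch size this large, each epoch contains only a handful of weight updates. A quick arithmetic sketch, using the 50,000 training images in CIFAR-10 and the settings above; the helper name `updates_per_epoch` is hypothetical:

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches per epoch, counting the final partial batch."""
    return math.ceil(num_samples / batch_size)


steps = updates_per_epoch(50_000, 2048)
total_updates = steps * 100  # epochs = 100 in the script above
print(steps, total_updates)
```

If training converges slowly, a smaller batch size gives more updates per epoch at the cost of less efficient use of the hardware.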
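Conclusion
In summary, this method offers a straightforward way to design convolutional neural networks (CNNs) without much domain knowledge. Incorporating real-world latency measurements from a Pixel phone also helps optimize the architecture for both latency and accuracy.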