Implementing an L2-constrained softmax loss on a convolutional neural network with TensorFlow

WisdomXLH

2019-06-26

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/d6a...

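When we build a multi-class classifier with a neural network, we usually use the softmax function as the final classification function. Softmax assigns a probability to every class, and we take the class with the highest probability as the model's output; this is how a concrete class label is derived from the model. To train the model, we backpropagate the softmax loss. The output of the softmax layer is a vector whose entries lie between 0 and 1 and sum to 1.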
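To make the description above concrete, here is a minimal pure-Python sketch of turning raw scores into probabilities and picking the most likely class. The scores and the helper name `softmax` are made up for illustration; they are not part of the model built later in this article.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a 3-class problem
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted class is the index of the largest probability
predicted_class = max(range(len(probs)), key=lambda k: probs[k])
print(probs, predicted_class)
```

The largest score always gets the largest probability, so taking the argmax of the probabilities gives the same class as taking the argmax of the raw scores.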

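In this article we will not explain what softmax regression or a CNN is. The focus is on how to implement an L2-constrained softmax in TensorFlow, using the MNIST dataset. For the full theoretical analysis, see the paper: https://arxiv.org/abs/1703.09507

Before the implementation, let's clarify a few concepts.

The softmax loss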

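The softmax loss can be defined as follows:

$$L_S = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{W_{y_i}^T f(x_i) + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^T f(x_i) + b_j}}$$

The parameters are defined as follows: $M$ is the mini-batch size, $x_i$ is the $i$-th input sample in the batch, $f(x_i)$ is the corresponding output of the penultimate layer, $y_i$ is its class label, $C$ is the number of classes, and $W$ and $b$ are the weights and bias of the last layer, which acts as the classifier.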
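As a rough illustration of this loss, the sketch below computes it in plain Python for a tiny batch. It assumes the usual form in which the score for class j is a linear function W_j^T f(x_i) + b_j of the penultimate-layer feature. The function name `softmax_loss` and all numeric values are hypothetical.

```python
import math

def softmax_loss(features, labels, W, b):
    """Mean softmax (cross-entropy) loss over a mini-batch.

    features: list of M feature vectors f(x_i), each of length d
    labels:   list of M integer class labels y_i
    W:        list of C weight vectors W_j, each of length d
    b:        list of C biases b_j
    """
    total = 0.0
    for f, y in zip(features, labels):
        # The logit for class j is W_j^T f(x_i) + b_j
        logits = [sum(w_k * f_k for w_k, f_k in zip(W_j, f)) + b_j
                  for W_j, b_j in zip(W, b)]
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log of the softmax probability of the true class
        total += log_sum_exp - logits[y]
    return total / len(features)

# Hypothetical toy values: M = 2 samples, d = 2 features, C = 2 classes
features = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
loss = softmax_loss(features, labels, W, b)
print(loss)
```

In the TensorFlow code later in this article, the same quantity is computed by `tf.nn.sparse_softmax_cross_entropy_with_logits` followed by `tf.reduce_mean`.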

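L2-constrained softmax loss

The constrained loss is defined almost exactly as before, and our goal is still to minimize it:

$$\text{minimize} \quad -\frac{1}{M} \sum_{i=1}^{M} \log \frac{e^{W_{y_i}^T f(x_i) + b_{y_i}}}{\sum_{j=1}^{C} e^{W_j^T f(x_i) + b_j}}$$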

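However, we need to modify f(x).

Instead of directly computing the product of the last layer's weights with the previous layer's output f(x), we first normalize f(x), then scale the normalized value by a factor α, and finally apply the usual softmax.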

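In other words, the loss is subject to the following constraint, which must hold for every sample in the mini-batch of size $M$:

$$\|f(x_i)\|_2 = \alpha, \quad \forall i = 1, 2, \dots, M$$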
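The normalize-then-scale step can be sketched in a few lines of plain Python. This is only an illustration of the idea for a single feature vector. The helper name `l2_normalize_and_scale`, the `eps` guard, and the numbers are assumptions, not part of the original code.

```python
import math

def l2_normalize_and_scale(feature, alpha, eps=1e-12):
    """Rescale one feature vector so that its L2 norm equals alpha."""
    norm = math.sqrt(sum(v * v for v in feature))
    # eps guards against division by zero for an all-zero vector
    return [alpha * v / max(norm, eps) for v in feature]

feature = [3.0, 4.0]  # hypothetical f(x) with L2 norm 5
scaled = l2_normalize_and_scale(feature, alpha=50.0)
print(scaled)  # [30.0, 40.0]

# The direction is unchanged and the new norm is alpha
new_norm = math.sqrt(sum(v * v for v in scaled))
print(new_norm)
```

After this step every feature vector lies on a sphere of radius alpha, so only its direction is left to distinguish one sample from another.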

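Implementation details

So the architecture looks like the figure below (this is also the architecture I want to implement):

[Figure: network architecture with the L2-Norm and Scale layers]

C denotes a convolutional layer, P a pooling layer, and FC a fully connected layer. The L2-Norm and Scale layers are the ones we focus on implementing.

Implementation in TensorFlow

To implement this model, we build on the convolutional network example from the TensorFlow-Examples repository, which is linked in the header of the full code below.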

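Before applying dropout, we first normalize the output of layer N-1, multiply the normalized result by the parameter alpha, and then compute the softmax. The constraint applies to each sample's feature vector, so the norm is taken per row (axis=1) rather than over the whole batch. The code is:

fc1 = alpha * tf.divide(fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))

In the full code, alpha = 0 acts as a switch: the normalization step is skipped and we get the regular softmax. Any other value applies the L2 constraint.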
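It matters whether the norm is taken per sample or over the whole batch. In TensorFlow, `tf.norm` called without an `axis` argument reduces over all elements of the tensor, so on a `[batch, features]` tensor it returns one scalar for the entire batch. The pure-Python sketch below uses made-up numbers to compare the two choices. It illustrates the arithmetic only and is not TensorFlow code.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))

alpha = 10.0
batch = [[3.0, 4.0], [6.0, 8.0]]  # hypothetical features with norms 5 and 10

# Per sample: each row is divided by its own norm
per_sample = [[alpha * v / l2_norm(row) for v in row] for row in batch]

# Whole batch: every row is divided by one norm over all entries
batch_norm = l2_norm([v for row in batch for v in row])
whole_batch = [[alpha * v / batch_norm for v in row] for row in batch]

per_sample_norms = [l2_norm(row) for row in per_sample]
whole_batch_norms = [l2_norm(row) for row in whole_batch]
print(per_sample_norms)   # every row has norm alpha
print(whole_batch_norms)  # rows keep their relative lengths
```

Only the per-sample version puts every feature vector on the sphere of radius alpha. With a single batch-wide norm, the rows keep their different lengths and the result depends on which samples share a batch.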

The full code is as follows:

# Actual Code : https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/convolutional_network.ipynb
# Modified By: Manash

from __future__ import division, print_function, absolute_import

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=False)

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

# Training Parameters
learning_rate = 0.001
num_steps = 100
batch_size = 20


# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout rate: tf.layers.dropout drops this fraction of units (it is not a keep probability)


# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training, alpha=5):
    
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer with 1024 units
        fc1 = tf.layers.dense(fc1, 1024)
        
        # If alpha is not zero, L2-normalize each sample's features (axis=1),
        # then scale them up by alpha
        if alpha != 0:
            fc1 = alpha * tf.divide(
                fc1, tf.norm(fc1, ord='euclidean', axis=1, keepdims=True))
    
        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
  
        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes)

    return out
  
# Define the model function (following TF Estimator Template)
def model_fn(features, labels, mode):
    # Set alpha
    alph = 50
    
    # Build the neural network
    # Because Dropout behaves differently at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False, is_training=True, alpha=alph)
    
    # At test time we don't need to normalize or scale, it's redundant as per paper : https://arxiv.org/abs/1703.09507
    logits_test = conv_net(features, num_classes, dropout, reuse=True, is_training=False, alpha=0)
    
    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)
    
    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes) 
        
    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op, global_step=tf.train.get_global_step())
    
    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)
    
    # TF Estimators require the model function to return an EstimatorSpec that
    # specifies the different ops for training, evaluating, ...
    estim_specs = tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=pred_classes,
      loss=loss_op,
      train_op=train_op,
      eval_metric_ops={'accuracy': acc_op})

    return estim_specs

  
# Build the Estimator
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=False)
# Train the Model
model.train(input_fn, steps=num_steps)

# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
model.evaluate(input_fn)


# Predict single images
n_images = 4
# Get images from test set
test_images = mnist.test.images[:n_images]
# Prepare the input data
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': test_images}, shuffle=False)
# Use the model to predict the images class
preds = list(model.predict(input_fn))

# Display
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("Model prediction:", preds[i])

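Performance evaluation

Does this really improve performance? Yes, and the effect is quite good: it improves performance by about 1%. I did not train for many iterations, mainly because I do not have a powerful computer. If you have doubts about this result, you can try it yourself.

The model's performance for different values of alpha is shown below:

[Figure: model performance for different alpha values]

The orange line is the regular softmax, and the blue line is the L2-constrained softmax.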
