Semi-Supervised Learning and GANs
Introduction
What is semi-supervised learning?
Most deep learning classifiers require a large number of labeled examples to generalize well, and acquiring such data is an expensive and difficult process. Semi-supervised learning was proposed to deal with this limitation: a family of techniques that exploits a small amount of labeled data together with a large amount of unlabeled data. Many machine learning researchers have found that combining unlabeled data with a small amount of labeled data can significantly improve learning accuracy. GANs have shown great promise in semi-supervised learning, allowing a classifier to achieve good performance with very little labeled data.
Background on GANs
GANs belong to the family of deep generative models. They are particularly interesting because they do not explicitly represent a probability distribution over the space in which the data lives. Instead, they provide a way to interact with that probability distribution more indirectly, by drawing samples from it.
The basic idea of a GAN:
A generator G: takes random noise z as input and outputs an image x. Its parameters are tuned so that the fake images it produces receive high scores from the discriminator.
A discriminator D: takes an image x as input and outputs a score reflecting its confidence that the image is real. Its parameters are tuned to output a high score when fed a real image, and a low score when fed a fake image from the generator.
Intuition
The vanilla architecture of the discriminator has only one output neuron, which scores the R/F (real/fake) probability. We train both networks simultaneously and discard the discriminator after training, since it was only used to improve the generator.
For the semi-supervised task, in addition to the R/F neuron, the discriminator now has 10 more neurons for MNIST digit classification. Moreover, the roles are reversed this time: we can discard the generator after training, since its only purpose is to produce unlabeled data that improves the discriminator's performance.
The discriminator thus becomes an 11-class classifier, with one neuron (the R/F neuron) representing the fake-data output and the other 10 representing real data with its class. The following must be kept in mind:
Assert the R/F neuron output label = 0, when real unsupervised data from the dataset is fed in
Assert the R/F neuron output label = 1, when fake unsupervised data from the generator is fed in
Assert the R/F output label = 0 and the corresponding class-label output = 1, when real supervised data is fed in
This combination of different data sources helps the discriminator classify more accurately than if it were only given the small portion of labeled data.
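The 11-class target described above can be sketched in numpy: index 0 is the R/F neuron target and indices 1-10 hold the one-hot digit class. The helper name `make_extended_label` is illustrative, not from the post; only the layout (a zero prepended to the one-hot label) matches the construction used later in the loss section.

```python
import numpy as np

# Hypothetical sketch of the "extended label": prepend the R/F target (0 = real)
# to the 10-class one-hot label, giving an 11-dim target per example.
def make_extended_label(one_hot_labels):
    batch_size = one_hot_labels.shape[0]
    rf_column = np.zeros((batch_size, 1))   # R/F neuron target: 0 for real data
    return np.concatenate([rf_column, one_hot_labels], axis=1)

labels = np.eye(10)[[3, 7]]                 # two samples: digits 3 and 7
extended = make_extended_label(labels)
print(extended.shape)                       # (2, 11)
print(extended[0].argmax())                 # 4 -> digit 3, shifted by the R/F slot
```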
Architecture
The Discriminator
The architecture follows the one proposed in the DCGAN paper. Instead of any pooling layers, we use strided convolutions to reduce the dimensionality of the feature vectors, and apply leaky_relu, dropout, and batch normalization (BN) to all layers to stabilize learning. BN is dropped for the input layer and the last layer (for feature matching). Finally, we perform global average pooling to take the average over the spatial dimensions of the feature vectors, which squashes the tensor dimensions to a single value per feature map. After flattening the features, a dense layer with 11 classes and softmax activation is added for the multi-class output.
def discriminator(x, dropout_rate=0., is_training=True, reuse=False):
    # input x -> n+1 classes
    with tf.variable_scope('Discriminator', reuse=reuse):
        # x = ?*64*64*1
        # Layer 1
        conv1 = tf.layers.conv2d(x, 128, kernel_size=[4, 4], strides=[2, 2],
                                 padding='same', activation=tf.nn.leaky_relu,
                                 name='conv1')  # ?*32*32*128
        # No batch-norm for input layer
        dropout1 = tf.nn.dropout(conv1, dropout_rate)
        # Layer 2
        conv2 = tf.layers.conv2d(dropout1, 256, kernel_size=[4, 4], strides=[2, 2],
                                 padding='same', activation=tf.nn.leaky_relu,
                                 name='conv2')  # ?*16*16*256
        batch2 = tf.layers.batch_normalization(conv2, training=is_training)
        dropout2 = tf.nn.dropout(batch2, dropout_rate)
        # Layer 3
        conv3 = tf.layers.conv2d(dropout2, 512, kernel_size=[4, 4], strides=[4, 4],
                                 padding='same', activation=tf.nn.leaky_relu,
                                 name='conv3')  # ?*4*4*512
        batch3 = tf.layers.batch_normalization(conv3, training=is_training)
        dropout3 = tf.nn.dropout(batch3, dropout_rate)
        # Layer 4
        conv4 = tf.layers.conv2d(dropout3, 1024, kernel_size=[3, 3], strides=[1, 1],
                                 padding='valid', activation=tf.nn.leaky_relu,
                                 name='conv4')  # ?*2*2*1024
        # No batch-norm as this layer's output will be used in the feature matching loss
        # No dropout as feature matching needs to be definite on logits
        # Layer 5
        # Global average pooling
        flatten = tf.reduce_mean(conv4, axis=[1, 2])
        logits_D = tf.layers.dense(flatten, (1 + num_classes))
        out_D = tf.nn.softmax(logits_D)
    return flatten, logits_D, out_D
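The global average pooling step (`tf.reduce_mean(conv4, axis=[1, 2])`) can be checked with a plain numpy sketch: each 2x2x1024 feature volume collapses to a single 1024-dim vector per example. The random input here is just a stand-in for the conv4 activations.

```python
import numpy as np

# Numpy equivalent of the discriminator's global average pooling:
# average over the two spatial axes, keeping batch and channel axes.
conv4 = np.random.rand(8, 2, 2, 1024)   # batch of 8, matching the ?*2*2*1024 comment
flatten = conv4.mean(axis=(1, 2))
print(flatten.shape)                    # (8, 1024)
```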
The Generator
The generator architecture is designed to mirror the discriminator's spatial transformations. Fractionally-strided (transposed) convolutions are used to increase the spatial dimensions of the representation. A four-dimensional noise tensor is fed in and passed through a series of transposed convolution, relu, BN (except at the output layer), and dropout operations. Finally, a tanh activation maps the output image into the range (-1, 1).
def generator(z, dropout_rate=0., is_training=True, reuse=False):
    # input latent z -> image x
    with tf.variable_scope('Generator', reuse=reuse):
        # Layer 1
        deconv1 = tf.layers.conv2d_transpose(z, 512, kernel_size=[4, 4],
                                             strides=[1, 1], padding='valid',
                                             activation=tf.nn.relu,
                                             name='deconv1')  # ?*4*4*512
        batch1 = tf.layers.batch_normalization(deconv1, training=is_training)
        dropout1 = tf.nn.dropout(batch1, dropout_rate)
        # Layer 2
        deconv2 = tf.layers.conv2d_transpose(dropout1, 256, kernel_size=[4, 4],
                                             strides=[4, 4], padding='same',
                                             activation=tf.nn.relu,
                                             name='deconv2')  # ?*16*16*256
        batch2 = tf.layers.batch_normalization(deconv2, training=is_training)
        dropout2 = tf.nn.dropout(batch2, dropout_rate)
        # Layer 3
        deconv3 = tf.layers.conv2d_transpose(dropout2, 128, kernel_size=[4, 4],
                                             strides=[2, 2], padding='same',
                                             activation=tf.nn.relu,
                                             name='deconv3')  # ?*32*32*128
        batch3 = tf.layers.batch_normalization(deconv3, training=is_training)
        dropout3 = tf.nn.dropout(batch3, dropout_rate)
        # Output layer
        deconv4 = tf.layers.conv2d_transpose(dropout3, 1, kernel_size=[4, 4],
                                             strides=[2, 2], padding='same',
                                             activation=None,
                                             name='deconv4')  # ?*64*64*1
        out = tf.nn.tanh(deconv4)
    return out
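The spatial sizes in the generator's shape comments can be verified with TensorFlow's transposed-convolution sizing rules (an assumption stated here, not in the post): with 'same' padding the output size is in_size * stride, and with 'valid' padding it is (in_size - 1) * stride + kernel.

```python
# Sanity check of the generator's commented shapes: 1 -> 4 -> 16 -> 32 -> 64,
# using the standard conv2d_transpose output-size formulas.
def deconv_out_size(in_size, kernel, stride, padding):
    if padding == 'same':
        return in_size * stride
    return (in_size - 1) * stride + kernel    # 'valid'

sizes = []
size = 1                                      # latent z enters as ?*1*1*latent
for kernel, stride, padding in [(4, 1, 'valid'), (4, 4, 'same'),
                                (4, 2, 'same'), (4, 2, 'same')]:
    size = deconv_out_size(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)                                  # [4, 16, 32, 64]
```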
Model Losses
We first prepare the extended labels for the whole batch by prepending a zero (the R/F neuron target) to the actual one-hot labels. This sets the R/F neuron output to 0 when labeled data is fed in. The discriminator loss for unlabeled data can be viewed as a binary sigmoid loss, asserting the R/F neuron output as 1 for fake images and 0 for real images.
### Discriminator loss ###
# Supervised loss -> which class the real data belongs to
temp = tf.nn.softmax_cross_entropy_with_logits_v2(logits=D_real_logit,
                                                  labels=extended_label)
# labeled_mask and temp have the same size (batch_size), where temp is the
# softmax cross-entropy calculated over the whole batch
D_L_Supervised = tf.reduce_sum(tf.multiply(temp, labeled_mask)) / tf.reduce_sum(labeled_mask)
# Multiplying temp by labeled_mask keeps the supervised loss on labeled
# data only; dividing by the number of labeled samples gives the mean
# Unsupervised loss -> R/F
D_L_RealUnsupervised = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_real_logit[:, 0],
    labels=tf.zeros_like(D_real_logit[:, 0], dtype=tf.float32)))
D_L_FakeUnsupervised = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_fake_logit[:, 0],
    labels=tf.ones_like(D_fake_logit[:, 0], dtype=tf.float32)))
D_L = D_L_Supervised + D_L_RealUnsupervised + D_L_FakeUnsupervised
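The masking trick in D_L_Supervised is easy to verify in numpy: per-example cross-entropies are zeroed for unlabeled rows, and the sum is divided by the number of labeled samples rather than the batch size. The loss values below are made up for illustration.

```python
import numpy as np

# Numpy sketch of the masked supervised loss: only rows where labeled_mask
# is 1 contribute, and the normalizer is the count of labeled samples.
temp = np.array([2.0, 0.5, 1.0, 3.0])          # hypothetical per-example losses
labeled_mask = np.array([1.0, 0.0, 1.0, 0.0])  # only rows 0 and 2 are labeled
D_L_Supervised = (temp * labeled_mask).sum() / labeled_mask.sum()
print(D_L_Supervised)                          # 1.5 -> mean of 2.0 and 1.0
```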
The generator loss is a combination of a fake-image loss, which asserts the R/F neuron output as 0 (i.e., real) for the generated images, and a feature matching loss, which penalizes the mean squared error between the average of a set of features on the training data and the average of that set of features on the generated samples.
### Generator loss ###
# G_L_1 -> Fake data wanna be real
G_L_1 = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=D_fake_logit[:, 0],
    labels=tf.zeros_like(D_fake_logit[:, 0], dtype=tf.float32)))
# G_L_2 -> Feature matching
data_moments = tf.reduce_mean(D_real_features, axis=0)
sample_moments = tf.reduce_mean(D_fake_features, axis=0)
G_L_2 = tf.reduce_mean(tf.square(data_moments - sample_moments))
G_L = G_L_1 + G_L_2
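The feature matching term G_L_2 can also be sketched in numpy. The feature arrays below are random stand-ins for D_real_features / D_fake_features; the point is that the generator is penalized by the squared gap between the per-feature means of real and generated batches, so the loss is zero only when those means coincide.

```python
import numpy as np

# Numpy sketch of feature matching: compare first moments (means) of
# discriminator features on real vs. generated batches.
rng = np.random.default_rng(0)
D_real_features = rng.normal(size=(64, 1024))  # stand-in for real-data features
D_fake_features = rng.normal(size=(64, 1024))  # stand-in for generated features
data_moments = D_real_features.mean(axis=0)
sample_moments = D_fake_features.mean(axis=0)
G_L_2 = np.mean((data_moments - sample_moments) ** 2)
print(float(G_L_2) >= 0.0)                     # True; 0 only if the means match
```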
Training
The training images are resized from [batch_size, 28, 28, 1] to [batch_size, 64, 64, 1] to fit the generator/discriminator architecture. The losses, accuracy, and generated samples are computed, and improvements are observed after each epoch.
for epoch in range(epochs):
    train_accuracies, train_D_losses, train_G_losses = [], [], []
    for it in range(no_of_batches):
        batch = mnist_data.train.next_batch(batch_size, shuffle=False)
        # batch[0] has shape: batch_size*28*28*1
        batch_reshaped = tf.image.resize_images(batch[0], [64, 64]).eval()
        # Resizing the whole batch to batch_size*64*64*1 for the disc/gen architecture
        batch_z = np.random.normal(0, 1, (batch_size, 1, 1, latent))
        mask = get_labeled_mask(labeled_rate, batch_size)
        train_feed_dict = {x: scale(batch_reshaped), z: batch_z,
                           label: batch[1], labeled_mask: mask,
                           dropout_rate: 0.7, is_training: True}
        # The labels provided in the dict are one-hot encoded over 10 classes
        D_optimizer.run(feed_dict=train_feed_dict)
        G_optimizer.run(feed_dict=train_feed_dict)
        train_D_loss = D_L.eval(feed_dict=train_feed_dict)
        train_G_loss = G_L.eval(feed_dict=train_feed_dict)
        train_accuracy = accuracy.eval(feed_dict=train_feed_dict)
        train_D_losses.append(train_D_loss)
        train_G_losses.append(train_G_loss)
        train_accuracies.append(train_accuracy)
    tr_GL = np.mean(train_G_losses)
    tr_DL = np.mean(train_D_losses)
    tr_acc = np.mean(train_accuracies)
    print('After epoch: ' + str(epoch + 1) + ' Generator loss: ' + str(tr_GL)
          + ' Discriminator loss: ' + str(tr_DL) + ' Accuracy: ' + str(tr_acc))
    gen_samples = fake_data.eval(feed_dict={z: np.random.normal(0, 1, (25, 1, 1, latent)),
                                            dropout_rate: 0.7, is_training: False})
    # Don't update batch-norm statistics while plotting => is_training = False
    test_images = tf.image.resize_images(gen_samples, [64, 64]).eval()
    show_result(test_images, (epoch + 1), show=True, save=False, path='')
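The loop above relies on two helpers the post does not define, `get_labeled_mask` and `scale`. A minimal sketch of what they plausibly do follows; their exact behavior is an assumption, not taken from the post.

```python
import numpy as np

# Assumed helper: mark a labeled_rate fraction of the batch as labeled (1)
# and the rest as unlabeled (0), matching the mask used in D_L_Supervised.
def get_labeled_mask(labeled_rate, batch_size):
    mask = np.zeros(batch_size)
    mask[:int(batch_size * labeled_rate)] = 1
    return mask

# Assumed helper: map pixel values from [0, 1] to [-1, 1] so real images
# share the range of the generator's tanh output.
def scale(images):
    return 2.0 * images - 1.0

mask = get_labeled_mask(0.2, 10)
print(mask.sum())                      # 2.0 -> 20% of the batch counts as labeled
print(scale(np.array([0.0, 1.0])))     # [-1.  1.]
```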
Conclusion
Due to limited GPU access, training was run for only 5 epochs with a labeled rate of 20%. For better results, more training epochs with a lower labeled rate are advised.
Unsupervised learning is considered a missing piece on the road to AGI. To bridge this gap, GANs are considered a potential solution for learning complex tasks with little labeled data. With new methods emerging in semi-supervised and unsupervised learning, we can expect this gap to narrow.