Style Transfer with Deep Neural Networks (a Python Implementation)
In the paper Image Style Transfer Using Convolutional Neural Networks (https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf), style transfer uses the features found in the 19-layer VGG network, which is comprised of a series of convolutional and pooling layers and a few fully-connected layers. In the image below, the convolutional layers are named by stack and their order within the stack: conv1_1 is the first convolutional layer that an image passes through, in the first stack; conv2_1 is the first convolutional layer in the second stack; and the deepest convolutional layer in the network is conv5_4.
The stacks of convolutional layers in VGG19
Separating Style and Content
Style transfer relies on separating the content and style of an image. Given one content image and one style image, we aim to create a new target image which should contain our desired content and style components:
- objects and their arrangement are similar to that of the content image
- style, colors, and textures are similar to that of the style image
As an example, the content image here is a cat, and the style image is Hokusai's Great Wave. The generated target image still contains the cat, but is stylized with the waves and the blue and beige colors of the wave image.
In this post, we'll use a pretrained VGG19 network to extract content or style features from a passed-in image. We'll then formalize the idea of content and style losses, and use those to iteratively update our target image until we get the result we want.
# import resources
%matplotlib inline

from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.optim as optim
from torchvision import transforms, models
Load in VGG19 (features)
VGG19 is split into two portions:
- vgg19.features, which are all the convolutional and pooling layers
- vgg19.classifier, which are the three linear, classifier layers at the end
We only need the features portion, which we're going to load in and "freeze" the weights of.
# get the "features" portion of VGG19 (we will not need the "classifier" portion)
vgg = models.vgg19(pretrained=True).features

# freeze all VGG parameters since we're only optimizing the target image
for param in vgg.parameters():
    param.requires_grad_(False)

# move the model to GPU, if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg.to(device)
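As a quick sanity check (a minimal sketch, not part of the original notebook), you can confirm that every parameter was actually frozen and see which device the model landed on:

# verify all VGG parameters are frozen; both lines are just for inspection
print(all(not p.requires_grad for p in vgg.parameters()))  # expect: True
print(next(vgg.parameters()).device)                       # cuda:0 if a GPU was found, else cpu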
Load in Content and Style Images
You can load in any images you want! Below, we provide a helper function for loading in an image of any type and size. The load_image function also converts an image to a normalized Tensor.
def load_image(img_path, max_size=400, shape=None):
    ''' Load in and transform an image, making sure the image
        is <= 400 pixels in the x-y dims.'''
    image = Image.open(img_path).convert('RGB')

    # large images will slow down processing
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    if shape is not None:
        size = shape

    in_transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])

    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3, :, :].unsqueeze(0)

    return image
Next, I'm loading in images by file name and forcing the style image to be the same size as the content image. The Python code follows:
# load in content and style image
content = load_image('images/octopus.jpg').to(device)
# Resize style to match content, makes code easier
style = load_image('images/hockney.jpg', shape=content.shape[-2:]).to(device)

# helper function for un-normalizing an image
# and converting it from a Tensor image to a NumPy image for display
def im_convert(tensor):
    """ Display a tensor as an image. """
    image = tensor.to("cpu").clone().detach()
    image = image.numpy().squeeze()
    image = image.transpose(1, 2, 0)
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    image = image.clip(0, 1)
    return image

# display the images
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
# content and style ims side-by-side
ax1.imshow(im_convert(content))
ax2.imshow(im_convert(style))
VGG19 Layers
To get the content and style representations of an image, we have to pass the image forward through the VGG19 network until we get to the desired layer(s), and then get the output from that layer.
print(vgg)
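With a pretrained torchvision VGG19, the printout should look roughly like the following (truncated here; exact formatting can vary by torchvision version). The integer keys are the module indices we'll map to the paper's layer names below:

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  ...
  (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)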
Content and Style Features
TODO: complete the mapping of layer names to the names found in the paper for the content representation and the style representation.
def get_features(image, model, layers=None):
    """ Run an image forward through a model and get the features for
        a set of layers. Default layers are for VGGNet matching Gatys et al (2016)
    """
    ## TODO: Complete mapping layer names of PyTorch's VGGNet to names from the paper
    ## Need the layers for the content and style representations of an image
    if layers is None:
        layers = {'0': 'conv1_1',
                  '5': 'conv2_1',
                  '10': 'conv3_1',
                  '19': 'conv4_1',
                  '21': 'conv4_2',  ## content representation
                  '28': 'conv5_1'}

    ## -- do not need to change the code below this line -- ##
    features = {}
    x = image
    # model._modules is a dictionary holding each module in the model
    for name, layer in model._modules.items():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x

    return features
Gram Matrix
The output of every convolutional layer is a Tensor with dimensions associated with the batch_size, a depth d, and some height and width (h, w). The Gram matrix of a convolutional layer can be calculated as follows:
- Get the depth, height, and width of a tensor using batch_size, d, h, w = tensor.size()
- Reshape that tensor so that the spatial dimensions are flattened
- Calculate the Gram matrix by multiplying the reshaped tensor by its transpose
Note: you can multiply two matrices using torch.mm(matrix1, matrix2).
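Before filling in the function, here is a minimal sketch of those three steps on a made-up feature map, just to make the shapes concrete:

import torch

x = torch.randn(1, 2, 4, 4)        # fake conv output: batch_size=1, d=2, h=w=4
batch_size, d, h, w = x.size()     # step 1: read off the dimensions
flat = x.view(d, h * w)            # step 2: flatten the spatial dims -> shape (2, 16)
gram = torch.mm(flat, flat.t())    # step 3: multiply by the transpose -> shape (2, 2)
print(gram.shape)                  # torch.Size([2, 2]); the result is symmetric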
TODO: complete the gram_matrix function.
def gram_matrix(tensor):
    """ Calculate the Gram Matrix of a given tensor
        Gram Matrix: https://en.wikipedia.org/wiki/Gramian_matrix
    """
    ## get the batch_size, depth, height, and width of the Tensor
    batch_size, d, h, w = tensor.size()

    ## reshape it, so we're multiplying the features for each channel
    tensor = tensor.view(d, h * w)

    ## calculate the gram matrix
    gram = torch.mm(tensor, tensor.t())

    return gram
Putting It All Together
Now that we've written functions for extracting features and computing the Gram matrix of a given convolutional layer, let's put all these pieces together! We'll extract our features from our images and calculate the Gram matrices for each layer in our style representation.
# get content and style features only once before forming the target image
content_features = get_features(content, vgg)
style_features = get_features(style, vgg)

# calculate the gram matrices for each layer of our style representation
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# create a third "target" image and prep it for change
# it is a good idea to start off with the target as a copy of our *content* image
# then iteratively change its style
target = content.clone().requires_grad_(True).to(device)
Losses and Weights
Individual Layer Style Weights
Below, you are given the option to weight the style representation at each relevant layer. It's suggested that you use a range between 0 and 1 to weight these layers. By weighting earlier layers (conv1_1 and conv2_1) more, you can expect to get larger style artifacts in your resulting target image. Should you choose to weight later layers, you'll get more emphasis on smaller features. This is because each layer is a different size and together they create a multi-scale style representation!
Content and Style Weight
Just like in the paper, we define an alpha (content_weight) and a beta (style_weight). This ratio will affect how stylized your final image is. It's recommended that you leave content_weight = 1 and set style_weight to achieve the ratio you want.
# weights for each style layer
# weighting earlier layers more will result in *larger* style artifacts
# notice we are excluding `conv4_2`, our content representation
style_weights = {'conv1_1': 1.,
                 'conv2_1': 0.8,
                 'conv3_1': 0.5,
                 'conv4_1': 0.3,
                 'conv5_1': 0.1}

# you may choose to leave these as is
content_weight = 1  # alpha
style_weight = 1e6  # beta
Updating the Target & Calculating Losses
You'll decide on a number of steps for which to update your image; this is similar to the training loops you've seen before, except we are changing our target image and nothing else about VGG19 or any other image. Therefore, the number of steps is really up to you to set! I recommend using at least 2000 steps for good results. But you may want to start out with fewer steps if you are just testing out different weight values or experimenting with different images.
Inside the iteration loop, you'll calculate the content and style losses and update your target image accordingly.
Content Loss
The content loss will be the mean squared difference between the target and content features at layer conv4_2. This can be calculated as follows:
content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)
Style Loss
The style loss is calculated in a similar way, only you have to iterate through a number of layers, specified by name in our dictionary, style_weights.
You'll calculate the Gram matrix for the target image, target_gram, and the style image, style_gram, at each of these layers, and compare those Gram matrices to compute the layer_style_loss.
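For a single layer, that computation looks roughly like the sketch below; it assumes target_features = get_features(target, vgg) has already been run, along with the style_grams and style_weights defined above. The full loop in the next section repeats this for every style layer:

layer = 'conv1_1'
target_feature = target_features[layer]    # the target's representation at this layer
_, d, h, w = target_feature.shape

target_gram = gram_matrix(target_feature)  # Gram matrix of the target image
style_gram = style_grams[layer]            # precomputed Gram matrix of the style image

# weighted mean squared difference between the two Gram matrices,
# normalized by the size of the layer
layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)
layer_style_loss = layer_style_loss / (d * h * w)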
Total Loss
Finally, you'll create the total loss by adding up the style and content losses and weighting them with your specified alpha and beta!
Don't be alarmed if the loss is very large. It takes some time for an image's style to change, and you should focus on the appearance of your target image rather than on any loss value. Still, you should see this loss decrease over the iterations.
# for displaying the target image, intermittently
show_every = 20

# iteration hyperparameters
optimizer = optim.Adam([target], lr=0.003)
steps = 100  # decide how many iterations to update your image (5000)

for ii in range(1, steps+1):

    ## TODO: get the features from your target image
    ## Then calculate the content loss
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)

    # the style loss
    # initialize the style loss to 0
    style_loss = 0
    # iterate through each style layer and add to the style loss
    for layer in style_weights:
        # get the "target" style representation for the layer
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape

        ## TODO: Calculate the target gram matrix
        target_gram = gram_matrix(target_feature)

        ## TODO: get the "style" style representation
        style_gram = style_grams[layer]
        ## TODO: Calculate the style loss for one layer, weighted appropriately
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram)**2)

        # add to the style loss
        style_loss += layer_style_loss / (d * h * w)

    ## TODO: calculate the *total* loss
    total_loss = content_weight * content_loss + style_weight * style_loss

    ## -- do not need to change code, below -- ##
    # update your target image
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # display intermediate images and print the loss
    if ii % show_every == 0:
        print('Total loss: ', total_loss.item())
        plt.imshow(im_convert(target))
        plt.show()
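As an aside, the paper's authors optimized with L-BFGS rather than Adam. If you'd like to try that, here is a rough sketch of the swap using PyTorch's optim.LBFGS, which requires a closure that recomputes the loss; it reuses the variables defined above and is untested against the settings in this post:

optimizer = optim.LBFGS([target])

def closure():
    # recompute the full loss; LBFGS may call this several times per step
    optimizer.zero_grad()
    target_features = get_features(target, vgg)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2'])**2)
    style_loss = 0
    for layer in style_weights:
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape
        layer_style_loss = style_weights[layer] * torch.mean(
            (gram_matrix(target_feature) - style_grams[layer])**2)
        style_loss += layer_style_loss / (d * h * w)
    total_loss = content_weight * content_loss + style_weight * style_loss
    total_loss.backward()
    return total_loss

for ii in range(steps):
    optimizer.step(closure)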
Display the Target Image
# display content and final, target image
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(im_convert(content))
ax2.imshow(im_convert(target))
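If you'd like to keep the final result, one simple option (a sketch; the filename is just an example) is matplotlib's imsave, which accepts the [0, 1] float array that im_convert returns:

# save the stylized target image to disk
plt.imsave('stylized.png', im_convert(target))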