Derivation of the BP Neural Network Algorithm

[TOC]

Prerequisites

Gradient Descent

\[\text{Let the loss function be } F(\vec{w}).\\ \text{To first order, } F(\vec{w}+\Delta{\vec{w}})-F(\vec{w}) \approx \nabla{F(\vec{w})} \cdot \Delta{\vec{w}},\\ \text{where } \nabla{F(\vec{w})} \text{ is the gradient of } F(\vec{w}).\\ \text{Hence the descent is fastest when } \Delta{\vec{w}} = -\eta \nabla F(\vec{w})\ (\eta>0),\\ \text{i.e. } \Delta{w_i} = -\eta \frac{\partial{F}}{\partial{w_i}}\]
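
As a quick illustration of this update rule, a minimal sketch (the quadratic loss, step size, and starting point here are assumed for the example, not part of the derivation):

```python
import numpy as np

def gradient_descent(grad, w, eta=0.1, steps=100):
    """Repeatedly apply the update Δw = -η ∇F(w)."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Hypothetical loss F(w) = ||w||^2, so ∇F(w) = 2w; the minimum is at the origin.
grad = lambda w: 2 * w
w = gradient_descent(grad, np.array([3.0, -4.0]))
```

Each step shrinks the iterate toward the minimizer, so `w` ends up near the origin.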

Activation Function

\[\text{Take the activation function } f(x) = 1/(1+\exp(-x)).\\ \text{Then } f'(x) = \exp(-x)/(1+\exp(-x))^2 = f(x)\,(1-f(x))\]
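
The identity \(f'(x) = f(x)(1-f(x))\) can be verified numerically (a small sketch, assuming NumPy; the test point and step are arbitrary):

```python
import numpy as np

def f(x):
    # logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):
    # derivative expressed through f itself: f'(x) = f(x)(1 - f(x))
    return f(x) * (1.0 - f(x))

# Numerical check against a central finite difference
x, h = 0.5, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
```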

Partial derivatives of multivariate composite functions (the chain rule)

Forward Computation

Notation

  1. The input of node \(i\) is \(net_i\) and its output is \(O_i\)
  2. \(v_{ij}\) and \(w_{jk}\) are the weights from node \(i\) to node \(j\) and from node \(j\) to node \(k\), respectively
  3. The error is \(E\), a multivariate function

Input Layer

The input layer applies no activation function: \(O_i = x_i\)

Hidden Layer

The hidden-layer input is \(net_j = \Sigma_{i=0}^{I-1} v_{ij}*O_i\)
and the output is \(O_j = f(net_j)\)

Output Layer

The input is \(net_k = \Sigma_{j=0}^{J-1}w_{jk}*O_j\)
and the output is \(O_k = f(net_k)\)

Error Function

\(E=\frac{1}{2}\Sigma_{k=0}^{K-1}(d_k-O_k)^2\), where \(d_k\) is the desired output
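
The forward computation and error above can be sketched together in NumPy (a minimal sketch; the layer sizes \(I=2, J=3, K=1\) and the random weights are illustrative assumptions, not from the original):

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, v, w):
    """Forward computation: input -> hidden -> output.
    v has shape (I, J), w has shape (J, K)."""
    O_i = x            # input layer: no activation, O_i = x_i
    O_j = f(O_i @ v)   # net_j = Σ_i v_ij * O_i, then O_j = f(net_j)
    O_k = f(O_j @ w)   # net_k = Σ_j w_jk * O_j, then O_k = f(net_k)
    return O_i, O_j, O_k

def error(d, O_k):
    # E = 1/2 Σ_k (d_k - O_k)^2
    return 0.5 * np.sum((d - O_k) ** 2)

# Illustrative sizes and values (assumed for the sketch)
rng = np.random.default_rng(0)
v = rng.normal(size=(2, 3))
w = rng.normal(size=(3, 1))
x, d = np.array([1.0, 0.5]), np.array([1.0])
O_i, O_j, O_k = forward(x, v, w)
E = error(d, O_k)
```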

Backpropagation

Use gradient descent to adjust the weights of the connections between nodes so that \(E\) attains a (local) minimum

Adjusting the weights between the output layer and the hidden layer

\[\text{View } E \text{ as a function of } \vec{w} \text{ and } \vec{v}.\\ \text{By the prerequisites, } \Delta{w_{jk}} = -\eta\frac{\partial E }{\partial w_{jk} }=-\eta\frac{\partial E}{\partial net_k}\cdot \frac{\partial net_k}{\partial w_{jk}},\\ \text{i.e. } \Delta{w_{jk}} = \eta\left(-\frac{\partial{E}}{\partial{net_k}}\right)O_j.\\ \text{Let } \delta_k = -\frac{\partial{E}}{\partial{net_k}} = -\frac{\partial{E}}{\partial{O_k}} \cdot \frac{dO_k}{dnet_k}.\\ \text{Then } \delta_k = (d_k-O_k)f'(net_k),\quad \Delta{w_{jk}} = \eta\delta_kO_j\]
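
This update can be sketched as follows (the values of \(O_j\), \(O_k\), \(d_k\), and \(\eta\) are hypothetical, and \(f'(net_k)=O_k(1-O_k)\) assumes the sigmoid chosen earlier):

```python
import numpy as np

def output_layer_update(d, O_k, O_j, eta=0.5):
    """δ_k = (d_k - O_k) f'(net_k), with f'(net_k) = O_k(1 - O_k) for the sigmoid;
    then Δw_jk = η δ_k O_j, formed as an outer product over (j, k)."""
    delta_k = (d - O_k) * O_k * (1.0 - O_k)
    delta_w = eta * np.outer(O_j, delta_k)   # shape (J, K)
    return delta_k, delta_w

# Hypothetical values for J=2, K=1 (illustrative only)
O_j = np.array([0.2, 0.8])
O_k = np.array([0.6])
d = np.array([1.0])
delta_k, delta_w = output_layer_update(d, O_k, O_j)
```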

Adjusting the weights between the hidden layer and the input layer

\[\text{Similarly, } \Delta{v_{ij}} = -\eta\frac{\partial E }{\partial v_{ij} }=\eta\left(-\frac{\partial E}{\partial net_j}\right) \frac{\partial net_j}{\partial v_{ij}},\\ \text{i.e. } \Delta{v_{ij}} = \eta\left(-\frac{\partial{E}}{\partial{net_j}}\right)O_i.\\ \text{Let } \delta_j = -\frac{\partial{E}}{\partial{net_j}} = -\frac{\partial{E}}{\partial{O_j}} \cdot \frac{dO_j}{dnet_j}=-\frac{\partial E}{\partial O_j}f'(net_j).\\ \text{Since } O_j \text{ feeds every output node, } -\frac{\partial E }{\partial O_j} = -\Sigma_{k=0}^{K-1}\frac{\partial E}{\partial net_k}\frac{\partial net_k}{\partial O_j}=\Sigma_{k=0}^{K-1}\delta_kw_{jk}.\\ \text{Therefore } \Delta v_{ij} = \eta O_if'(net_j)\Sigma_{k=0}^{K-1}\delta_kw_{jk}\]
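
A matching sketch for this layer (all numeric values are hypothetical; the sigmoid identity \(f'(net_j)=O_j(1-O_j)\) is assumed):

```python
import numpy as np

def hidden_layer_update(delta_k, w, O_j, O_i, eta=0.5):
    """Δv_ij = η O_i f'(net_j) Σ_k δ_k w_jk : the output-layer deltas are
    propagated back through w, then scaled by the local sigmoid derivative."""
    delta_j = (w @ delta_k) * O_j * (1.0 - O_j)  # f'(net_j) = O_j(1 - O_j)
    delta_v = eta * np.outer(O_i, delta_j)       # shape (I, J)
    return delta_j, delta_v

# Hypothetical values for I=2, J=2, K=1 (illustrative only)
O_i = np.array([1.0, 0.5])
O_j = np.array([0.2, 0.8])
w = np.array([[0.3], [-0.1]])
delta_k = np.array([0.096])
delta_j, delta_v = hidden_layer_update(delta_k, w, O_j, O_i)
```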

Computation Steps

  1. Suppose the forward computation has produced \(O_i, O_j,O_k\), and \(v_{ij},w_{jk},d_k\) are known

  2. Compute \(f'(net_k), f'(net_j)\). For the activation function chosen here,
    \[ f'(net_k)=O_k(1-O_k)\\ f'(net_j)=O_j(1-O_j) \]
    A different activation function could be used, which is why this step is listed separately

  3. Compute \(\delta_k=(d_k-O_k)f'(net_k)\)

  4. Compute \(\Delta w_{jk}=\eta\delta_kO_j\)

  5. Compute \(\Delta v_{ij}=\eta O_i f'(net_j)\Sigma_{k=0}^{K-1}\delta_k w_{jk}\)

  6. \(v_{ij}+=\Delta v_{ij}, w_{jk} += \Delta w_{jk}\)
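
The six steps above can be combined into one training iteration (a sketch assuming the sigmoid activation and illustrative layer sizes \(I=2, J=3, K=1\); the sample, target, and learning rate are made up for the example):

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d, v, w, eta=0.5):
    """One BP iteration following steps 1-6."""
    # Step 1: forward computation
    O_i = x
    O_j = f(O_i @ v)
    O_k = f(O_j @ w)
    # Steps 2-3: f'(net) via the sigmoid identity, then δ_k
    delta_k = (d - O_k) * O_k * (1.0 - O_k)
    # Step 4: Δw_jk = η δ_k O_j
    dw = eta * np.outer(O_j, delta_k)
    # Step 5: Δv_ij = η O_i f'(net_j) Σ_k δ_k w_jk
    delta_j = (w @ delta_k) * O_j * (1.0 - O_j)
    dv = eta * np.outer(O_i, delta_j)
    # Step 6: apply the updates
    return v + dv, w + dw

def error(x, d, v, w):
    O_k = f(f(x @ v) @ w)
    return 0.5 * np.sum((d - O_k) ** 2)

# Illustrative data (assumed): I=2, J=3, K=1
rng = np.random.default_rng(1)
v = rng.normal(size=(2, 3))
w = rng.normal(size=(3, 1))
x, d = np.array([1.0, 0.5]), np.array([1.0])
E_before = error(x, d, v, w)
v, w = train_step(x, d, v, w)
E_after = error(x, d, v, w)
```

A single gradient step with a moderate \(\eta\) should reduce the error on this sample, which is the whole point of the update rule.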
