KL Divergence Formula
$$
\begin{aligned}
D(p \| q) &= \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \\
&= E\left[\log \frac{p(X)}{q(X)}\right]
\end{aligned}
$$
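The sum above is easy to compute directly for discrete distributions. A minimal NumPy sketch (the function name and the example distributions are illustrative, not from the original post):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) * log(p(x)/q(x)) for discrete pmfs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with p(x) = 0 contribute nothing (0 * log 0 := 0 by convention).
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q), kl_divergence(q, p))
```

Note that the two printed values differ: KL divergence is not symmetric, which is one motivation for the Jensen-Shannon divergence below.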
Mutual Information Formula
$$
\begin{aligned}
I(X, Y) &= H(X) - H(X \mid Y) \\
&= \sum_{x, y} p(x, y) \log \frac{p(y \mid x)}{p(y)}
\end{aligned}
$$
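Both forms can be checked numerically from a joint distribution table. A sketch under assumed example values (the 2×2 joint `pxy` is made up for illustration):

```python
import numpy as np

# Joint distribution p(x, y): rows index x, columns index y (illustrative values)
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
px = pxy.sum(axis=1)  # marginal p(x)
py = pxy.sum(axis=0)  # marginal p(y)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# I(X, Y) = H(X) - H(X|Y), with H(X|Y) = -sum p(x,y) log p(x|y), p(x|y) = p(x,y)/p(y)
h_x_given_y = -np.sum(pxy * np.log(pxy / py))
mi_entropy_form = entropy(px) - h_x_given_y

# I(X, Y) = sum_{x,y} p(x,y) log( p(y|x) / p(y) ), with p(y|x) = p(x,y)/p(x)
mi_sum_form = np.sum(pxy * np.log((pxy / px[:, None]) / py))

print(mi_entropy_form, mi_sum_form)
```

The two printed values agree, confirming that the entropy form and the sum form are the same quantity.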
KL Divergence and Mutual Information
$$
\begin{aligned}
I(X, Y) &= H(X) - H(X \mid Y) \\
&= H(Y) - H(Y \mid X) \\
&= D(p(x, y) \| p(x)\,p(y)) \\
&= E\left[\log \frac{p(x, y)}{p(x)\,p(y)}\right]
\end{aligned}
$$
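In other words, mutual information is the KL divergence between the joint distribution and the product of the marginals. A short numerical check, using an assumed example joint distribution:

```python
import numpy as np

# Joint distribution p(x, y): rows index x, columns index y (illustrative values)
pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])
px = pxy.sum(axis=1)  # marginal p(x)
py = pxy.sum(axis=0)  # marginal p(y)

# I(X, Y) as D(p(x,y) || p(x) p(y)): KL between joint and product of marginals
mi_kl = np.sum(pxy * np.log(pxy / np.outer(px, py)))

# I(X, Y) as H(X) - H(X | Y)
def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

h_x_given_y = -np.sum(pxy * np.log(pxy / py))
mi_ent = entropy(px) - h_x_given_y

print(mi_kl, mi_ent)
```

Both computations yield the same value; when X and Y are independent the joint equals the product of marginals and the divergence (hence the mutual information) is zero.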
Jensen-Shannon Divergence Formula
$$
\begin{aligned}
\operatorname{JSD}(P \| Q) &= \frac{1}{2} D(P \| M) + \frac{1}{2} D(Q \| M), \\
\text{where } M &= \frac{1}{2}(P + Q)
\end{aligned}
$$
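Unlike KL divergence, JSD is symmetric and bounded by log 2 (with natural logarithms). A minimal sketch with illustrative distributions:

```python
import numpy as np

def kl(p, q):
    """D(p||q) for discrete pmfs; terms with p(x) = 0 contribute nothing."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture M = (P + Q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.4, 0.5])
print(jsd(p, q), jsd(q, p))  # symmetric: the two values are equal
```

SciPy also ships `scipy.spatial.distance.jensenshannon`, which returns the square root of the JSD (a proper metric) rather than the divergence itself.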
TensorFlow KL Loss Function
```python
import numpy as np
import tensorflow as tf

y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float64)
y_pred = np.random.random(size=(2, 3))

# Per-sample KL divergence: the sum is taken over the last axis
loss = tf.keras.losses.kullback_leibler_divergence(y_true, y_pred)
assert loss.shape == (2,)

# Keras clips both inputs to [1e-7, 1] before computing the loss,
# so zeros in y_true do not produce log(0)
y_true = tf.keras.backend.clip(y_true, 1e-7, 1)
y_pred = tf.keras.backend.clip(y_pred, 1e-7, 1)
assert np.array_equal(loss.numpy(), np.sum(y_true * np.log(y_true / y_pred), axis=-1))
print(y_true * np.log(y_true / y_pred))
```