### Non-negativity of the KL Divergence

It is straightforward to show that the KL divergence is never negative using Jensen’s inequality and the concavity of the $\log$ function.

Jensen’s inequality states that $\mathbb{E}[f(x)] \geq f(\mathbb{E}[x])$ whenever $f$ is convex.

Setting $f(x)=-\log(x)$ gives

\begin{aligned} D_{KL}(p\Vert q) & = \mathbb{E}_p\left[-\log \frac{q(x)}{p(x)}\right] \\ & \geq -\log \mathbb{E}_p\left[\frac{q(x)}{p(x)}\right] \\ & = -\log \int p(x) \frac{q(x)}{p(x)} dx \\ & = -\log \int q(x) dx = -\log 1 = 0 \end{aligned}
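As a quick numerical sanity check of this bound, the sketch below evaluates the discrete analogue $D_{KL}(p\Vert q) = \sum_i p_i \log(p_i/q_i)$ on randomly drawn probability vectors. The helper names `kl_divergence` and `random_distribution`, the vector length, and the sample count are illustrative assumptions, not part of the argument above.

```python
import math
import random


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_i p_i * log(p_i / q_i).

    Terms with p_i == 0 contribute 0 by the usual convention; q_i must be
    positive wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def random_distribution(n, rng):
    """Draw a strictly positive probability vector of length n (hypothetical helper)."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the run is reproducible

# Smallest divergence seen over many random pairs (p, q).
min_kl = min(
    kl_divergence(random_distribution(5, rng), random_distribution(5, rng))
    for _ in range(1000)
)

p = random_distribution(5, rng)
self_kl = kl_divergence(p, p)  # every ratio p_i / p_i is exactly 1

print("bound held on every sampled pair:", min_kl >= 0.0)
print("D_KL(p || p) =", self_kl)
```

A random check like this cannot prove the inequality, but it is a cheap way to catch a sign error or a swapped argument in an implementation.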

Furthermore, the KL divergence is just one member of a more general family, the Csiszár $f$-divergences. These have the form

$D_{f}(p\Vert q) = \int q(x) f\left(\frac{p(x)}{q(x)}\right) dx$

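for some convex function $f$. The same argument applies here: Jensen’s inequality gives $D_{f}(p\Vert q) = \mathbb{E}_q\left[f\left(\frac{p(x)}{q(x)}\right)\right] \geq f\left(\mathbb{E}_q\left[\frac{p(x)}{q(x)}\right]\right) = f(1)$, so the lower bound is now $f(1)$. This translates into non-negativity only when $f(1) \geq 0$, which the usual normalisation $f(1) = 0$ guarantees. With this form, $f(t) = t\log t$ recovers $D_{KL}(p\Vert q)$ and $f(t) = -\log t$ recovers $D_{KL}(q\Vert p)$.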
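To make the role of $f(1)$ concrete, here is a minimal sketch that evaluates the discrete form $\sum_i q_i f(p_i/q_i)$ for a few convex generators. The function name `f_divergence`, the dictionary keys, and the example vectors `p` and `q` are assumptions chosen for illustration; the last generator is deliberately shifted so that $f(1) < 0$.

```python
import math


def f_divergence(p, q, f):
    """Discrete Csiszar f-divergence D_f(p || q) = sum_i q_i * f(p_i / q_i).

    Assumes q_i > 0 for every i, and p_i > 0 wherever f requires it.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


# Convex generators; the Jensen lower bound for each one is f(1).
generators = {
    "kl": lambda t: t * math.log(t),         # f(1) = 0, gives KL(p || q)
    "reverse_kl": lambda t: -math.log(t),    # f(1) = 0, gives KL(q || p)
    "chi_squared": lambda t: (t - 1.0) ** 2, # f(1) = 0
    "shifted": lambda t: t * t - 2.0,        # f(1) = -1, so D_f may be negative
}

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

results = {}
for name, f in generators.items():
    value = f_divergence(p, q, f)
    bound = f(1.0) + 0.0  # adding 0.0 turns a possible -0.0 into 0.0 for printing
    results[name] = (value, bound)
    print(f"{name}: D_f = {value:.4f}, f(1) = {bound:.4f}, "
          f"bound holds: {value >= bound}")
```

In this sketch the `shifted` generator produces a negative value while still respecting its bound $f(1) = -1$, which is the situation the parenthetical caveat describes.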