It is straightforward to show that the KL divergence is never negative using Jensen's inequality and the concavity of the $\log$ function.
Jensen implies that $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$ when $f$ is convex, with the inequality reversed when $f$ is concave.
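The derivation below spells out the argument for discrete distributions $p$ and $q$. This is an assumption for illustration; the continuous case replaces sums with integrals. The sums run over the support of $p$, and we assume $q(x) > 0$ wherever $p(x) > 0$, since otherwise the divergence is infinite. The first inequality is Jensen applied to the concave $\log$:

$$
-D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{q(x)}{p(x)} \;\le\; \log \sum_{x} p(x) \frac{q(x)}{p(x)} = \log \sum_{x : p(x) > 0} q(x) \;\le\; \log 1 = 0.
$$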
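Furthermore, the KL divergence is just one member of a more general family, the Csiszár $f$-divergences. These have the form

$$
D_f(p \,\|\, q) = \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)
$$

for some convex function $f$. The same argument applies here, but the lower bound is now $f\big(\sum_x q(x) \tfrac{p(x)}{q(x)}\big) = f(1)$. This only translates into non-negativity for particular choices of $f$, namely those with $f(1) \geq 0$. One example is $f(t) = t \log t$, which has $f(1) = 0$ and recovers the KL divergence.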
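The Python sketch below gives a quick numerical sanity check of both bounds. It is not part of the original argument. Distributions are represented as plain lists of probabilities, and we assume $q(x) > 0$ everywhere. All function names (`kl_divergence`, `f_divergence`, `kl_generator`, `shifted_generator`) are hypothetical and were chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p[i] == 0 contribute 0 (the convention 0 log 0 = 0).
    Assumes q[i] > 0 wherever p[i] > 0; otherwise KL is infinite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def f_divergence(p, q, f):
    """Csiszar f-divergence D_f(p || q) = sum_x q(x) f(p(x) / q(x)).

    Assumes q[i] > 0 for every i (full support) to keep the sketch simple.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t, extended by its limit 0 at t = 0; recovers KL(p || q).
    return t * math.log(t) if t > 0 else 0.0


def shifted_generator(t):
    # Hypothetical convex f with f(1) = -1, so the Jensen bound is negative.
    return (t - 1) ** 2 - 1


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.4, 0.4, 0.2]
    # KL is non-negative and matches the f-divergence with f(t) = t log t.
    print(kl_divergence(p, q), f_divergence(p, q, kl_generator))
    # With f(1) = -1 the divergence is bounded below by -1, not by 0;
    # comparing p with itself attains that negative bound.
    print(f_divergence(p, q, shifted_generator), f_divergence(p, p, shifted_generator))
```

`shifted_generator` is included to show why the bound $f(1)$ matters. It is convex, but because $f(1) = -1$, the resulting divergence equals $-1$ when $p = q$, so it is not non-negative.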