Cauchy-Schwarz for outer products as a matrix inequality

If you read the Wikipedia page on the Cramér-Rao bound in statistics, there is an elegant and concise proof given of the scalar version of the bound. However, no proof of the full multivariate case is given there.

Indeed, it seems at first like the same approach will not work, because multivariate Cramér-Rao is a matrix inequality, while the scalar proof relies on the Cauchy-Schwarz inequality, which is a statement about inner products. Since an inner product is just a real-valued number, surely a different approach is required for proofs about matrices?

But after reading this 1980 paper of Bultheel, I think the same short proof goes through if we generalise the definition of “inner product” slightly. In fact, this form of Cauchy-Schwarz holds for the familiar outer product, and the inner product version is just a special case!

Below we’ll confine ourselves to the reals for simplicity, unlike Bultheel, who works more abstractly.

We’ll review the scalar case, then extend it to matrices.

Inner product definition

An inner product \langle\cdot, \cdot\rangle on a vector space V over the reals takes two vectors and returns a real number. The prototypical example is the dot product on \mathbb{R}^N, \langle \mathbf{x}, \mathbf{y}\rangle = \sum_i x_i y_i, but we can allow others if they satisfy these requirements:

  • Symmetry: \langle \mathbf{x},\mathbf{y} \rangle = \langle \mathbf{y},\mathbf{x} \rangle
  • Bilinearity: \langle a\mathbf{x}+b\mathbf{y}, c\mathbf{z} \rangle = ac \langle \mathbf{x}, \mathbf{z} \rangle + bc \langle \mathbf{y}, \mathbf{z} \rangle
  • Positive definiteness: \langle \mathbf{x},\mathbf{x} \rangle \geq 0, with equality if and only if \mathbf{x}=0.
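As a quick sanity check, here is a minimal sketch that tests the three requirements numerically for a weighted dot product on \mathbb{R}^3. The weights, the helper names `inner` and `combo`, and the random test vectors are all assumptions made for illustration; any strictly positive weights would do.

```python
import random

# Hypothetical example: a weighted dot product with strictly positive weights.
WEIGHTS = [2.0, 0.5, 1.0]

def inner(x, y):
    """Weighted dot product: sum_i w_i * x_i * y_i."""
    return sum(w * xi * yi for w, xi, yi in zip(WEIGHTS, x, y))

def combo(a, x, b, y):
    """Return the vector a*x + b*y."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

random.seed(0)
x, y, z = ([random.uniform(-1, 1) for _ in range(3)] for _ in range(3))
a, b, c = 1.5, -0.7, 2.0

# Symmetry: <x, y> = <y, x>
symmetry_gap = abs(inner(x, y) - inner(y, x))

# Bilinearity: <a x + b y, c z> = ac <x, z> + bc <y, z>
cz = [c * zi for zi in z]
bilinear_gap = abs(inner(combo(a, x, b, y), cz)
                   - (a * c * inner(x, z) + b * c * inner(y, z)))

# Positive definiteness: <x, x> > 0 for a nonzero x
print(symmetry_gap, bilinear_gap, inner(x, x) > 0)
```

Both gaps should be zero up to floating-point rounding. A numerical check on random vectors is of course not a proof, only a way to catch a wrong definition early.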

Cauchy-Schwarz from inner products

The Cauchy-Schwarz inequality is the following statement about products of inner products:

\langle \mathbf{x},\mathbf{x} \rangle\langle \mathbf{y},\mathbf{y} \rangle \geq \langle \mathbf{x},\mathbf{y} \rangle^2

We can show this using the definition of the inner product above. Take a vector \mathbf{x} + c\mathbf{y} where c is a real scalar.

Positive definiteness says:

\langle \mathbf{x} + c\mathbf{y}, \mathbf{x} + c\mathbf{y} \rangle \geq 0

We can use bilinearity to expand this:

\langle \mathbf{x}, \mathbf{x} \rangle + c\langle \mathbf{x}, \mathbf{y} \rangle + c\langle \mathbf{y}, \mathbf{x} \rangle + c^2 \langle \mathbf{y}, \mathbf{y} \rangle \geq 0

and symmetry to obtain

\langle \mathbf{x}, \mathbf{x} \rangle + 2c\langle \mathbf{x}, \mathbf{y} \rangle + c^2 \langle \mathbf{y}, \mathbf{y} \rangle \geq 0

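Now assume \mathbf{y} \neq \mathbf{0}, so that \langle \mathbf{y},\mathbf{y} \rangle > 0 (if \mathbf{y} = \mathbf{0}, both sides of the inequality are zero and there is nothing to prove). If we make the choice c = -\frac{\langle \mathbf{x},\mathbf{y}\rangle}{\langle \mathbf{y},\mathbf{y} \rangle} and substitute: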

\langle \mathbf{x}, \mathbf{x} \rangle - 2\frac{\langle \mathbf{x},\mathbf{y}\rangle}{\langle \mathbf{y},\mathbf{y} \rangle}\langle \mathbf{x}, \mathbf{y} \rangle + \frac{\langle \mathbf{x},\mathbf{y}\rangle^2}{\langle \mathbf{y},\mathbf{y} \rangle^2} \langle \mathbf{y}, \mathbf{y} \rangle \geq 0

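The last two terms combine, and we obtain the desired inequality (multiply both sides by \langle \mathbf{y},\mathbf{y} \rangle > 0 to recover the product form above):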

\langle \mathbf{x}, \mathbf{x} \rangle \geq \frac{\langle \mathbf{x},\mathbf{y}\rangle}{\langle \mathbf{y},\mathbf{y} \rangle}\langle \mathbf{x}, \mathbf{y} \rangle
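The argument above can be checked numerically. The following is a rough sketch using the ordinary dot product on \mathbb{R}^5; the names `quadratic`, `c_star`, `gap` and `residual` are my own labels, not anything from Bultheel. It confirms that the chosen c minimises the quadratic and that the minimum value is exactly the Cauchy-Schwarz gap divided by \langle \mathbf{y},\mathbf{y} \rangle.

```python
import random

def inner(x, y):
    """Ordinary dot product."""
    return sum(xi * yi for xi, yi in zip(x, y))

def quadratic(x, y, c):
    """<x + c y, x + c y>, which positive definiteness makes non-negative."""
    v = [xi + c * yi for xi, yi in zip(x, y)]
    return inner(v, v)

random.seed(1)
x = [random.gauss(0, 1) for _ in range(5)]
y = [random.gauss(0, 1) for _ in range(5)]

# The choice of c from the proof; it needs y != 0 so that <y, y> > 0.
c_star = -inner(x, y) / inner(y, y)

# Cauchy-Schwarz gap: <x,x><y,y> - <x,y>^2, which should be >= 0.
gap = inner(x, x) * inner(y, y) - inner(x, y) ** 2

# Value of the quadratic at c_star: <x,x> - <x,y>^2 / <y,y>.
residual = quadratic(x, y, c_star)

print(gap, residual, gap / inner(y, y))
```

Moving c away from `c_star` in either direction can only increase the quadratic, which is why this particular choice gives the tightest bound.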

Matrix-valued inner product axioms

Now we’d like something a little more powerful. We can get this if we are willing to generalise the notion of an inner product to something that returns a matrix instead of a number. I’ll denote this new “inner product” by \langle \cdot, \cdot\rangle_M.

We also need to generalise our axioms slightly for this wider definition. I will follow Bultheel’s definition, simplifying by considering only real-valued matrices. So:

  • Symmetry holds up to a transpose. Because \langle \mathbf{x}, \mathbf{y}\rangle_M is now a matrix, swapping the arguments also transposes the result:

\langle \mathbf{x}, \mathbf{y}\rangle_M = \langle \mathbf{y}, \mathbf{x}\rangle_M^T

  • Bilinearity still applies, not only with scalar coefficients but also with d \times d matrices. We have to be careful about whether we are multiplying on the left or right, because matrix multiplication is not commutative. So we have:

\langle \mathbf{A}\mathbf{x}, \mathbf{B}\mathbf{y} + \mathbf{C}\mathbf{z}\rangle_M = \mathbf{A}\langle \mathbf{x}, \mathbf{y}\rangle_M \mathbf{B}^T + \mathbf{A}\langle \mathbf{x}, \mathbf{z}\rangle_M \mathbf{C}^T

  • Positive definiteness: we will demand that \langle \mathbf{x}, \mathbf{x} \rangle_M is itself positive semidefinite, i.e. \langle \mathbf{x}, \mathbf{x} \rangle_M \succeq 0 as a matrix inequality. We can also insist that \mathrm{Tr}(\langle \mathbf{x}, \mathbf{x} \rangle_M) = 0 implies \mathbf{x} = \mathbf{0}.
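To make these axioms concrete, here is a small sketch under one particular assumption of mine: each “vector” is a d \times n array \mathbf{X} (say, n samples of a d-dimensional quantity), and \langle \mathbf{X}, \mathbf{Y}\rangle_M = \mathbf{X}\mathbf{Y}^T, which is a sum of n outer products. This is only one possible instance of the definition, chosen so that d \times d matrices can act on the left. The helper names are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inner_m(X, Y):
    """Matrix-valued 'inner product' X Y^T: a sum of outer products of columns."""
    return matmul(X, transpose(Y))

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(2)
d, n = 2, 4
X, Y, Z = rand_matrix(d, n), rand_matrix(d, n), rand_matrix(d, n)
A, B, C = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

# Transpose symmetry: <X, Y> = <Y, X>^T
sym_gap = max_abs_diff(inner_m(X, Y), transpose(inner_m(Y, X)))

# Bilinearity: <AX, BY + CZ> = A <X, Y> B^T + A <X, Z> C^T
lhs = inner_m(matmul(A, X), add(matmul(B, Y), matmul(C, Z)))
rhs = add(matmul(matmul(A, inner_m(X, Y)), transpose(B)),
          matmul(matmul(A, inner_m(X, Z)), transpose(C)))
bilin_gap = max_abs_diff(lhs, rhs)

print(sym_gap, bilin_gap)
```

Both gaps should vanish up to rounding. Note how the left argument's coefficient ends up on the left of the bracket and the right argument's coefficient ends up transposed on the right.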

Matrix-valued Cauchy-Schwarz

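Now a multivariate Cauchy-Schwarz follows from these axioms just as it did in the scalar case, though again we must take care with the transpositions. (I’ll drop the M subscript from here on; every bracket below is matrix-valued.)

Positive definiteness, for any d \times d matrix \mathbf{A}: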

\langle \mathbf{x} + \mathbf{A}\mathbf{y}, \mathbf{x} + \mathbf{A}\mathbf{y} \rangle \succeq 0

Bilinearity:

\langle \mathbf{x}, \mathbf{x} \rangle + \langle \mathbf{x}, \mathbf{y} \rangle \mathbf{A}^T + \mathbf{A}\langle \mathbf{y}, \mathbf{x} \rangle + \mathbf{A} \langle \mathbf{y}, \mathbf{y} \rangle \mathbf{A}^T \succeq 0

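Assuming \langle \mathbf{y},\mathbf{y} \rangle is invertible (i.e. strictly positive definite, not merely semidefinite), we substitute \mathbf{A} = -\langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1}: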

\langle \mathbf{x}, \mathbf{x} \rangle - \langle \mathbf{x}, \mathbf{y} \rangle (\langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1})^T - \langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y}, \mathbf{x} \rangle + \langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y}, \mathbf{y} \rangle (\langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1})^T \succeq 0

Using transpose symmetry to tidy up (note that \langle \mathbf{y},\mathbf{y} \rangle is symmetric, and hence so is its inverse):

\langle \mathbf{x}, \mathbf{x} \rangle - \langle \mathbf{x}, \mathbf{y} \rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y},\mathbf{x}\rangle - \langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y}, \mathbf{x} \rangle + \langle \mathbf{x},\mathbf{y}\rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y},\mathbf{x}\rangle \succeq 0

the last two terms cancel, and we obtain the matrix form of Cauchy-Schwarz:

\langle \mathbf{x}, \mathbf{x} \rangle \succeq \langle \mathbf{x}, \mathbf{y} \rangle \langle \mathbf{y},\mathbf{y} \rangle^{-1} \langle \mathbf{y},\mathbf{x}\rangle

In particular, in the d=1 scalar case, this reduces to the usual scalar form of the inequality.
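Here is a rough numerical check of the matrix inequality for d = 2. As an illustrative assumption, I again take \langle \mathbf{X}, \mathbf{Y}\rangle = \mathbf{X}\mathbf{Y}^T for d \times n arrays, with n > d so that \langle \mathbf{Y},\mathbf{Y} \rangle is invertible for generic data. The helpers `inv2` and `min_eig_sym2` are hypothetical names and only handle 2 \times 2 matrices.

```python
import math
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inner_m(X, Y):
    """Matrix-valued 'inner product' X Y^T."""
    return matmul(X, transpose(Y))

def inv2(M):
    """Inverse of a 2x2 matrix; assumes a nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def min_eig_sym2(S):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    return (a + d) / 2 - math.hypot((a - d) / 2, b)

random.seed(3)
d, n = 2, 6
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]
Y = [[random.gauss(0, 1) for _ in range(n)] for _ in range(d)]

XX, XY, YX, YY = inner_m(X, X), inner_m(X, Y), inner_m(Y, X), inner_m(Y, Y)

# <X,X> - <X,Y> <Y,Y>^{-1} <Y,X> should be positive semidefinite.
gap = sub(XX, matmul(matmul(XY, inv2(YY)), YX))

print(min_eig_sym2(gap))
```

A non-negative smallest eigenvalue (up to rounding) means the difference is positive semidefinite, which is exactly what \succeq asserts. The difference is the Schur complement of \langle \mathbf{Y},\mathbf{Y} \rangle in the block matrix built from the four brackets.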

Matrix-valued metrics?

I think this is cute! For one thing, we’ve just defined the outer product to be an inner product!

(The outer product between two N-dimensional vectors \mathbf{u}, \mathbf{v} is the N \times N matrix \mathbf{u}\mathbf{v}^T, while the Euclidean dot product is the scalar \mathbf{u}^T\mathbf{v}.)

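Yet since the outer product is transpose symmetric, bilinear, and results in a positive semidefinite matrix for a single vector, it’s a perfectly good inner product for these purposes. (One caveat: for N > 1 a single outer product \mathbf{y}\mathbf{y}^T has rank one and is not invertible, so the inverse in the inequality only exists once we sum or average several outer products, for example by taking an expectation over random vectors.) I wonder why Cauchy-Schwarz is more commonly known in the less general inner product form?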

I’m also intrigued by the geometric connotations of “matrix-valued inner products”. The inner product is an algebraic construction which is geometrically motivated, and so bridges these two aspects of mathematics. The inner product is at the core of geometry and defines:

  • length of vectors (from the induced norm \Vert \mathbf{x} \Vert = \sqrt{\langle \mathbf{x},\mathbf{x} \rangle})
  • angles between vectors (from \cos(\angle \mathbf{x}\mathbf{y}) = \frac{\langle \mathbf{x},\mathbf{y} \rangle}{\Vert \mathbf{x}\Vert\Vert \mathbf{y}\Vert}), and in particular orthogonality when \langle \mathbf{x},\mathbf{y} \rangle = 0
  • projections onto sets (by minimizing the norm, or by orthogonality)

So – what would it mean geometrically for a length or an angle to be matrix valued?

I don’t know! But it does occur to me that if you have two ordinary, independent scalar metrics g_1 and g_2, you can always compose these into a new “matrix-valued metric” \mathbf{g} = \mathrm{diag}(g_1, g_2). (This is still positive semidefinite in the sense above.) If we call two vectors orthogonal when this whole matrix vanishes, then they are orthogonal exactly when both of the component metrics say so; in particular the trace of our matrix, which is the sum of its eigenvalues, is then zero. (If we had asked for a vanishing determinant instead, it would declare orthogonality whenever any one of the constituents did.)

Furthermore, the trace of the outer product recovers the inner product: \mathrm{Tr}(\mathbf{u}\mathbf{v}^T) = \mathbf{u}^T\mathbf{v}. In fact, the trace already gives a proper inner product between square matrices, thought of as a vector space: \langle \mathbf{A}, \mathbf{B}\rangle = \mathrm{Tr}(\mathbf{A}^T\mathbf{B}). (The transpose matters: without it, the form is not positive definite on non-symmetric matrices.) So we can squash our matrix-valued inner product back to an ordinary scalar inner product by taking the trace. And if we do this for our diagonal matrix of independent metrics, we recover the usual metric on \mathbb{R}^N! It was there all along, but the matrix-valued metric additionally preserves more information about along which basis directions the vectors agree and disagree.
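The trace identities are easy to see in a few lines of code. This is a minimal sketch with hand-picked example vectors and hypothetical helper names. It uses the Frobenius form \mathrm{Tr}(\mathbf{A}^T\mathbf{B}) for the matrix inner product and compares it with the un-transposed version on a rotation-like matrix.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius(A, B):
    """Tr(A^T B), the sum of elementwise products of A and B."""
    return trace(matmul(transpose(A), B))

u = [1.0, 2.0, 3.0]
v = [4.0, -5.0, 6.0]

# The trace of the outer product is the dot product.
print(trace(outer(u, v)), dot(u, v))

# A diagonal "matrix-valued metric" with one scalar metric per coordinate
# (here g_i(u, v) = u_i * v_i, an assumption for illustration).
diag_metric = [[u[i] * v[i] if i == j else 0.0 for j in range(3)]
               for i in range(3)]
print(trace(diag_metric))

# Without the transpose, Tr(A A) can be negative for a nonzero A,
# whereas Tr(A^T A) is a sum of squares.
A = [[0.0, 1.0], [-1.0, 0.0]]
print(trace(matmul(A, A)), frobenius(A, A))
```

The diagonal metric keeps each coordinate's contribution separate, and the trace adds them back up into the ordinary dot product.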