Monday, January 28, 2013

Inverse Projection as Constrained Optimization

Inverse Projection

In our previous discussion, we developed an intuitive sense for projections and their relationship to minimum mean squared error problems. Here, we extract yet another useful result from the same concept. Recall that we learned to minimize

$$ J = || \mathbf{y} - \mathbf{V}\boldsymbol{\alpha} ||^2 $$

by projecting $\mathbf{y}$ onto the space characterized by $\mathbf{V}$ as follows:

$$ \mathbf{\hat{y}} = \mathbf{P}_V \mathbf{y}$$

where

$$ \mathbf{P}_V = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T$$

where the corresponding error ($\boldsymbol{\epsilon}$) comes directly from the Pythagorean theorem:

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + ||\boldsymbol{\epsilon}||^2 $$
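Before moving on, here is a small numerical sketch (an addition, not part of the original derivation) that checks both the projection formula and the Pythagorean relationship on a randomly generated example:

In [ ]:
import numpy as np

np.random.seed(12345)
y = np.random.randn(3,1)                          # arbitrary vector
V = np.random.randn(3,2)                          # columns span a 2-D subspace

P    = V.dot(np.linalg.inv(V.T.dot(V))).dot(V.T)  # projection matrix P_V
yhat = P.dot(y)                                   # projection of y
eps  = y - yhat                                   # error vector

# Pythagorean relationship: ||y||^2 = ||yhat||^2 + ||eps||^2
print(np.allclose(np.linalg.norm(y)**2,
                  np.linalg.norm(yhat)**2 + np.linalg.norm(eps)**2))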

Now, let's consider the inverse problem: given $\mathbf{\hat{y}}$, what is the corresponding $\mathbf{y}$? The first impulse in this situation is to re-visit

$$ \mathbf{\hat{y}} = \mathbf{P}_V \mathbf{y}$$

and see if you can somehow compute the inverse of $\mathbf{P}_V$. This will not work because the projection matrix is not invertible: it maps many different vectors onto the same projection. In the following figure, the vertical sheet represents all vectors in the space that have exactly the same projection, $\mathbf{\hat{y}}$. Thus, there is no unique solution to the inverse problem, which is to say that it is "ill-posed".
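To see this concretely, here is a short sketch (an addition, using the same $\mathbf{V}$ as in the figure code below): the projection matrix is rank-deficient, and perturbing $\mathbf{y}$ along any direction orthogonal to the span of $\mathbf{V}$ leaves $\mathbf{\hat{y}}$ unchanged.

In [ ]:
import numpy as np

V = np.array([[1,0.25],
              [0,0.50],
              [0,0.00]])                          # same V used in the figure below
P = V.dot(np.linalg.inv(V.T.dot(V))).dot(V.T)     # projection onto span(V)

print(np.linalg.matrix_rank(P))                   # 2, not 3 -> P_V has no inverse

y = np.array([[1.],[1.],[1.]])
n = np.array([[0.],[0.],[1.]])                    # orthogonal to both columns of V

# y and y + 3n have exactly the same projection
print(np.allclose(P.dot(y), P.dot(y + 3*n)))      # True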

In [29]:
#http://stackoverflow.com/questions/10374930/matplotlib-annotating-a-3d-scatter-plot

from mpl_toolkits.mplot3d import proj3d
import matplotlib.patches as patches
import mpl_toolkits.mplot3d.art3d as art3d
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
fig.set_size_inches([8,8])

ax = fig.add_subplot(111, projection='3d')

ax.set_aspect(1)
ax.set_xlim([0,2])
ax.set_ylim([0,2])
ax.set_zlim([0,2])
ax.set_xlabel('x-axis',fontsize=16)
ax.set_ylabel('y-axis',fontsize=16)
ax.set_zlabel('z-axis',fontsize=16)

y = np.matrix([1,1,1]).T 
V = np.matrix([[1,0.25], # columns are v_1, v_2
               [0,0.50],
               [0,0.00]])

alpha = np.linalg.inv(V.T*V)*V.T*y # optimal coefficients
P = V*np.linalg.inv(V.T*V)*V.T     # projection matrix
yhat = P*y                         # approximant


u = np.linspace(0, 2*np.pi, 100)
v = np.linspace(0, np.pi, 100)

xx = np.outer(np.cos(u), np.sin(v))
yy = np.outer(np.sin(u), np.sin(v))
zz = np.outer(np.ones(np.size(u)), np.cos(v))

sphere=ax.plot_surface(xx+y[0,0], yy+y[1,0], zz+y[2,0],  
                       rstride=4, cstride=4, color='gray',alpha=0.3,lw=0.25)

ax.plot3D([y[0,0],0],[y[1,0],0],[y[2,0],0],'r-',lw=3)
ax.plot3D([y[0,0]],[y[1,0]],[y[2,0]],'ro')

ax.plot3D([V[0,0],0],[V[1,0],0],[V[2,0],0],'b-',lw=3)
ax.plot3D([V[0,0]],[V[1,0]],[V[2,0]],'bo')
ax.plot3D([V[0,1],0],[V[1,1],0],[V[2,1],0],'b-',lw=3)
ax.plot3D([V[0,1]],[V[1,1]],[V[2,1]],'bo')

ax.plot3D([yhat[0,0],0],[yhat[1,0],0],[yhat[2,0],0],'g--',lw=3)
ax.plot3D([yhat[0,0]],[yhat[1,0]],[yhat[2,0]],'go')


x2, y2, _ = proj3d.proj_transform(y[0,0],y[1,0],y[2,0], ax.get_proj())
ax.annotate(
    "$\mathbf{y}$", 
    xy = (x2, y2), xytext = (40, 20), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(yhat[0,0],yhat[1,0],yhat[2,0], ax.get_proj())
ax.annotate(
    "$\hat{\mathbf{y}}$", 
    xy = (x2, y2), xytext = (40, 10), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(V[0,0],V[1,0],V[2,0], ax.get_proj())
ax.annotate(
    "$\mathbf{v}_1$", 
    xy = (x2, y2), xytext = (120, 10), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(V[0,1],V[1,1],V[2,1], ax.get_proj())
ax.annotate(
    "$\mathbf{v}_2$", 
    xy = (x2, y2), xytext = (-30, 30), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

xx = np.array([0,yhat[0,0],yhat[0,0]])
yy = np.array([0,yhat[1,0],yhat[1,0]])
zz = np.array([0,0,2])

ax.add_collection3d( art3d.Poly3DCollection([list(zip(xx,yy,zz))],alpha=0.15,color='m') )
ax.set_title(r'The magenta sheet contains vectors with the same projection, $\mathbf{\hat{y}}$')

plt.show()

Since a unique inverse does not exist, we must impose additional constraints on the solution to single one out. For example, since there are many $\mathbf{y}$ that correspond to the same $\mathbf{\hat{y}}$, we can pick the one with the shortest length, $||\mathbf{y} ||$. In other words, we can solve this inverse projection problem by enforcing additional constraints.

Consider the following constrained minimization problem:

$$ \min_y \mathbf{y}^T \mathbf{y}$$

subject to:

$$ \mathbf{v_1}^T \mathbf{y} = c_1$$ $$ \mathbf{v_2}^T \mathbf{y} = c_2$$

Here we again want to minimize a quadratic objective, but now subject to a set of constraints. We can rewrite the constraints in a more familiar matrix form as

$$ \mathbf{V}^T \mathbf{y} = \mathbf{c}$$

We can multiply both sides by $\mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}$ to obtain the even more familiar form:

$$ \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T \mathbf{y} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c} $$

which by cleaning up the notation gives,

$$ \mathbf{P}_V \mathbf{y} = \mathbf{\hat{y}} $$

where

$$ \mathbf{\hat{y}} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c}$$

and

$$ \mathbf{P}_V=\mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T $$

So far, nothing has really happened yet. We've just rewritten the constraints as a projection, but we still don't know how to find the $\mathbf{y}$ of minimum length. To do that, we turn to the Pythagorean relationship we pointed out earlier,

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + ||\boldsymbol{\epsilon}||^2 $$

where $\boldsymbol{\epsilon} = \mathbf{y} - \mathbf{\hat{y}}$. So, we have

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + || \mathbf{y} - \mathbf{\hat{y}}||^2 $$

Because the constraints fix $\mathbf{\hat{y}}$, the first term on the right is out of our hands, so the only way to minimize $||\mathbf{y}||^2$ is to drive the second, non-negative term to zero by setting

$$ \mathbf{y} = \mathbf{\hat{y}} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c} $$

which is the solution to our constrained minimization problem.

If that seems shockingly easy, remember that we have already done all the heavy lifting by solving the projection problem. Here, all we have done is re-phrased the same problem.

Let's see this in action using scipy.optimize for comparison. Here is the problem:

$$ \min_y \mathbf{y}^T \mathbf{y}$$

subject to:

$$ \mathbf{e_1}^T \mathbf{y} = 1$$ $$ \mathbf{e_2}^T \mathbf{y} = 1$$

where $\mathbf{e}_i$ is the coordinate vector that is zero everywhere except for the $i^{th}$ entry.

In [30]:
import numpy as np
from scipy.optimize import minimize

# constraints formatted for scipy.optimize.minimize
cons = [{'type':'eq','fun':lambda x: x[0]-1,'jac':None},
        {'type':'eq','fun':lambda x: x[1]-1,'jac':None},
        ]

init_point = np.array([1,2,3,0]) # initial guess
ysol= minimize(lambda x: np.dot(x,x),init_point,constraints=cons,method='SLSQP')

# using projection method
c = np.matrix([[1,1,0,0]]).T     # RHS constraint vector
V = np.matrix(np.eye(4)[:,0:2])
ysol_p =  V*np.linalg.inv(V.T*V)*V.T*c

print 'scipy optimize solution:',
print ysol['x']
print 'projection solution:',
print np.array(ysol_p).flatten()

print np.allclose(np.array(ysol_p).flat,ysol['x'],atol=1e-6)
scipy optimize solution: [  1.00000000e+00   1.00000000e+00  -2.17476398e-08  -2.93995881e-08]
projection solution: [ 1.  1.  0.  0.]
True

Weighted Constrained Minimization

We can again pursue a more general problem using the same technique:

$$ \min_y \mathbf{y}^T \mathbf{Q} \mathbf{y} $$

subject to:

$$ \mathbf{V}^T \mathbf{y} = \mathbf{c}$$

where $\mathbf{Q}$ is a positive definite matrix. In this case, we can define $\eta$ such that:

$$ \mathbf{y} = \mathbf{Q}^{-1} \boldsymbol{\eta}$$

and re-write the constraint as

$$ \mathbf{V}^T \mathbf{Q}^{-1} \boldsymbol{\eta} = \mathbf{c}$$

and multiply both sides by $\mathbf{V} ( \mathbf{V}^T \mathbf{Q}^{-1} \mathbf{V})^{-1}$ to obtain

$$\mathbf{P}_V \boldsymbol{\eta}=\mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{c} $$

where

$$\mathbf{P}_V=\mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{V}^T \mathbf{Q}^{-1}$$

Note that in terms of $\boldsymbol{\eta}$ the objective is $\mathbf{y}^T \mathbf{Q} \mathbf{y} = \boldsymbol{\eta}^T \mathbf{Q}^{-1} \boldsymbol{\eta}$, so the same Pythagorean argument as before, now in the $\mathbf{Q}^{-1}$-weighted norm, says that the minimizing $\boldsymbol{\eta}$ is $\boldsymbol{\eta}_o = \mathbf{V} ( \mathbf{V}^T \mathbf{Q}^{-1} \mathbf{V})^{-1} \mathbf{c}$. To sum up, transforming back through $\mathbf{y} = \mathbf{Q}^{-1} \boldsymbol{\eta}$, the minimal $\mathbf{y}$ that solves this constrained minimization problem is

$$ \mathbf{y}_o = \mathbf{Q}^{-1} \mathbf{V} ( \mathbf{V}^T \mathbf{Q}^{-1} \mathbf{V})^{-1} \mathbf{c}$$
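As a quick numerical aside (a sketch added here, with an arbitrary $\mathbf{Q}$ and $\mathbf{V}$ chosen for illustration, not from the original), the weighted $\mathbf{P}_V$ above is still idempotent but no longer symmetric, which makes it an oblique rather than an orthogonal projection:

In [ ]:
import numpy as np

Q = np.diag([1.,2.,3.,4.])
Qinv = np.linalg.inv(Q)
V = np.matrix([[1.,0.],
               [1.,1.],
               [0.,1.],
               [0.,0.]])

# weighted projection matrix P_V = V (V^T Q^-1 V)^-1 V^T Q^-1
P = V*np.linalg.inv(V.T*Qinv*V)*V.T*Qinv

print(np.allclose(P*P, P))   # idempotent -> True
print(np.allclose(P, P.T))   # symmetric? -> False, so the projection is oblique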

Once again, let's illustrate this using scipy.optimize in the following:

In [31]:
import numpy as np
from scipy.optimize import minimize

# constraints formatted for scipy.optimize.minimize
cons = [{'type':'eq','fun':lambda x: x[0]-1,'jac':None},
        {'type':'eq','fun':lambda x: x[1]-1,'jac':None},
        ]

Q = np.diag ([1,2,3,4])


init_point = np.array([1,2,3,0]) # initial guess
ysol= minimize(lambda x: np.dot(x,np.dot(Q,x)),init_point,constraints=cons,method='SLSQP')

# using projection method
Qinv = np.linalg.inv(Q)
c = np.matrix([[1,1]]).T         # constraint values, V^T y = c
V = np.matrix(np.eye(4)[:,0:2])
ysol_p =  Qinv*V*np.linalg.inv(V.T*Qinv*V)*c
print 'scipy optimize solution:',
print ysol['x']
print 'projection solution:',
print np.array(ysol_p).flatten()
print np.allclose(np.array(ysol_p).flat,ysol['x'],atol=1e-5)
scipy optimize solution: [  1.00000000e+00   1.00000000e+00  -9.93437889e-09  -2.31626259e-06]
projection solution: [ 1.  1.  0.  0.]
True
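To exercise the $\mathbf{Q}^{-1}$ factor in the formula above, here is an additional sketch (not part of the original post) with constraint vectors that are not coordinate vectors, again compared against scipy.optimize:

In [ ]:
import numpy as np
from scipy.optimize import minimize

Q = np.diag([1.,2.,3.,4.])
Qinv = np.linalg.inv(Q)
V = np.matrix([[1.,0.],
               [1.,1.],
               [0.,1.],
               [0.,0.]])             # constraint directions (columns)
c = np.matrix([[1.],[2.]])           # constraint values, V^T y = c

# projection-based solution: y = Q^-1 V (V^T Q^-1 V)^-1 c
y_p = Qinv*V*np.linalg.inv(V.T*Qinv*V)*c

# numerical solution for comparison
cons = [{'type':'eq','fun':lambda x: np.array(V.T*np.matrix(x).T - c).flatten()}]
y_n = minimize(lambda x: np.dot(x,np.dot(Q,x)), np.ones(4),
               constraints=cons, method='SLSQP')

print(np.array(y_p).flatten())   # both should be close to [-1/6, 7/6, 5/6, 0]
print(y_n['x'])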

Summary

In this section, we pulled yet another powerful result from the projection concept we developed previously. We showed how "inverse projection" leads to the solution of the classic constrained minimization problem. Although there are many approaches to the same problem, by once again appealing to the projection method, we retain our geometric intuition.

References

This post was created using the nbconvert utility from the source IPython Notebook which is available for download from the main github site for this blog. The projection concept is masterfully discussed in the classic Strang, G. (2003). Introduction to linear algebra. Wellesley Cambridge Pr. Also, some of Dr. Strang's excellent lectures are available on MIT Courseware. I highly recommend these as well as the book.

Monday, January 14, 2013

Conditional Expectation for Gaussian Variables

Introduction

By this point, we have developed many tools to deal with computing the conditional expectation. In this section, we discuss a bizarre and amazing coincidence regarding Gaussian random variables and linear projection, a coincidence that is the basis for most of statistical signal processing.

Conditional Expectation by Optimization

Now, let's consider the important case of the zero-mean bivariate Gaussian and try to find a function $h$ that minimizes the mean squared error (MSE). Again, trying to solve for the conditional expectation by minimizing the error over all possible functions $h$ is generally very, very hard. One alternative is to use parameters for the $h$ function and then just optimize over those. For example, we could assume that $h(Y)= \alpha Y$ and then use calculus to find the $\alpha$ parameter.

Let's try this with the zero-mean bivariate Gaussian density,

$$\mathbb{E}((X-\alpha Y )^2) = \mathbb{E}(\alpha^2 Y^2 - 2 \alpha X Y + X^2 )$$

and then differentiate with respect to $\alpha$ and set the result to zero to obtain

$$\mathbb{E}(2 \alpha Y^2 - 2 X Y ) = 2 \alpha \sigma_y^2 - 2 \mathbb{E}(XY) = 0$$

Then, solving for $\alpha$ gives

$$ \alpha = \frac{ \mathbb{E}(X Y)}{ \sigma_y^2 } $$

which means that

\begin{equation} \mathbb{ E}(X|Y) \approx \alpha Y = \frac{ \mathbb{E}(X Y )}{ \sigma_Y^2 } Y =\frac{\sigma_{X Y}}{ \sigma_Y^2 } Y \end{equation}

where the last equality is just notation. Remember that we assumed the special linear form $h(Y)=\alpha Y$ purely for convenience; we still don't know whether this is the one true $h_{opt}$ that minimizes the MSE over all possible functions.
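Before turning to the exact integral, here is a quick simulation sketch (an addition, not in the original) that estimates the MSE-minimizing $\alpha$ by brute force and compares it with $\sigma_{xy}/\sigma_y^2$:

In [ ]:
import numpy as np

np.random.seed(123)

# zero-mean bivariate Gaussian samples with a known covariance matrix
R = np.array([[1.0, 0.6],
              [0.6, 2.0]])          # [[sigma_x**2, sigma_xy],[sigma_xy, sigma_y**2]]
X, Y = np.random.multivariate_normal([0,0], R, 100000).T

# brute-force search for the alpha that minimizes the sample MSE
alphas = np.linspace(-1, 1, 201)
mse = [np.mean((X - a*Y)**2) for a in alphas]

print(alphas[np.argmin(mse)])       # should land near sigma_xy/sigma_y**2
print(R[0,1]/R[1,1])                # = 0.3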

Conditional Expectation Using Direct Integration

Now, let's try this again by computing $ \mathbb{E}(X|Y)$ in the case of the bivariate Gaussian distribution straight from the definition.

\begin{equation} \mathbb{E}(X|Y) = \int_{\mathbb{ R}} x \frac{f_{X,Y}(x,y)}{f_Y(y)} dx \end{equation}

where

$$ f_{X,Y}(x,y) = \frac{1}{2\pi |\mathbf{R}|^{\frac{1}{2}}} e^{-\frac{1}{2} \mathbf{v}^T \mathbf{R}^{-1} \mathbf{v} } $$

and where

$$ \mathbf{v}= \left[ x,y \right]^T$$

$$ \mathbf{R} = \left[ \begin{array}{cc} \sigma_{x}^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_{y}^2 \\ \end{array} \right] $$

and with

\begin{eqnarray} \sigma_{xy} &=& \mathbb{E}(xy) \nonumber \\ \sigma_{x}^2 &=& \mathbb{E}(x^2) \nonumber \\ \sigma_{y}^2 &=& \mathbb{E}(y^2) \nonumber
\end{eqnarray}

This conditional expectation is a tough integral to evaluate by hand, so we'll do it with sympy.

In [3]:
from sympy import Matrix, Symbol, exp, pi, sqrt, simplify, integrate, oo
from sympy import stats

x = Symbol('x',real=True)
y = Symbol('y',real=True)
sigma_x = Symbol('sigma_x',positive=True)
sigma_y = Symbol('sigma_y',positive=True)
sigma_xy = Symbol('sigma_xy',real=True)
fyy = stats.density(stats.Normal('y',0,sigma_y))(y)  # marginal density of y
 
R = Matrix([[sigma_x**2, sigma_xy],
            [sigma_xy,sigma_y**2]])                  # covariance matrix
fxy = 1/(2*pi)/sqrt(R.det()) * exp((-Matrix([[x,y]])*R.inv()* Matrix([[x],[y]]))[0,0]/2 )

fcond = simplify(fxy/fyy)                            # conditional density of x given y

Unfortunately, sympy cannot immediately integrate this without some hints. So, we need to define a positive variable ($u$) and substitute it into the integrand before integrating:

In [7]:
u=Symbol('u',positive=True) # define positive variable

fcond2=fcond.subs(sigma_x**2*sigma_y**2-sigma_xy**2,u) # substitute as hint to integrate
g=simplify(integrate(fcond2*x,(x,-oo,oo))) # evaluate integral
gg=g.subs( u,sigma_x**2 *sigma_y**2 - sigma_xy**2 ) # substitute back in
use( gg, simplify,level=2) # simplify exponent term
Out[7]:
$$\frac{\sigma_{xy} y}{\sigma_{y}^{2}}$$

Thus, by direct integration using sympy, we found

$$ \mathbb{ E}(X|Y) = Y \frac{\sigma_{xy}}{\sigma_{y}^{2}} $$

and this matches the earlier result we obtained by direct minimization, where we assumed $\mathbb{E}(X|Y) = \alpha Y$ and solved for the optimal $\alpha$!

The importance of this result cannot be overstated: the one true and optimal $h_{opt}$ is a linear function of $Y$.

In other words, the linear form we assumed merely for convenience in the direct search turns out to be the truly optimal result! This generalizes to all Gaussian problems. The link between linear functions and optimal estimation of Gaussian random variables is the most fundamental result in statistical signal processing; it is exploited in everything from optimal filter design to adaptive signal processing.

We can easily extend this result to non-zero mean problems by inserting the means in the right places as follows:

$$ \mathbb{ E}(X|Y) = \bar{X} + (Y-\bar{Y}) \frac{\sigma_{xy}}{\sigma_{y}^{2}} $$

where $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$, respectively.
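As a final sketch (again an addition, not part of the original), we can check the non-zero mean version empirically by binning samples on $Y$ and comparing the bin-averaged $X$ with the linear formula:

In [ ]:
import numpy as np

np.random.seed(1234)

mx, my = 1.0, -2.0                     # means of X and Y
R = np.array([[1.0, 0.8],
              [0.8, 2.0]])             # covariance matrix
X, Y = np.random.multivariate_normal([mx, my], R, 200000).T

# compare the empirical conditional mean of X in a narrow bin around y0
# with the linear formula Xbar + (y0 - Ybar)*sigma_xy/sigma_y**2
for y0 in (-3.0, -2.0, -1.0):
    mask = np.abs(Y - y0) < 0.05
    print('%+.1f  empirical: %.3f   linear: %.3f'
          % (y0, X[mask].mean(), mx + (y0 - my)*R[0,1]/R[1,1]))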

Summary

In this section, we showed that the conditional expectation for Gaussian random variables is a linear function, which, by a bizarre coincidence, is also the easiest one to work with. This result is fundamental to all optimal linear filtering problems (e.g., the Kalman filter) and is the basis of most of the theory of stochastic processes used in signal processing. Up to this point, we have worked hard to illustrate the concepts needed to unify our understanding of this field, and we have developed multiple approaches to these kinds of problems, most of which are far more difficult to compute. It is indeed just plain lucky that the most important distribution is also the easiest to handle as a conditional expectation, because that expectation is a linear function. We will come back to this same result again and again as we work our way through these broader concepts.

References

This post was created using the nbconvert utility from the source IPython Notebook which is available for download from the main github site for this blog.