Monday, February 11, 2013

Gauss Markov

Introduction

In this section, we consider the famous Gauss-Markov problem, which gives us an opportunity to use all the material we have developed so far. The Gauss-Markov model is the fundamental model for noisy parameter estimation because it estimates unobservable parameters from noisy, indirect measurements. Incarnations of the same model appear in all studies of Gaussian models, so this is an excellent opportunity to apply everything we have learned about projection and conditional expectation.

Following Luenberger (1997), let's consider the following problem:

$$ \mathbf{y} = \mathbf{W} \boldsymbol{\beta} + \boldsymbol{\epsilon} $$

where $\mathbf{W}$ is an $ n \times m $ matrix, and $\mathbf{y}$ is an $n \times 1$ vector. Also, $\boldsymbol{\epsilon}$ is an $n$-dimensional random vector with zero mean and covariance

$$ \mathbb{E}( \boldsymbol{\epsilon} \boldsymbol{\epsilon}^T) = \mathbf{Q}$$

Note that real systems usually provide a calibration mode where you can estimate $\mathbf{Q}$ so it's not fantastical to assume you have some knowledge of the noise statistics. The problem is to find a matrix $\mathbf{K}$ so that $ \boldsymbol{\hat{\beta}} = \mathbf{K} \mathbf{y}$ approximates $ \boldsymbol{\beta}$. Note that we only have knowledge of $\boldsymbol{\beta}$ via $ \mathbf{y}$ so we can't measure it directly. Further, note that $\mathbf{K} $ is a matrix, not a vector, so there are $m \times n$ entries to compute.
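
For example, a minimal sketch of such a calibration step might record noise-only samples and take their sample covariance as the estimate of $\mathbf{Q}$. The particular Q_true below is made up purely for illustration:

import numpy as np

np.random.seed(0)                      # reproducible sketch
Q_true = np.array([[0.10, 0.02, 0.00], # hypothetical "true" noise covariance
                   [0.02, 0.20, 0.00],
                   [0.00, 0.00, 0.15]])

# pretend the calibration mode records noise-only measurements
samples = np.random.multivariate_normal(np.zeros(3), Q_true, 5000)

Q_est = np.cov(samples.T)              # sample covariance as the estimate of Q
print(np.round(Q_est, 2))              # should be close to Q_true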

We can approach this problem the usual way by trying to solve the MMSE problem:

$$ \min_K \mathbb{E}(|| \boldsymbol{\hat{\beta}}- \boldsymbol{\beta} ||^2)$$

which we can write out as

$$ \min_K \mathbb{E}(|| \boldsymbol{\hat{\beta}}- \boldsymbol{\beta} ||^2) = \min_K\mathbb{E}(|| \mathbf{K}\mathbf{y}- \boldsymbol{\beta} ||^2) = \min_K\mathbb{E}(|| \mathbf{K}\mathbf{W}\boldsymbol{\beta}+\mathbf{K}\boldsymbol{\epsilon}- \boldsymbol{\beta} ||^2)$$

and since $\boldsymbol{\epsilon}$ is the only random variable here and has zero mean (so the cross term drops out when we take the expectation), this simplifies to

$$\min_K || \mathbf{K}\mathbf{W}\boldsymbol{\beta}- \boldsymbol{\beta} ||^2 + \mathbb{E}(||\mathbf{K}\boldsymbol{\epsilon} ||^2) $$

The next step is to compute

$\DeclareMathOperator{\Tr}{Trace}$ $$ \mathbb{E}(||\mathbf{K}\boldsymbol{\epsilon} ||^2) = \mathbb{E}(\boldsymbol{\epsilon}^T \mathbf{K}^T \mathbf{K} \boldsymbol{\epsilon})=\Tr(\mathbf{K}\, \mathbb{E}(\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T)\, \mathbf{K}^T)=\Tr(\mathbf{K Q K}^T)$$

using the properties of the trace of a matrix. We can assemble everything as

$$ \min_K || \mathbf{K W} \boldsymbol{\beta} - \boldsymbol{\beta}||^2 + \Tr(\mathbf{K Q K}^T) $$
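
As a quick sanity check on the trace identity used above, the following sketch compares a Monte Carlo average of $||\mathbf{K}\boldsymbol{\epsilon}||^2$ against $\Tr(\mathbf{K Q K}^T)$. The $\mathbf{K}$ and $\mathbf{Q}$ here are arbitrary choices made just for this check:

import numpy as np

np.random.seed(1)
K = np.random.randn(2, 3)              # arbitrary 2 x 3 matrix
A = np.random.randn(3, 3)
Q = A.dot(A.T) + np.eye(3)             # arbitrary positive definite covariance

# draw many realizations of epsilon ~ N(0, Q), one per row
eps = np.random.multivariate_normal(np.zeros(3), Q, 100000)
mc = np.mean(np.sum(eps.dot(K.T)**2, axis=1))  # Monte Carlo estimate of E(||K epsilon||^2)

print(mc)                              # should be close to the trace below
print(np.trace(K.dot(Q).dot(K.T)))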

Now, if we were to solve this for $\mathbf{K}$, it would be a function of $ \boldsymbol{\beta}$, which is the same thing as saying that the estimator, $ \boldsymbol{\hat{\beta}}$, is a function of what we are trying to estimate, $ \boldsymbol{\beta}$, which makes no sense. However, writing this out tells us that if we had $\mathbf{K W}= \mathbf{I}$, then the first term vanishes and the problem simplifies to

$$ \min_K \Tr(\mathbf{K Q K}^T) $$

with

$$ \mathbf{KW} = \mathbf{I}$$

This requirement is the same as asserting that the estimator is unbiased,

$$ \mathbb{E}( \boldsymbol{\hat{\beta}}) = \mathbf{KW} \boldsymbol{\beta} = \boldsymbol{\beta} $$

To line this problem up with our earlier work, let's consider the $i^{th}$ row of $\mathbf{K}$, written as the column vector $\mathbf{k}_i$ (so that $\mathbf{k}_i^T$ is the $i^{th}$ row). Now, we can re-write the problem row-by-row as

$$ \min_{\mathbf{k}_i} \mathbf{k}_i^T \mathbf{Q} \mathbf{k}_i $$

with

$$ \mathbf{W}^T \mathbf{k}_i = \mathbf{e}_i$$

where $\mathbf{e}_i$ is the $i^{th}$ column of the identity matrix. From our previous work on constrained optimization, we know the solution to this:

$$ \mathbf{k}_i = \mathbf{Q}^{-1} \mathbf{W}(\mathbf{W}^T \mathbf{Q^{-1} W})^{-1}\mathbf{e}_i$$

Now all we have to do is stack these back together as the rows of $\mathbf{K}$ for the general solution:

$$ \mathbf{K} = (\mathbf{W}^T \mathbf{Q^{-1} W})^{-1} \mathbf{W}^T\mathbf{Q}^{-1} $$
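
Two quick consequences are worth checking numerically: the constraint $\mathbf{KW}=\mathbf{I}$ holds by construction, and when $\mathbf{Q}=\sigma^2\mathbf{I}$ the estimator reduces to the ordinary least-squares pseudo-inverse $(\mathbf{W}^T\mathbf{W})^{-1}\mathbf{W}^T$. The sketch below uses the same $\mathbf{W}$ as the simulation further down; the $\mathbf{Q}$ values are arbitrary choices for the check:

import numpy as np
from numpy.linalg import inv

W = np.array([[1., 2.],
              [2., 3.],
              [1., 1.]])
Q = np.diag([0.1, 0.2, 0.3])           # arbitrary noise covariance

K = inv(W.T.dot(inv(Q)).dot(W)).dot(W.T).dot(inv(Q))
print(np.allclose(K.dot(W), np.eye(2)))              # True: K W = I

Q_iso = 0.5 * np.eye(3)                               # isotropic noise, Q = sigma^2 I
K_iso = inv(W.T.dot(inv(Q_iso)).dot(W)).dot(W.T).dot(inv(Q_iso))
print(np.allclose(K_iso, inv(W.T.dot(W)).dot(W.T)))  # True: reduces to least squares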

It's easy when you have all of the concepts lined up! For completeness, since $\boldsymbol{\hat{\beta}} - \boldsymbol{\beta} = \mathbf{K}\boldsymbol{\epsilon}$ when $\mathbf{KW}=\mathbf{I}$, the covariance of the error is

$$ \mathbb{E}\left[(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}) (\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})^T\right] = \mathbb{E}(\mathbf{K} \boldsymbol{\epsilon} \boldsymbol{\epsilon}^T \mathbf{K}^T)=\mathbf{K}\mathbf{Q}\mathbf{K}^T =(\mathbf{W}^T \mathbf{Q}^{-1} \mathbf{W})^{-1}$$
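
Before looking at the plots, here is a minimal Monte Carlo sketch that checks both the unbiasedness and the error covariance formula, using the same $\mathbf{W}$, $\mathbf{Q}$, and $\boldsymbol{\beta}$ values as the simulation below (only the number of trials is cranked up):

import numpy as np
from numpy.linalg import inv

Q = np.eye(3)*.1                       # same error covariance as the simulation below
W = np.array([[1., 2.],
              [2., 3.],
              [1., 1.]])
beta = np.ones((2, 1))

K = inv(W.T.dot(inv(Q)).dot(W)).dot(W.T).dot(inv(Q))

ntrials = 100000
epsilon = np.random.multivariate_normal((0, 0, 0), Q, ntrials).T
y = W.dot(beta) + epsilon
b = K.dot(y)                           # one estimate of beta per column

print(b.mean(axis=1))                  # close to beta = [1, 1] (unbiased)
print(np.cov(b))                       # empirical covariance of the estimates
print(inv(W.T.dot(inv(Q)).dot(W)))     # theoretical covariance (W^T Q^-1 W)^-1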

The following simulation illustrates these results.

In [18]:
from mpl_toolkits.mplot3d import proj3d
from numpy.linalg import inv
import matplotlib.pyplot as plt
import numpy as np
from numpy import matrix, linalg, ones, array

Q = np.eye(3)*.1 # error covariance matrix
#Q[0,0]=1

beta = matrix(ones((2,1))) # this is what we are trying to estimate
W = matrix([[1,2],
            [2,3],
            [1,1]])

ntrials = 50 
epsilon = np.random.multivariate_normal((0,0,0),Q,ntrials).T 
y=W*beta+epsilon

K=inv(W.T*inv(Q)*W)*matrix(W.T)*inv(Q) 
b=K*y #estimated beta from data

fig = plt.figure()
fig.set_size_inches([6,6])

# some convenience definitions for plotting
bb = array(b)
bm = bb.mean(1)
yy = array(y)
ax = fig.add_subplot(111, projection='3d')

ax.plot3D(yy[0,:],yy[1,:],yy[2,:],'mo',label='y',alpha=0.3)
ax.plot3D([beta[0,0],0],[beta[1,0],0],[0,0],'r-',label=r'$\beta$')
ax.plot3D([bm[0],0],[bm[1],0],[0,0],'g-',lw=1,label=r'$\hat{\beta}_m$')
ax.plot3D(bb[0,:],bb[1,:],0*bb[1,:],'.g',alpha=0.5,lw=3,label=r'$\hat{\beta}$')
ax.legend(loc=0,fontsize=18)
plt.show()

The figure above shows the simulated $\mathbf{y}$ data as magenta circles. The green dots show the corresponding estimates, $\boldsymbol{\hat{\beta}}$, for each sample. The red and green lines show the true value of $\boldsymbol{\beta}$ and the average of the estimated $\boldsymbol{\beta}$-values, $\boldsymbol{\hat{\beta}}_m$, respectively. The matrix $\mathbf{K}$ maps the magenta circles to the corresponding green dots. Note that there are many possible ways to map the magenta circles to the plane, but this $\mathbf{K}$ is the one that minimizes the MSE for $\boldsymbol{\beta}$.

The figure below shows the horizontal xy-plane of the figure above in more detail.

In [19]:
from  matplotlib.patches import Ellipse

fig, ax = plt.subplots()
fig.set_size_inches((6,6))
ax.set_aspect(1)
ax.plot(bb[0,:],bb[1,:],'g.')
ax.plot([beta[0,0],0],[beta[1,0],0],'r--',label=r'$\beta$',lw=4.)
ax.plot([bm[0],0],[bm[1],0],'g-',lw=1,label=r'$\hat{\beta}_m$')
ax.legend(loc=0,fontsize=18)
ax.grid()

bm_cov = inv(W.T*inv(Q)*W) # theoretical error covariance, (W^T Q^-1 W)^-1
U,S,V = linalg.svd(bm_cov)

err = float(np.sqrt((matrix(bm))*(bm_cov)*(matrix(bm).T)))
theta = np.arccos(U[0,1])/np.pi*180

ax.add_patch(Ellipse(bm,err*2/np.sqrt(S[0]),err*2/np.sqrt(S[1])
                       ,angle=theta,color='pink',alpha=0.5))

plt.show()

The figure above shows the green dots, which are the individual estimates of $\boldsymbol{\hat{\beta}}$ from the corresponding simulated $\mathbf{y}$ data. The red dashed line is the true value of $\boldsymbol{\beta}$ and the green line ($\boldsymbol{\hat{\beta}}_m$) is the average of all the green dots. Note that there is hardly a visual difference between them. The pink ellipse gives a sense of the scale of the covariance of the estimated $\boldsymbol{\beta}$ values. I invite you to download this IPython notebook and tweak the values for the error covariance or any other parameter. All of the plots should update and scale correctly.

Summary

The Gauss-Markov problem is the cornerstone of all Gaussian modeling and is thus one of the most powerful models used in signal processing. We showed how to estimate the unobservable parameters given noisy measurements using our previous work on projection. We also coded up a short example to illustrate how this works in a simulation. For a much more detailed approach, see Luenberger (1997).

As usual, the corresponding IPython notebook for this post is available for download here.

Comments and corrections welcome!

References

  • Luenberger, David G. Optimization by vector space methods. Wiley-Interscience, 1997.

Monday, January 28, 2013

Inverse Projection as Constrained Optimization

Inverse Projection

In our previous discussion, we developed a powerful intuitive sense for projections and their relationships to minimum mean squared error problems. Here, we continue to extract yet another powerful result from the same concept. Recall that we learned to minimize

$$ J = || \mathbf{y} - \mathbf{V}\boldsymbol{\alpha} ||^2 $$

by projecting $\mathbf{y}$ onto the space characterized by $\mathbf{V}$ as follows:

$$ \mathbf{\hat{y}} = \mathbf{P}_V \mathbf{y}$$

where

$$ \mathbf{P}_V = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T$$

where the corresponding error ($\boldsymbol{\epsilon}$) comes directly from the Pythagorean theorem:

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + ||\boldsymbol{\epsilon}||^2 $$
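
A minimal numeric sanity check of this identity, with an arbitrary $\mathbf{y}$ and $\mathbf{V}$ chosen just for illustration:

import numpy as np
from numpy.linalg import inv

np.random.seed(2)
y = np.random.randn(3)
V = np.random.randn(3, 2)              # arbitrary full-rank V

P = V.dot(inv(V.T.dot(V))).dot(V.T)    # projection onto the span of V's columns
yhat = P.dot(y)
eps = y - yhat

# Pythagorean split: ||y||^2 = ||yhat||^2 + ||eps||^2
print(np.allclose(y.dot(y), yhat.dot(yhat) + eps.dot(eps)))   # True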

Now, let's consider the inverse problem: given $\mathbf{\hat{y}}$, what is the corresponding $\mathbf{y}$? The first impulse in this situation is to re-visit

$$ \mathbf{\hat{y}} = \mathbf{P}_V \mathbf{y}$$

and see if you can somehow compute the inverse of $\mathbf{P}_V$. This will not work because the projection matrix is not invertible: it is singular whenever it projects onto a proper subspace. In the following figure, the vertical magenta sheet represents all of the vectors in the space that have exactly the same projection, $\mathbf{\hat{y}}$. Thus, there is no unique solution to the inverse problem, which is to say that the problem is ill-posed.

In [29]:
#http://stackoverflow.com/questions/10374930/matplotlib-annotating-a-3d-scatter-plot

from mpl_toolkits.mplot3d import proj3d
import matplotlib.patches as patches
import mpl_toolkits.mplot3d.art3d as art3d
import matplotlib.pyplot as plt
import numpy as np
from numpy import matrix, array
from numpy.linalg import inv

fig = plt.figure()
fig.set_size_inches([8,8])

ax = fig.add_subplot(111, projection='3d')

ax.set_aspect(1)
ax.set_xlim([0,2])
ax.set_ylim([0,2])
ax.set_zlim([0,2])
ax.set_aspect(1)
ax.set_xlabel('x-axis',fontsize=16)
ax.set_ylabel('y-axis',fontsize=16)
ax.set_zlabel('z-axis',fontsize=16)

y = matrix([1,1,1]).T 
V = matrix([[1,0.25], # columns are v_1, v_2
            [0,0.50],
            [0,0.00]])

alpha=inv(V.T*V)*V.T*y # optimal coefficients
P = V*inv(V.T*V)*V.T
yhat = P*y         # approximant


u = np.linspace(0, 2*np.pi, 100)
v = np.linspace(0, np.pi, 100)

xx = np.outer(np.cos(u), np.sin(v))
yy = np.outer(np.sin(u), np.sin(v))
zz = np.outer(np.ones(np.size(u)), np.cos(v))

sphere=ax.plot_surface(xx+y[0,0], yy+y[1,0], zz+y[2,0],  
                       rstride=4, cstride=4, color='gray',alpha=0.3,lw=0.25)

ax.plot3D([y[0,0],0],[y[1,0],0],[y[2,0],0],'r-',lw=3)
ax.plot3D([y[0,0]],[y[1,0]],[y[2,0]],'ro')

ax.plot3D([V[0,0],0],[V[1,0],0],[V[2,0],0],'b-',lw=3)
ax.plot3D([V[0,0]],[V[1,0]],[V[2,0]],'bo')
ax.plot3D([V[0,1],0],[V[1,1],0],[V[2,1],0],'b-',lw=3)
ax.plot3D([V[0,1]],[V[1,1]],[V[2,1]],'bo')

ax.plot3D([yhat[0,0],0],[yhat[1,0],0],[yhat[2,0],0],'g--',lw=3)
ax.plot3D([yhat[0,0]],[yhat[1,0]],[yhat[2,0]],'go')


x2, y2, _ = proj3d.proj_transform(y[0,0],y[1,0],y[2,0], ax.get_proj())
ax.annotate(
    "$\mathbf{y}$", 
    xy = (x2, y2), xytext = (40, 20), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(yhat[0,0],yhat[1,0],yhat[2,0], ax.get_proj())
ax.annotate(
    "$\hat{\mathbf{y}}$", 
    xy = (x2, y2), xytext = (40, 10), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(V[0,0],V[1,0],V[2,0], ax.get_proj())
ax.annotate(
    "$\mathbf{v}_1$", 
    xy = (x2, y2), xytext = (120, 10), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

x2, y2, _ = proj3d.proj_transform(V[0,1],V[1,1],V[2,1], ax.get_proj())
ax.annotate(
    "$\mathbf{v}_2$", 
    xy = (x2, y2), xytext = (-30, 30), fontsize=24,
    textcoords = 'offset points', ha = 'right', va = 'bottom',
    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))

xx = array([0,yhat[0,0],yhat[0,0]])
yy = array([0,yhat[1,0],yhat[1,0]])
zz = array([0,0,2])

ax.add_collection3d( art3d.Poly3DCollection([zip(xx,yy,zz)],alpha=0.15,color='m') )
ax.set_title(r'The magenta sheet contains vectors with the same projection, $\mathbf{\hat{y}}$')

plt.show()
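
Before moving on, here is a quick numeric sketch (reusing the same $\mathbf{V}$ as in the figure code above) that makes the same point: $\mathbf{P}_V$ is rank-deficient, so it has no inverse, and adding anything from its null space to $\mathbf{y}$ leaves the projection unchanged.

import numpy as np
from numpy.linalg import inv, matrix_rank

V = np.array([[1., 0.25],              # same columns v_1, v_2 as in the figure
              [0., 0.50],
              [0., 0.00]])
P = V.dot(inv(V.T.dot(V))).dot(V.T)    # projection onto the span of v_1, v_2

print(matrix_rank(P))                  # 2, not 3, so P is singular

y = np.array([1., 1., 1.])
n = np.array([0., 0., 1.])             # a vector annihilated by P
print(np.allclose(P.dot(n), 0))                  # True: n is in the null space of P
print(np.allclose(P.dot(y + 5*n), P.dot(y)))     # True: same projection, different y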

Since a unique inverse does not exist, we can impose additional constraints to single out a particular solution. For example, since there are many $\mathbf{y}$ that correspond to the same $\mathbf{\hat{y}}$, we can pick the one that has the shortest length, $||\mathbf{y} ||$. In other words, we solve the inverse projection problem by enforcing additional constraints.

Consider the following constrained minimization problem:

$$ \min_y \mathbf{y}^T \mathbf{y}$$

subject to:

$$ \mathbf{v_1}^T \mathbf{y} = c_1$$ $$ \mathbf{v_2}^T \mathbf{y} = c_2$$

Here, we have the same setup as before: we want to minimize something with a set of constraints. We can re-write the constraints in a more familiar form as

$$ \mathbf{V}^T \mathbf{y} = \mathbf{c}$$

We can multiply both sides by $\mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}$ to obtain the even more familiar form:

$$ \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T \mathbf{y} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c} $$

which by cleaning up the notation gives,

$$ \mathbf{P}_V \mathbf{y} = \mathbf{\hat{y}} $$

where

$$ \mathbf{\hat{y}} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c}$$

and

$$ \mathbf{P}_V=\mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T $$

So far, nothing has really happened yet. We've just rewritten the constraints as a projection, but we still don't know how to find the $\mathbf{y}$ of minimum length. To do that, we turn to the Pythagorean relationship we pointed out earlier,

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + ||\boldsymbol{\epsilon}||^2 $$

where $\boldsymbol{\epsilon} = \mathbf{y} - \mathbf{\hat{y}}$. So, we have

$$ ||\mathbf{y}||^2 = ||\mathbf{\hat{y}}||^2 + || \mathbf{y} - \mathbf{\hat{y}}||^2 \ge 0$$

and since $\mathbf{\hat{y}}$ is completely determined by the constraints, the only way to minimize $||\mathbf{y}||^2$ is to make the second term vanish by setting

$$ \mathbf{y} = \mathbf{\hat{y}} = \mathbf{V} (\mathbf{V}^T \mathbf{V})^{-1}\mathbf{c} $$

which is the solution to our constrained minimization problem.

If that seems shockingly easy, remember that we have already done all the heavy lifting by solving the projection problem. Here, all we have done is re-phrased the same problem.

Let's see this in action using scipy.optimize for comparison. Here's the problem

$$ \min_y \mathbf{y}^T \mathbf{y}$$

subject to:

$$ \mathbf{e_1}^T \mathbf{y} = 1$$ $$ \mathbf{e_2}^T \mathbf{y} = 1$$

where $\mathbf{e}_i$ is the coordinate vector that is zero everywhere except for the $i^{th}$ entry.

In [30]:
import numpy as np
from scipy.optimize import minimize

# constraints formatted for scipy.optimize.minimize
cons = [{'type':'eq','fun':lambda x: x[0]-1,'jac':None},
        {'type':'eq','fun':lambda x: x[1]-1,'jac':None},
        ]

init_point = np.array([1,2,3,0]) # initial guess
ysol= minimize(lambda x: np.dot(x,x),init_point,constraints=cons,method='SLSQP')

# using projection method
c = np.matrix([[1,1,0,0]]).T     # RHS constraint vector
V = np.matrix(np.eye(4)[:,0:2])
ysol_p =  V*np.linalg.inv(V.T*V)*V.T*c

print 'scipy optimize solution:',
print ysol['x']
print 'projection solution:',
print np.array(ysol_p).flatten()

print np.allclose(np.array(ysol_p).flat,ysol['x'],atol=1e-6)
scipy optimize solution: [  1.00000000e+00   1.00000000e+00  -2.17476398e-08  -2.93995881e-08]
projection solution: [ 1.  1.  0.  0.]
True

Weighted Constrained Minimization

We can again pursue a more general problem using the same technique:

$$ \min_y \mathbf{y}^T \mathbf{Q} \mathbf{y} $$

subject to:

$$ \mathbf{V}^T \mathbf{y} = \mathbf{c}$$

where $\mathbf{Q}$ is a positive definite matrix. In this case, we can define $\eta$ such that:

$$ \mathbf{y} = \mathbf{Q}^{-1} \boldsymbol{\eta}$$

so that the objective becomes $\mathbf{y}^T \mathbf{Q} \mathbf{y} = \boldsymbol{\eta}^T \mathbf{Q}^{-1} \boldsymbol{\eta}$ and the constraint becomes

$$ \mathbf{V}^T \mathbf{Q}^{-1} \boldsymbol{\eta} = \mathbf{c}$$

Multiplying both sides by $\mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1}$ gives

$$\mathbf{P}_V \boldsymbol{\eta}=\mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{c} $$

where

$$\mathbf{P}_V=\mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{V}^T \mathbf{Q}^{-1}$$

By the same Pythagorean argument as before, now taken with respect to the $\mathbf{Q}^{-1}$-weighted norm, the minimizing $\boldsymbol{\eta}$ must equal its own projection,

$$ \boldsymbol{\eta}_o = \mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{c}$$

Converting back through $\mathbf{y} = \mathbf{Q}^{-1} \boldsymbol{\eta}$, the minimal $\mathbf{y}$ that solves this constrained minimization problem is

$$ \mathbf{y}_o = \mathbf{Q}^{-1} \mathbf{V} ( \mathbf{V}^T \mathbf{Q^{-1} V})^{-1} \mathbf{c}$$

Once again, let's illustrate this using scipy.optimize in the following

In [31]:
import numpy as np
from scipy.optimize import minimize

# constraints formatted for scipy.optimize.minimize
cons = [{'type':'eq','fun':lambda x: x[0]-1,'jac':None},
        {'type':'eq','fun':lambda x: x[1]-1,'jac':None},
        ]

Q = np.diag ([1,2,3,4])


init_point = np.array([1,2,3,0]) # initial guess
ysol= minimize(lambda x: np.dot(x,np.dot(Q,x)),init_point,constraints=cons,method='SLSQP')

# using projection method
Qinv = np.linalg.inv(Q)
c = np.matrix([[1,1,0,0]]).T     # RHS constraint vector
V = np.matrix(np.eye(4)[:,0:2])
ysol_p =  Qinv*V*np.linalg.inv(V.T*Qinv*V)*V.T*c # y_o = Q^-1 V (V^T Q^-1 V)^-1 c, with V.T*c as the constraint RHS
print 'scipy optimize solution:',
print ysol['x']
print 'projection solution:',
print np.array(ysol_p).flatten()
print np.allclose(np.array(ysol_p).flat,ysol['x'],atol=1e-5)
scipy optimize solution: [  1.00000000e+00   1.00000000e+00  -9.93437889e-09  -2.31626259e-06]
projection solution: [ 1.  1.  0.  0.]
True

Summary

In this section, we pulled yet another powerful result from the projection concept we developed previously. We showed how "inverse projection" leads to the solution of the classic constrained minimization problem. Although there are many approaches to this problem, by once again appealing to the projection method, we can maintain our geometric intuition.

References

This post was created using the nbconvert utility from the source IPython Notebook, which is available for download from the main github site for this blog. The projection concept is masterfully discussed in the classic Strang, G. (2003), Introduction to Linear Algebra, Wellesley-Cambridge Press. Also, some of Dr. Strang's excellent lectures are available on MIT OpenCourseWare. I highly recommend these as well as the book.