Thursday, November 1, 2012

Expectation Maximization

Expectation Maximization (EM) is a powerful technique for creating maximum likelihood estimators when the variables are difficult to separate. In the following, we set up a Gaussian mixture experiment and derive the corresponding estimators using this technique.

Experiment: Measuring from Unseen Groups

Let's investigate the following experiment: you have two distinct groups, and you randomly pick an individual from one of them (you don't know which) and then measure that individual's height. Group a is normally distributed as

$$ \mathcal{N}_a(x) =\mathcal{N}(x; \mu_a,\sigma) $$

and likewise for group b

$$ \mathcal{N}_b(x) =\mathcal{N}(x; \mu_b,\sigma) $$

Note that the standard deviation $\sigma$ is the same for both groups, but the means ($\mu_a,\mu_b$) are different. The problem is to estimate the means given that you can't directly know which group you are picking from.

Then we can write the joint density for this experiment as the following:

$$ f_{\mu_a,\mu_b}(x,z)= \frac{1}{2} \mathcal{N}_a(x) ^z \mathcal{N}_b(x) ^{1-z} $$

where $z=1$ if we pick from group a and $z=0$ for group b. Note that the $1/2$ comes from the 50/50 chance of picking either group. Unfortunately, since we do not measure the $z$ variable, we have to integrate it out of our density function to account for this handicap. Thus,

$$ f_{\mu_a,\mu_b}(x)= \frac{1}{2} \mathcal{N}_a(x)+\frac{1}{2} \mathcal{N}_b(x)$$

Now, since $n$ trials are independent, we can write out the likelihood:

$$ \mathcal{L}(\mu_a,\mu_b|\mathbf{x})= \prod_{i=1}^n f_{\mu_a,\mu_b}(x_i)$$

This is basically notation. We have just substituted everything into $ f_{\mu_a,\mu_b}(x)$ under the independent-trials assumption. Recall that the independent-trials assumption means that the joint probability is just the product of the individual probabilities. The idea of maximum likelihood is to maximize this as a function of $\mu_a$ and $\mu_b$ after plugging in all of the $x_i$ data. The problem is that we don't know which group we are measuring at each trial, so this is trickier than just estimating the parameters for each group separately.
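
To make this concrete, here is a minimal sketch of the log of this likelihood evaluated directly with numpy. It is not the code used later in the post: the function name `incomplete_loglik`, the seed, and the sample size are assumptions made for illustration, with $\sigma=1$ and the same true means (4 and 7) that the simulation below uses.

```python
import numpy as np

def incomplete_loglik(xs, mu_a, mu_b, sigma=1.0):
    """Log of prod_i [ N(x_i; mu_a, sigma)/2 + N(x_i; mu_b, sigma)/2 ]."""
    xs = np.asarray(xs, dtype=float)
    log_norm = -0.5 * np.log(2 * np.pi * sigma**2)
    log_na = log_norm - 0.5 * ((xs - mu_a) / sigma) ** 2
    log_nb = log_norm - 0.5 * ((xs - mu_b) / sigma) ** 2
    # logaddexp(a, b) = log(exp(a) + exp(b)), computed stably
    return np.sum(np.logaddexp(log_na, log_nb) + np.log(0.5))

rng = np.random.default_rng(0)        # hypothetical seed
z = rng.integers(0, 2, size=500)      # hidden labels, never shown to the estimator
xs = np.where(z == 1, rng.normal(4, 1, 500), rng.normal(7, 1, 500))

print(incomplete_loglik(xs, 4, 7))    # evaluated at the true means
print(incomplete_loglik(xs, 7, 4))    # means swapped
print(incomplete_loglik(xs, 0, 0))    # a point far from the data
```

Swapping the two means leaves the value unchanged, because the mixture density does not care which component is called a and which is called b.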

Simulating the Experiment

We need the following code to set up the experiment of randomly choosing a group and then picking an individual from that group.

In [1]:

from __future__ import division
import numpy as np
from scipy.stats import bernoulli, norm
#np.random.seed(101) # set random seed for reproducibility
mua_true=4 # we are trying to estimate this from the data
mub_true=7 # we are trying to estimate this from the data
fa=norm(mua_true,1) # distribution for group A
fb=norm(mub_true,1) # distribution for group B
fz=bernoulli(0.5) # each group equally likely 

def sample(n=10):
    'simulate picking from one of the two groups n times'
    tmp=fz.rvs(n) # hidden group labels: z=1 for group A, z=0 for group B
    return tmp*fa.rvs(n)+(1-tmp)*fb.rvs(n) # heights drawn from the chosen groups

xs = sample(1000) # generate some samples

Here's a quick look at the density functions of each group and a histogram of the samples

In [12]:

f,ax = subplots()
x = linspace(mua_true-2,mub_true+2,100)
ax.plot(x,fa.pdf(x),label='group A')
ax.plot(x,fb.pdf(x),label='group B')
ax.hist(xs,bins=50,normed=1,label='Samples');
ax.legend(loc=0);

Just from looking at this plot, we suspect that we will have to reconcile the samples in the overlap region since these could have come from either group. This is where the Expectation Maximization algorithm enters.

Expectation maximization

The key idea of expectation maximization is that we can somehow pretend we know the unobservable $z$ value and then proceed with the usual maximum likelihood estimation process.

The idea behind expectation-maximization is that we want to use a maximum likelihood estimate (this is the maximization part of the algorithm) after computing the expectation over the missing variable (in this case, $z$).

The following code uses sympy to set up the functions symbolically and convert them to numpy functions that we can quickly evaluate. Because it's easier and more stable to evaluate, we will work with the log of the likelihood function. It is useful to keep track of the incomplete log-likelihood ($\log\mathcal{L}$), since it can be proved that it never decreases from one EM iteration to the next, which makes it a good way to identify coding errors. Recall that this was the likelihood in the case where we integrated out the $z$ variable to account for its absence.

In [3]:

import sympy
from sympy.abc import x, z
from sympy import stats

mu_a,mu_b = sympy.symbols('mu_a,mu_b')
na=stats.Normal( 'x', mu_a,1)
nb=stats.Normal( 'x', mu_b,1)

L=(stats.density(na)(x)+stats.density(nb)(x))/2 # incomplete likelihood function

Next, we need to compute the expectation step. To avoid notational overload, we will just use $\Theta$ to denote the $\mu_b$ and $\mu_a$ parameters and the data $x_i$. This means that the density function of $z$ and $\Theta$ can be written as the following:

$$ \mathbb{P}(z,\Theta) = \frac{1}{2} \mathcal{N}_a(\Theta) ^ z \mathcal{N}_b(\Theta) ^ {(1-z)} $$

For the expectation part we have to compute $\mathbb{E}(z|\Theta)$ but since $z\in \lbrace 0,1 \rbrace$, this simplifies easily

$$ \mathbb{E}(z|\Theta) = 1 \cdot \mathbb{P}(z=1|\Theta) + 0 \cdot \mathbb{P}(z=0|\Theta) = \mathbb{P}(z=1|\Theta) $$

Now, the only thing left is to find $ \mathbb{P}(z=1|\Theta) $ which we can do using Bayes rule:

$$ \mathbb{P}(z=1|\Theta) = \frac{ \mathbb{P}(\Theta|z=1)\mathbb{P}(z=1)}{\mathbb{P}(\Theta)} $$

The term in the denominator comes from summing (integrating) out the $z$ items in the full joint density $ \mathbb{P}(z,\Theta) $

$$ \mathbb{P}(\Theta) = (\mathcal{N}_a(\Theta) + \mathcal{N}_b(\Theta))\frac{1}{2} $$

and since $\mathbb{P}(z=1)=1/2$, we finally obtain

$$ \mathbb{E}(z|\Theta) =\mathbb{P}(z=1|\Theta) = \frac{\mathcal{N}_a(\Theta)}{\mathcal{N}_a(\Theta) + \mathcal{N}_b(\Theta)} $$

which is coded below.

In [4]:

def ez(x,mu_a,mu_b): # expected value of hidden variable
  return norm(mu_a).pdf(x)/(norm(mu_a).pdf(x)+ norm(mu_b).pdf(x))
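
Because both groups share the same $\sigma$, the quadratic terms in $x$ cancel in this ratio and the expected value of $z$ reduces to a logistic function of $x$ centered at the midpoint of the two means. The following is a small sketch that checks this numerically for $\sigma=1$; the function names `ez_ratio` and `ez_logistic` and the test values are assumptions for illustration.

```python
import numpy as np

def ez_ratio(x, mu_a, mu_b):
    """E(z | x) as the ratio of unit-variance normal densities."""
    na = np.exp(-0.5 * (x - mu_a) ** 2)   # the 1/sqrt(2*pi) factors cancel in the ratio
    nb = np.exp(-0.5 * (x - mu_b) ** 2)
    return na / (na + nb)

def ez_logistic(x, mu_a, mu_b):
    """The same quantity written as a logistic function of x."""
    midpoint = 0.5 * (mu_a + mu_b)
    return 1.0 / (1.0 + np.exp(-(mu_a - mu_b) * (x - midpoint)))

x = np.linspace(0.0, 11.0, 23)
print(np.allclose(ez_ratio(x, 4, 7), ez_logistic(x, 4, 7)))
print(ez_ratio(5.5, 4, 7))   # a sample at the midpoint is equally likely to be from either group
```

This view makes the overlap region explicit: samples near the midpoint get a weight close to 1/2, while samples far to either side are assigned almost entirely to one group.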

Now, given that we have this estimate for $z_i$, $\hat{z}_i=\mathbb{E}(z|\Theta_i)$, we can go back and maximize the log-likelihood

$$ J= \log\prod_{i=1}^n \mathbb{P}(\hat{z}_i,\Theta_i) = \sum_{i=1}^n \left[ \hat{z}_i\log \mathcal{N}_a(\Theta_i) +(1-\hat{z}_i)\log \mathcal{N}_b(\Theta_i) +\log(1/2) \right] $$

using basic calculus. The trick is to remember that $\hat{z}_i$ is fixed, so we only have to maximize the $\log$ parts. This leads to

$$ \hat{\mu}_a = \frac{\sum_{i=1}^n \hat{z}_i x_i}{\sum_{i=1}^n \hat{z}_i } $$

and for $\mu_b$

$$ \hat{\mu}_b = \frac{\sum_{i=1}^n (1-\hat{z}_i) x_i}{\sum_{i=1}^n (1-\hat{z}_i) } $$
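
To spell out the calculus for $\mu_a$ (the $\mu_b$ case is identical with $1-\hat{z}_i$ in place of $\hat{z}_i$): since $\sigma$ is fixed, $\log \mathcal{N}_a(x_i) = -\frac{(x_i-\mu_a)^2}{2\sigma^2} + \text{const}$, so setting the derivative of $J$ to zero gives

$$ \frac{\partial J}{\partial \mu_a} = \sum_{i=1}^n \hat{z}_i \frac{x_i-\mu_a}{\sigma^2} = 0 \quad\Longrightarrow\quad \sum_{i=1}^n \hat{z}_i x_i = \mu_a \sum_{i=1}^n \hat{z}_i $$

which rearranges to the weighted average for $\hat{\mu}_a$ given above. Note that $\sigma$ drops out, so the update does not depend on its value.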

Now, we finally have the maximization step (above) and the expectation step ($\hat{z}_i$) from earlier. We're ready to simulate the algorithm and plot its performance!

In [10]:

out=[];lout=[] # containers for outputs

Lf=sympy.lambdify((x,mu_a,mu_b), sympy.log(abs(L)),'numpy') # convert to numpy function from sympy

mu_a_n=2 # initial point
mu_b_n=1 # initial point
niter=10 #

for i in range(niter):
    tau=ez(xs,mu_a_n,mu_b_n) # expected value of z-variable
    lout.append( sum(Lf(xs,mu_a_n,mu_b_n))) # track incomplete likelihood value (should be monotone)
    out.append((mu_a_n,mu_b_n))          # keep track of (mu_a,mu_b) steps
    mu_a_n=(sum(tau*xs)/sum(tau))    # new maximum likelihood estimate of mu_a
    mu_b_n=(sum((1-tau)*xs)/sum(1-tau)) # new maximum likelihood estimate of mu_b
  
fig=figure()
fig.set_figwidth(12)
ax=fig.add_subplot(121)
ax.plot(array(out),'o-')
ax.legend(('mu_a','mu_b'),loc=0)
ax.hlines([mua_true,mub_true],0,len(out),['b','g'])
ax.set_xlabel('iteration',fontsize=18)
ax.set_ylabel('$\mu_a,\mu_b$ values',fontsize=24)
ax=fig.add_subplot(122)
ax.plot(array(lout),'o-')
ax.set_xlabel('iteration',fontsize=18)
ax.set_title('Incomplete likelihood',fontsize=16)

The figure on the left shows the estimates for both $\mu_a$ and $\mu_b$ at each iteration, and the figure on the right shows the corresponding incomplete log-likelihood. The horizontal lines in the left figure show the true values we are trying to estimate. Notice that the EM algorithm converges very quickly, but because each group is equally likely to be chosen, the algorithm cannot tell which group is which, so the two estimates can end up swapped. The code below constructs an error surface to see this effect. The incomplete log-likelihood is monotone, which is consistent with the absence of a coding error. We're omitting the proof of this monotonicity.

In [6]:

mua_step=linspace(0,10,30)
mub_step=linspace(0,10,20)
z=Lf(xs,mua_step[:,None],mub_step[:,None,None]).sum(axis=2) # numpy broadcasting
fig=figure()
ax=fig.add_subplot(111)
p=ax.contourf(mua_step,mub_step,z,30,cmap=cm.gray)
xa,xb=zip(*out) # unpack the container from the previous block
ax.plot(xa,xb,'ro',mua_true,mub_true,'bs') # points per iteration in red, true values in blue
ax.plot(xa[0],xb[0],'gx',ms=15.,mew=2.) # starting point in green
ax.text(xa[0],xb[0],'start',color='g',fontsize=11.) # label the starting point
ax.set_xlabel('$\mu_a$',fontsize=24)
ax.set_ylabel('$\mu_b$',fontsize=24)
ax.set_title('Incomplete Likelihood',fontsize=18)
fig.colorbar(p);

The figure shows the incomplete likelihood function that the algorithm is exploring. Note that the algorithm can get to the maximizer but since the surface has symmetric maxima, it has no way to pick between them and ultimately just picks the one that is closest to the starting point. This is because each group is equally likely to be chosen. I urge you to download this notebook and try different initial points and see where the maximizer winds up.
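
To see the effect of the starting point without any plotting, here is a self-contained sketch of the same iteration in plain numpy. It is a compact restatement under the post's assumptions (equal group probabilities, $\sigma=1$), not the notebook code itself: the function names, the seed, the sample size, and the iteration count are all illustrative choices.

```python
import numpy as np

def log_normal_pdf(x, mu):
    """Log density of a unit-variance normal."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - mu) ** 2

def em_two_means(xs, mu_a, mu_b, niter=50):
    """Run EM for the equal-weight, unit-variance two-group model."""
    history = []
    for _ in range(niter):
        la, lb = log_normal_pdf(xs, mu_a), log_normal_pdf(xs, mu_b)
        # incomplete log-likelihood at the current estimates
        history.append(np.sum(np.logaddexp(la, lb) + np.log(0.5)))
        tau = np.exp(la - np.logaddexp(la, lb))          # E-step: P(z=1 | x_i)
        mu_a = np.sum(tau * xs) / np.sum(tau)            # M-step: weighted means
        mu_b = np.sum((1 - tau) * xs) / np.sum(1 - tau)
    return mu_a, mu_b, np.array(history)

rng = np.random.default_rng(1)   # hypothetical seed
labels = rng.integers(0, 2, size=2000)
xs = np.where(labels == 1, rng.normal(4, 1, 2000), rng.normal(7, 1, 2000))

a1, b1, h1 = em_two_means(xs, 2.0, 1.0)   # the post's starting point
a2, b2, h2 = em_two_means(xs, 1.0, 2.0)   # the same point with the roles swapped
print(a1, b1)   # mu_a lands near the larger mean
print(a2, b2)   # mu_a lands near the smaller mean
print(np.all(np.diff(h1) >= -1e-6))   # the log-likelihood never decreases
```

The two runs reach the same pair of values with the labels exchanged, which is exactly the symmetry visible in the contour plot.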

Summary

Expectation maximization is a powerful algorithm that is especially useful when it is difficult to de-couple the variables involved in a standard maximum likelihood estimation. Note that convergence to the "correct" maximum is not guaranteed, as we observed here. This is even more pronounced when there are more parameters to estimate. There is a nice applet you can use to investigate this effect and a much more detailed mathematical derivation here.

As usual, the IPython notebook corresponding to this post can be found here. I urge you to try these calculations on your own. Try changing the sample size and making the choice between the two groups no longer equal to 1/2 (equally likely).

Note you will need at least sympy version 0.7.2 to run this notebook.

Comments appreciated!

Wednesday, October 24, 2012

Maximum Likelihood Estimation

Maximum likelihood estimation is one of the key techniques employed in statistical signal processing for a wide variety of applications from signal detection to parameter estimation. In the following, we consider a simple experiment and work through the details of maximum likelihood estimation to ensure that we understand the concept in one of its simplest applications.

Setting up the Coin Flipping Experiment

Suppose we have a coin and want to estimate the probability of heads ($p$) for it. The coin is Bernoulli distributed:
$$ \phi(x)= p^x (1-p)^{(1-x)} $$
where $x$ is the outcome, 1 for heads and 0 for tails. For $n$ independent flips, we have the likelihood:
$$ \mathcal{L}(p|\mathbf{x})= \prod_{i=1}^n p^{ x_i }(1-p)^{1-x_i} $$
This is basically notation. We have just substituted everything into $ \phi(x)$ under the independent-trials assumption.
The idea of maximum likelihood is to maximize this as a function of $p$ after plugging in all of the $x_i$ data. This means that our estimator, $\hat{p}$, is a function of the observed $x_i$ data, and as such, is a random variable with its own distribution.
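
Before bringing in sympy, it may help to see the maximization done by brute force. The sketch below evaluates the log of the likelihood on a grid of candidate $p$ values and picks the largest; the function name `bernoulli_loglik`, the seed, and the grid spacing are assumptions for illustration. The grid maximizer is printed next to the sample mean, which the post later derives as the closed-form estimator.

```python
import numpy as np

def bernoulli_loglik(p, xs):
    """log L(p | x) = sum_i [ x_i log p + (1 - x_i) log(1 - p) ]."""
    xs = np.asarray(xs)
    heads = xs.sum()
    return heads * np.log(p) + (len(xs) - heads) * np.log(1 - p)

rng = np.random.default_rng(0)          # hypothetical seed
xs = rng.integers(0, 2, size=100)       # 100 flips of a fair coin

grid = np.linspace(0.001, 0.999, 999)   # candidate values of p, spaced 0.001 apart
p_grid = grid[np.argmax(bernoulli_loglik(grid, xs))]
print(p_grid, xs.mean())
```

Working with the log is safe here because the logarithm is increasing, so the likelihood and its log peak at the same $p$.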

Simulating the Experiment

We need the following code to simulate coin flipping.

In [32]:

from __future__ import division
from scipy.stats import bernoulli 
import numpy as np

p_true=1/2 # this is the value we will try to estimate from the observed data
fp=bernoulli(p_true)

def sample(n=10):
    'simulate coin flipping'
    return fp.rvs(n)# flip it n times

xs = sample(100) # generate some samples

Now, we can write out the likelihood function using sympy

In [33]:

import sympy
from sympy.abc import x, z
p=sympy.symbols('p',positive=True)

L=p**x*(1-p)**(1-x)
J=np.prod([L.subs(x,i) for i in xs]) # objective function to maximize

Below, we find the maximum using basic calculus. Note that taking the log of $J$ makes the maximization problem tractable but doesn't change the extrema.

In [34]:

logJ=sympy.expand_log(sympy.log(J))
sol=sympy.solve(sympy.diff(logJ,p),p)[0]

x=linspace(0,1,100)
plot(x,map(sympy.lambdify(p,logJ,'numpy'),x),sol,logJ.subs(p,sol),'o',
                                          p_true,logJ.subs(p,p_true),'s',)
xlabel('$p$',fontsize=18)
ylabel('Log-likelihood',fontsize=18)
title('Estimate not equal to true value',fontsize=18)

Note that our estimator $\hat{p}$ (red circle) is not equal to the true value of $p$ (green square), but it is at the maximum of the likelihood function. This may sound disturbing, but keep in mind this estimate is a function of the random data; and since that data can change, the ultimate estimate can likewise change. I invite you to run this notebook a few times to observe this. Remember that the estimator is a function of the data and is thus also a random variable, just like the data is.
Let's write some code to empirically examine the behavior of the maximum likelihood estimator using a simulation of multiple trials. All we're doing here is combining the last few blocks of code.

In [35]:

def estimator_gen(niter=10,ns=100):
    'generate data to estimate distribution of maximum likelihood estimator'
    out=[]
    x=sympy.symbols('x',real=True)
    L=   p**x*(1-p)**(1-x)
    for i in range(niter):
        xs = sample(ns) # generate some samples from the experiment
        J=np.prod([L.subs(x,i) for i in xs]) # objective function to maximize
        logJ=sympy.expand_log(sympy.log(J)) 
        sol=sympy.solve(sympy.diff(logJ,p),p)[0]
        out.append(float(sol.evalf()))
    return out if len(out)>1 else out[0] # return scalar if list contains only 1 term
    
etries = estimator_gen(100) # this may take awhile, depending on how much data you want to generate
hist(etries) # histogram of maximum likelihood estimator
title('$\mu=%3.3f,\sigma=%3.3f$'%(mean(etries),std(etries)),fontsize=18)

Note that the mean of the estimator ($\mu$) is pretty close to the true value, but looks can be deceiving. The only way to know for sure is to check if the estimator is unbiased, namely, if
$$ \mathbb{E}(\hat{p}) = p $$

Because this problem is simple, we can solve for this in general noting that since $x=0$ or $x=1$, the terms in the product of $\mathcal{L}$ above are either $p$, if $x_i=1$ or $1-p$ if $x_i=0$. This means that we can write
$$ \mathcal{L}(p|\mathbf{x})= p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i} $$
with corresponding log as
$$ J=\log(\mathcal{L}(p|\mathbf{x})) = \log(p) \sum_{i=1}^n x_i + \log(1-p) \left(n-\sum_{i=1}^n x_i\right)$$
Taking the derivative of this gives:
$$ \frac{dJ}{dp} = \frac{1}{p}\sum_{i=1}^n x_i + \frac{(n-\sum_{i=1}^n x_i)}{p-1} $$
and solving this leads to
$$ \hat{p} = \frac{1}{ n} \sum_{i=1}^n x_i $$
This is our estimator for $p$. Up until now, we have been using sympy to solve for this based on the data $x_i$, but now we have it in general and don't have to solve for it again. To check if this estimator is biased, we compute its expectation:
$$ \mathbb{E}\left(\hat{p}\right) =\frac{1}{n}\sum_{i=1}^n \mathbb{E}(x_i) = \frac{1}{n} n \mathbb{E}(x_i) $$
by linearity of the expectation and where
$$\mathbb{E}(x_i) = p$$
Therefore,
$$ \mathbb{E}\left(\hat{p}\right) =p $$
This means that the estimator is unbiased. This is good news. We almost always want our estimators to be unbiased. Similarly,
$$ \mathbb{E}\left(\hat{p}^2\right) = \frac{1}{n^2} \mathbb{E}\left[\left( \sum_{i=1}^n x_i \right)^2 \right]$$
and where
$$ \mathbb{E}\left(x_i^2\right) =p$$
and by the independence assumption,
$$ \mathbb{E}\left(x_i x_j\right) =\mathbb{E}(x_i)\mathbb{E}( x_j) =p^2$$
Thus,
$$ \mathbb{E}\left(\hat{p}^2\right) =\left(\frac{1}{n^2}\right) n \left[ p+(n-1)p^2 \right] $$
So, the variance of the estimator, $\hat{p}$ is the following:
$$ \sigma_{\hat{p}}^2 = \mathbb{E}\left(\hat{p}^2\right)- \mathbb{E}\left(\hat{p}\right)^2 = \frac{p(1-p)}{n} $$
Note that the $n$ in the denominator means that the variance asymptotically goes to zero as $n$ increases (i.e. we consider more and more samples). This is good news also because it means that more and more coin flips leads to a better estimate of the underlying $p$.
Unfortunately, this formula for the variance is practically useless because we have to know $p$ to compute it and $p$ is the parameter we are trying to estimate in the first place! But, looking at $ \sigma_{\hat{p}}^2 $, we can immediately notice that if $p=0$, then there is no estimator variance because the outcomes are guaranteed to be tails (and likewise for $p=1$, where they are guaranteed to be heads). Also, the maximum of this variance, for whatever $n$, happens at $p=1/2$. This is our worst case scenario and the only way to compensate is with more samples (i.e. larger $n$).
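
These two results are easy to check by simulation. The following sketch repeats the $n$-flip experiment many times for a few values of $p$ and compares the empirical mean and variance of $\hat{p}$ with $p$ and $p(1-p)/n$; the seed, the number of repetitions, and the chosen $p$ values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # hypothetical seed
n, trials = 100, 20000           # flips per experiment, number of repeated experiments

results = {}
for p in (0.1, 0.5, 0.9):
    # each row is one experiment of n flips; the row mean is that experiment's p-hat
    phat = rng.binomial(1, p, size=(trials, n)).mean(axis=1)
    results[p] = (phat.mean(), phat.var())
    print(p, phat.mean(), phat.var(), p * (1 - p) / n)
```

The last two columns should agree closely, with the largest variance appearing at $p=1/2$.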

All we have computed is the mean and variance of the estimator. In general, this is insufficient to characterize the underlying probability density of $\hat{p}$, except if we somehow knew that $\hat{p}$ were normally distributed. This is where the powerful central limit theorem comes in. The form of the estimator, which is just a mean estimator, implies that we can apply this theorem and conclude that $\hat{p}$ is normally distributed. However, there's a wrinkle here: the theorem tells us that $\hat{p}$ is asymptotically normal, but it doesn't quantify how many samples $n$ we need to approach this asymptotic paradise. In our simulation this is no problem since we can generate as much data as we like, but in the real world, with a costly experiment, each sample may be precious. In the following, we won't apply this theorem and instead proceed analytically.
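
To get a feel for how much the asymptotic approximation can miss at small $n$, the sketch below compares the exact probability that $\hat{p}$ falls within 0.1 of $p=1/2$ with the value obtained by treating the sum as normal with the same mean and variance. This is only an illustration: the function names and the choice of window are assumptions, and the normal version deliberately uses no continuity correction.

```python
from math import comb, erf, sqrt

def exact_prob(n, w):
    """P(|S - n/2| <= w) for S ~ Binomial(n, 1/2), n even."""
    mid = n // 2
    return sum(comb(n, k) for k in range(mid - w, mid + w + 1)) / 2**n

def normal_prob(n, w):
    """Same probability with S replaced by a normal of matching mean and variance."""
    sigma_s = sqrt(n) / 2                  # standard deviation of the sum when p = 1/2
    return erf(w / (sigma_s * sqrt(2)))    # equals 2*Phi(w/sigma_s) - 1

for n, w in ((10, 1), (100, 10), (1000, 100)):   # each window is |p-hat - 1/2| <= 0.1
    print(n, exact_prob(n, w), normal_prob(n, w))
```

The gap between the two columns is widest for the smallest $n$, which is the regime where each sample is most precious.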

Probability Density for the Estimator

To write out the full density for $\hat{p}$, we first have to ask what is the probability that the estimator will equal a specific value and then tally up all the ways that could happen with their corresponding probabilities. For example, what is the probability that
$$ \hat{p} = \frac{1}{n}\sum_{i=1}^n x_i = 0 $$
This can only happen one way: when $x_i=0 \hspace{0.5em} \forall i$. The probability of this happening can be computed from the density
$$ f(\mathbf{x},p)= \prod_{i=1}^n \left(p^{x_i} (1-p)^{1-x_i} \right) $$
$$ f\left(\sum_{i=1}^n x_i = 0,p\right)= \left(1-p\right)^n $$
Likewise, if $\lbrace x_i \rbrace$ has one $i^{th}$ value equal to one, then
$$ f\left(\sum_{i=1}^n x_i = 1,p\right)= n p \prod_{i=1}^{n-1} \left(1-p\right)$$
where the $n$ comes from the $n$ ways to pick one value equal to one from the $n$ elements $x_i$. Continuing this way, we can construct the entire density as
$$ f\left(\sum_{i=1}^n x_i = k,p\right)= \binom{n}{k} p^k (1-p)^{n-k} $$
where the first term on the right is the binomial coefficient of $n$ things taken $k$ at a time. This is the binomial distribution, and it's not the density for $\hat{p}$, but rather for $n\hat{p}$. We'll leave this as-is because it's easier to work with below. We just have to remember to keep track of the $n$ factor.
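
The counting argument can be verified by brute force for a small number of flips. The sketch below walks through every possible sequence of outcomes, adds up the probability of each sequence by its number of heads, and compares the totals with the binomial formula; the values of `n` and `p` and the function name are assumptions chosen to keep the enumeration cheap.

```python
from itertools import product
from math import comb, isclose

def pmf_by_enumeration(n, p):
    """P(sum x_i = k) for k = 0..n, summing over all 2**n outcome sequences."""
    pmf = [0.0] * (n + 1)
    for flips in product((0, 1), repeat=n):
        k = sum(flips)
        pmf[k] += p**k * (1 - p) ** (n - k)   # probability of this exact sequence
    return pmf

n, p = 8, 0.3   # small hypothetical values so the 2**n enumeration stays cheap
brute = pmf_by_enumeration(n, p)
formula = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(all(isclose(b, f) for b, f in zip(brute, formula)))
```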

Confidence Intervals

Now that we have the full density for $\hat{p}$, we are ready to ask some meaningful questions. For example,
$$ \mathbb{P}\left( | \hat{p}-p | \le \epsilon p \right) $$
Or, in words, what is the probability that we get within a relative error of $\epsilon$ of the true value of $p$? Rewriting,
$$ \mathbb{P}\left( p - \epsilon p \lt \hat{p} \lt p + \epsilon p \right) = \mathbb{P}\left( n p - n \epsilon p \lt \sum_{i=1}^n x_i \lt n p + n \epsilon p \right)$$
Let's plug in some live numbers here for our worst case scenario where $p=1/2$. Then, if $\epsilon = 1/100$, we have
$$ \mathbb{P}\left( \frac{99 n}{200} \lt \sum_{i=1}^n x_i \lt \frac{101 n}{200} \right)$$
Since the sum is integer-valued, we need $n> 100$ to guarantee that this interval contains an integer. Thus, if $n=101$ we have
$$ \mathbb{P}\left( \frac{9999}{200} \lt \sum_{i=1}^{101} x_i \lt \frac{10201}{200} \right) = f\left(\sum_{i=1}^{101} x_i = 50,p\right) + f\left(\sum_{i=1}^{101} x_i = 51,p\right) = 2\binom{101}{50} (1/2)^{101} \approx 0.158$$
because both 50 and 51 lie strictly between $9999/200=49.995$ and $10201/200=51.005$, and the two terms are equal by symmetry. This means that in the worst-case scenario for $p=1/2$, given $n=101$ trials, we will only get within 1% of the actual $p=1/2$ about 16% of the time. If you feel disappointed, that only means you've been paying attention. What if the coin was really heavy and it was costly to repeat this 101 times? Then, we would be within 1% of the actual value only 16% of the time. Those odds are terrible.
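
The same calculation can be repeated for larger sample sizes to see how slowly the odds improve. The sketch below uses exact rational arithmetic so that the strict inequalities at the interval edges are handled without floating-point surprises; the function name `prob_within` and the sample sizes tried are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, floor, ceil

def prob_within(n, p=Fraction(1, 2), eps=Fraction(1, 100)):
    """P(n*p*(1-eps) < S < n*p*(1+eps)) for S ~ Binomial(n, p), strict inequalities."""
    lo = n * p * (1 - eps)
    hi = n * p * (1 + eps)
    k_min = floor(lo) + 1   # smallest integer strictly greater than lo
    k_max = ceil(hi) - 1    # largest integer strictly less than hi
    total = Fraction(0)
    for k in range(k_min, k_max + 1):
        total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return float(total)

for n in (1000, 10000):
    print(n, prob_within(n))
```

Even with thousands of flips, landing within 1% of the true value is far from certain in this worst case.
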
Let's come at this another way: given I could only flip the coin 100 times, how close could I come to the true underlying value with high probability (say, 95%)? In this case we are seeking to solve for $\epsilon$. Plugging in gives,
$$ \mathbb{P}\left( 50 - 50 \epsilon \lt \sum_{i=1}^{100} x_i \lt 50 + 50 \epsilon \right) = 0.95$$
which we have to solve for $\epsilon$. Fortunately, all the tools we need to solve for this are already in scipy.

In [36]:

import scipy.stats

b=scipy.stats.binom(100,.5) # n=100, p = 0.5, distribution of the sum of the x_i, i.e. n*\hat{p}

f,ax= subplots()
ax.stem(arange(0,101),b.pmf(arange(0,101))) # here's the density of the sum of x_i

g = lambda i:b.pmf(arange(-i,i)+50).sum() # sum the probability over 50-i,...,50+i-1 around the mean
print 'this is pretty close to 0.95:%r'%g(10)
ax.vlines( [50+10,50-10],0 ,ax.get_ylim()[1] ,color='r',lw=3.)

this is pretty close to 0.95:0.95395593307064808

The two vertical lines in the plot show how far out from the mean we have to go to accumulate roughly 95% of the probability. Now, we can solve this as
$$ 50 + 50 \epsilon = 60 $$
which makes $\epsilon=1/5$ or 20%. So, flipping 100 times means I can only get within 20% of the real $p$ 95% of the time in the worst case scenario (i.e. $p=1/2$).
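
The search for the window can also be automated with nothing but the standard library. The sketch below widens a window around the mean one count at a time until it holds at least 95% of the probability, and then converts the half-width into $\epsilon$. It assumes a closed window $|S-50|\le w$ that counts both endpoints, and the names `coverage` and `target` are illustrative.

```python
from math import comb

def coverage(n, w):
    """P(|S - n/2| <= w) for S ~ Binomial(n, 1/2) with n even."""
    mid = n // 2
    return sum(comb(n, k) for k in range(mid - w, mid + w + 1)) / 2**n

n, target = 100, 0.95
w = 0
while coverage(n, w) < target:   # widen the window one count at a time
    w += 1

eps = w / (n / 2)                # relative error: n*p*eps = w with p = 1/2
print(w, coverage(n, w), eps)
```

Under this convention the search stops at the same half-width of 10 counts, which corresponds to the 20% figure above.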

In [37]:

b=scipy.stats.bernoulli(.5) # coin distribution
xs = b.rvs(100) # flip it 100 times
phat = mean(xs) # estimated p

print abs(phat-0.5) < 0.5*0.20 # did I make it w/in interval 95% of the time?

True

Let's keep doing this and see if we can get within this interval 95% of the time.

In [38]:

out=[]
b=scipy.stats.bernoulli(.5) # coin distribution
for i in range(500): # number of tries
    xs = b.rvs(100) # flip it 100 times
    phat = mean(xs) # estimated p
    out.append(abs(phat-0.5) < 0.5*0.20 ) # within 20% 

print 'Percentage of tries within 20%% interval = %3.2f'%(100*sum(out)/float(len(out) ))

Percentage of tries within 20% interval = 96.20

Well, that seems to work. Now we have a way to get at the quality of the estimator, $\hat{p}$.

Summary

In this section, we explored the concept of maximum likelihood estimation using a coin flipping experiment both analytically and numerically with the scientific Python tool chain. There are two key points to remember. First, maximum likelihood estimation produces a function of the data that is itself a random variable, with its own statistics and distribution. Second, it's worth considering how to analytically derive the density function of the estimator rather than relying on canned packages to compute confidence intervals wherever possible. This is especially true when data is hard to come by and the approximations made in the central limit theorem are therefore harder to justify.

References

This IPython notebook is available for download. I urge you to experiment with the calculations for different parameters. As always, corrections and comments are welcome!
