Monte Carlo Methods#

Monte Carlo methods are a class of methods that use randomness to make it easier to solve deterministic problems. Specifically, we wish to compute integrals of the form $\int f(x)\, p(x)\, dx$, where $p$ is a probability density, i.e. a non-negative “function” that integrates to 1.

The idea of Monte Carlo is to observe that, if $X$ is a random variable distributed according to $p$, then

$$E[f(X)] = \int f(x)\, p(x)\, dx.$$

We can then estimate the expectation using the law of large numbers. Suppose that $X_1, X_2, X_3, \ldots$ are independent random variables, each distributed according to $p$. Then the law of large numbers gives that

$$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E[f(X_1)] = \int f(x)\, p(x)\, dx.$$
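As a minimal sketch of this recipe (this example is not in the notes; it assumes $p$ is the standard normal density and $f(x) = x^2$, so that $E[f(X)] = 1$ exactly):

import numpy as np

# Monte Carlo estimate of E[f(X)] for f(x) = x^2 with X ~ N(0,1).
# The exact answer is E[X^2] = 1, the variance of a standard normal.
n = 100000
samples = np.random.randn(n)        # X_1, ..., X_n drawn from p
estimate = np.mean(samples**2)      # (1/n) * sum_i f(X_i)
print('estimate = {}'.format(estimate))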

A classical example of this is to compute the area of the unit disk ($\pi$):

$$\pi = \int_{D_1} 1\, dx = \int_{[-1,1]\times[-1,1]} \chi_{D_1}(x)\, dx = 4\, E\big[\chi_{D_1}(U_{[-1,1]\times[-1,1]})\big],$$

where $\chi_{D_1}$ is the indicator function of the unit disk and $U_{[-1,1]\times[-1,1]}$ is uniformly distributed on the square.

import numpy as np
import matplotlib.pyplot as plt
import scipy
n = 1000

xs = 2*np.random.rand(2,n) - 1            # n uniform samples in the square [-1,1]^2
plt.scatter(xs[0,:], xs[1,:])

ids = np.linalg.norm(xs, axis=0) < 1      # True for samples inside the unit disk
plt.scatter(xs[0,ids], xs[1,ids])

fn = 4*np.sum(ids)/n                      # fraction inside, times the area of the square
print('estimate = {}'.format(fn))
print('error = {}'.format(fn - np.pi))
estimate = 3.128
error = -0.013592653589793002
(Figure: uniform samples in the square, with the samples inside the unit disk highlighted.)

Let’s look at how the error converges with $n$.

ns = np.int32(np.logspace(0,6,100))
errs = []
for n in ns:
    xs = 2*np.random.rand(2,n) - 1
    ids = np.linalg.norm(xs, axis=0) < 1
    errs.append((4*np.sum(ids)/n) - np.pi)    # error of the pi estimate

plt.loglog(ns, np.abs(errs), 'o-')
plt.loglog(ns, 1/np.sqrt(ns), '-')            # reference line 1/sqrt(n)
plt.xlabel('n')
plt.ylabel('Error')
plt.show()
(Figure: log-log plot of the absolute error versus $n$, together with the reference line $1/\sqrt{n}$.)

Let’s look at why the error is decaying like $\frac{1}{\sqrt{n}}$. Let the estimator be $\hat{f}_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$. The variance of $\hat{f}_n$ is

$$E\big[(\hat{f}_n - E[f(X)])^2\big] = \frac{1}{n^2} E\Big[\Big(\sum_{i=1}^{n} \big(f(X_i) - E[f(X)]\big)\Big)^2\Big] = \frac{1}{n^2} \sum_{i,j} E\big[(f(X_i) - E[f(X)])(f(X_j) - E[f(X)])\big].$$

If we rearrange this a bit, we get

$$E\big[(\hat{f}_n - E[f(X)])^2\big] = \frac{1}{n^2} \Big( \sum_i E\big[(f(X_i) - E[f(X)])^2\big] + \sum_{i \neq j} \text{Cov}\big(f(X_i), f(X_j)\big) \Big).$$

Since the $X_i$’s are independent, $\text{Cov}(f(X_i), f(X_j)) = 0$ whenever $i \neq j$, so

$$E\big[(\hat{f}_n - E[f(X)])^2\big] = \frac{1}{n} E\big[(f(X_1) - E[f(X)])^2\big].$$

If we take the square root,

$$\sqrt{E\big[(\hat{f}_n - E[f(X)])^2\big]} = \frac{1}{\sqrt{n}} \sqrt{E\big[(f(X_1) - E[f(X)])^2\big]},$$

we see that the average error will decay like $\frac{1}{\sqrt{n}}$.
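As a quick empirical check (a sketch, not part of the original notes): for the $\pi$ example the indicator $\chi_{D_1}$ has variance $\frac{\pi}{4}\big(1 - \frac{\pi}{4}\big)$, so the error of the $\pi$ estimate should be on the order of $4\sigma/\sqrt{n}$, which we can compare against an observed error.

# Compare the observed error of the pi estimator with the predicted
# standard error 4*sigma/sqrt(n), where sigma is the std of the indicator.
n = 100000
xs = 2*np.random.rand(2,n) - 1
ids = np.linalg.norm(xs, axis=0) < 1
estimate = 4*np.sum(ids)/n
sigma = np.std(ids)                  # sample standard deviation of the indicator
print('observed error  = {}'.format(abs(estimate - np.pi)))
print('predicted scale = {}'.format(4*sigma/np.sqrt(n)))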

This $\frac{1}{\sqrt{n}}$ error is universal in Monte Carlo: it does not improve with the smoothness of the integrand, but it also does not degrade with dimension. It is both the advantage and the curse of Monte Carlo.

Suppose instead we are trying to compute an integral in $d$ dimensions using a $p$th-order quadrature rule with $n$ points. The grid spacing is $n^{-1/d}$, so the error is $O\big((n^{-1/d})^{p}\big) = O(n^{-p/d})$, which will be very slow when $d$ is large. This will be a lot larger than the Monte Carlo error $O(n^{-1/2})$ when $d > 2p$.
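As an illustrative sketch (the integrand, the dimension, and the second-order midpoint rule here are chosen for illustration and are not from the notes), we can compare a midpoint rule with Monte Carlo at the same budget of points for $\int_{[0,1]^d} e^{x_1 + \cdots + x_d}\, dx = (e-1)^d$, with $d = 6 > 2p = 4$:

import itertools

# Midpoint rule (order p = 2) versus Monte Carlo in d = 6 dimensions,
# for integrating exp(x_1 + ... + x_d) over [0,1]^d; exact value (e-1)^d.
d = 6
m = 4                                # grid points per dimension
n = m**d                             # same total budget for both methods
exact = (np.e - 1)**d

pts = (np.arange(m) + 0.5)/m         # midpoints of m cells in each dimension
grid = np.array(list(itertools.product(pts, repeat=d)))
mid_est = np.sum(np.exp(grid.sum(axis=1)))/n

mc_est = np.mean(np.exp(np.random.rand(n,d).sum(axis=1)))

print('midpoint relative error    = {}'.format(abs(mid_est - exact)/exact))
print('Monte Carlo relative error = {}'.format(abs(mc_est - exact)/exact))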

Importance sampling#

Suppose we want to compute the integral of a highly concentrated function:

$$\int_{-\infty}^{\infty} e^{-10^4 x^2}\, dx = \sqrt{\frac{\pi}{10^4}}.$$

We shall truncate to the interval $[-1,1]$, since the function is very small outside it.

f = lambda x: np.exp(-1e4*x*x)
xs = np.linspace(-1,1,1000)
plt.plot(xs,f(xs))
plt.show()


true_int = np.sqrt(np.pi)/100             # exact value sqrt(pi/10^4)

ns = np.int32(np.logspace(0,5,20))
ntrial = 10                               # average over several independent trials
errs = []
for n in ns:
    errs_samp = np.zeros(ntrial)
    for k in range(ntrial):
        xs = 2*np.random.rand(1,n) - 1    # uniform samples on [-1,1]
        errs_samp[k] = ((2*np.sum(f(xs))/n) - true_int)/true_int

    errs.append(np.mean(errs_samp))

plt.loglog(ns, np.abs(errs), 'o-')
plt.loglog(ns, 1/np.sqrt(ns), '-')
plt.xlabel('n')
plt.ylabel('Relative error')
plt.show()
(Figures: the sharply peaked integrand on $[-1,1]$; relative error of uniform sampling versus $n$ with a $1/\sqrt{n}$ reference line.)

To increase the accuracy, we should concentrate the Monte Carlo samples where $f$ is large. If the samples $X_i$ are instead drawn from a probability distribution $p$, then we compensate for this by

$$\int f(x)\, dx = \int \frac{f(x)}{p(x)}\, p(x)\, dx \approx \frac{1}{n} \sum_i \frac{f(X_i)}{p(X_i)}.$$

We shall choose $p$ to be the density of a normal distribution: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$.

for sigma in [1e-2, 1e-1, 1]:
    p = lambda x: np.exp(-x*x/2/sigma/sigma)/np.sqrt(2*np.pi*sigma**2)
    errs_import = []
    for n in ns:
        errs_samp = np.zeros(ntrial)
        for k in range(ntrial):
            xs = sigma * np.random.randn(1,n)    # samples drawn from p
            est = np.sum(f(xs)/p(xs)) / n        # importance-sampling estimator
            errs_samp[k] = (est - true_int)/true_int
        errs_import.append(np.mean(errs_samp))

    plt.loglog(ns, np.abs(errs_import), 'o-', label='sigma={}'.format(sigma))

plt.loglog(ns, np.abs(errs), '*--', label='uniform')
plt.loglog(ns, 1/np.sqrt(ns), ':')
plt.xlabel('n')
plt.legend()
plt.ylabel('Relative error')
plt.show()
(Figure: relative error versus $n$ for each proposal width $\sigma$, compared with uniform sampling and a $1/\sqrt{n}$ reference line.)

We see that concentrating the samples at the peak of the function significantly decreases the error, provided the sampling width matches the width of the peak. This method is called “importance sampling”.
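A quick sketch of why the choice of $\sigma$ matters so much (this diagnostic is not in the original notes): the error of the importance-sampling estimator is governed by the variance of the weights $f(X)/p(X)$, which is small when $p$ matches the shape of $f$. Here the integrand is a Gaussian of width about $1/\sqrt{2 \cdot 10^4} \approx 0.007$, closest to $\sigma = 10^{-2}$:

# The importance-sampling error is governed by Var(f(X)/p(X)).
# Estimate the standard deviation of the weights for each proposal width.
f = lambda x: np.exp(-1e4*x*x)       # the integrand from above
n = 100000
for sigma in [1e-2, 1e-1, 1]:
    p = lambda x: np.exp(-x*x/2/sigma**2)/np.sqrt(2*np.pi*sigma**2)
    xs = sigma*np.random.randn(n)    # samples from the proposal p
    w = f(xs)/p(xs)                  # importance weights
    print('sigma = {}: std of weights = {}'.format(sigma, np.std(w)))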

Markov Chain Monte Carlo (MCMC)#

Sometimes we wish to compute $\int f(x)\, p(x)\, dx$ but we can’t directly sample from $p$. The idea will be to choose our samples $X_i$ to be samples from a Markov chain, i.e. $X_{i+1}$ is randomly generated from $X_i$. If we run the chain long enough, then the $X_i$’s will be distributed according to the stationary distribution of the Markov chain.

Our task is therefore to design a Markov chain whose stationary distribution is p.

Metropolis-Hastings#

We shall generate a proposal $Y_{i+1}$ from $X_i$, i.e. $Y_{i+1} = X_i + \Delta_i$, where $\Delta_i$ is a random step (e.g. a small Gaussian perturbation, as in the code below).

Then we set $X_{i+1} = Y_{i+1}$ with probability $\min\big(p(Y_{i+1})/p(X_i),\, 1\big)$, and set $X_{i+1} = X_i$ otherwise.

You can check that the stationary distribution of this chain is $p$.
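Here is a sketch of that check, assuming the step $\Delta_i$ has a symmetric density $q$ (so $q(\delta) = q(-\delta)$). For $y \neq x$ the transition density is $T(x \to y) = q(y-x) \min\big(p(y)/p(x),\, 1\big)$, and

$$p(x)\, T(x \to y) = q(y-x) \min\big(p(y),\, p(x)\big) = p(y)\, q(x-y) \min\Big(\frac{p(x)}{p(y)},\, 1\Big) = p(y)\, T(y \to x).$$

This is detailed balance; integrating both sides over $x$ shows that $p$ is a stationary distribution of the chain.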

Note that we don’t need the normalizing constant of $p$; we only need ratios of its values, so $p$ need not be normalized.

Let’s use this to compute $\int x^2\, Z^{-1} e^{-V(x)}\, dx$, where $V(x) = x^4 - x^2$ and $Z = \int e^{-V(x)}\, dx$. The corresponding density is $p(x) = Z^{-1} e^{-V(x)}$.

Note that the samples are no longer independent. The covariance between samples will lower the effective number of samples, but the estimator will still converge.

def MC_step(x, p):
    delta = 0.1                           # proposal step size
    y = x + delta*np.random.randn()       # propose y from x
    prob = min(p(y)/p(x), 1)              # Metropolis acceptance probability
    if np.random.rand() < prob:
        x = y
    return x

V = lambda x: x**4 - x**2
p = lambda x: np.exp(-V(x))               # unnormalized density

nburn_in = 1000                           # discard initial steps so the chain equilibrates
x = 0
for i in range(nburn_in):
    x = MC_step(x, p)

nsample = 10000
xs = [x]
for i in range(nsample):
    x = MC_step(x, p)
    xs.append(x)

xs = np.array(xs)
print('estimate of E[x^2] = {}'.format(np.mean(xs*xs)))
plt.hist(xs, bins=50)
plt.show()
(Figure: histogram of the MCMC samples, showing the two wells of $V$.)
xx = np.linspace(-1.5, 1.5, 200)
Z = np.trapz(p(xx), xx)                   # normalize p numerically for comparison
plt.plot(xx, p(xx)/Z, '.', label='target density')
kde = scipy.stats.gaussian_kde(xs)        # kernel density estimate from the samples
plt.plot(xx, kde(xx), label='KDE of samples')
plt.legend()
plt.show()
(Figure: the normalized target density compared with a kernel density estimate of the samples.)
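As a rough sketch of the correlation effect mentioned above (the truncated-sum estimator here is a common heuristic, not from the original notes), we can estimate the autocorrelation time of the chain and the resulting effective sample size:

# Estimate the integrated autocorrelation time tau and the effective
# sample size n/tau; correlated samples carry less information than iid ones.
xc = xs - np.mean(xs)
acf = np.correlate(xc, xc, mode='full')[len(xc)-1:]
acf = acf/acf[0]                     # normalized autocorrelation function
tau = 1 + 2*np.sum(acf[1:200])       # truncate the sum at a fixed lag
print('autocorrelation time ~ {}'.format(tau))
print('effective samples    ~ {}'.format(len(xs)/tau))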
Such examples occur in statistical physics.