Notes on SUMO


(2020/02/11)

In this note, I'll implement the Stochastically Unbiased Marginalization Objective (SUMO) to estimate the log-partition function of an energy function.

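Estimating a log-partition function is a common problem. For example, in Bayesian inference the posterior is $p(z|x)=\frac{1}{Z}p(x|z)p(z)$, whose log-partition function is the log marginal likelihood, $\log Z = \log \int p(x|z)p(z)dz = \log p(x)$.

More generally, an energy function $U(x)$ defines a density $p(x)=\frac{e^{-U(x)}}{\int e^{-U(x)} dx}$. The common practice is to look at a variational form of the log-partition function,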

$$\log Z = \log \int e^{-U(x)}dx = \max_{q(x)}\mathbb{E}_{q}[-U(x)-\log q(x)].$$

A fixed choice of $q$ would normally yield a strict lower bound, which means the Monte Carlo average

$$\frac{1}{n}\sum_{i=1}^n \left(-U(x_i) - \log q(x_i)\right)$$

with $x_i$ drawn i.i.d. from $q$ is a biased estimate of $\log Z$. In particular, it would be an underestimation.
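To make the size of the underestimation explicit, here is a short derivation that uses only the definitions above (with $p(x)=e^{-U(x)}/Z$). The gap between $\log Z$ and the bound is a KL divergence, so the bound is tight only when $q=p$:

```latex
\begin{align*}
\log Z - \mathbb{E}_{q}\left[-U(x) - \log q(x)\right]
  &= \mathbb{E}_{q}\left[\log q(x) - \log \frac{e^{-U(x)}}{Z}\right] \\
  &= \mathbb{E}_{q}\left[\log \frac{q(x)}{p(x)}\right]
   = \mathrm{KL}(q \,\|\, p) \geq 0
\end{align*}
```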

As a running example, let's define the energy function $U$ as follows:

$$U(x_1,x_2)= - \log \left(\frac{1}{2}\cdot e^{-\frac{(x_1+2)^2 + x_2^2}{2}} + \frac{1}{2}\cdot\frac{1}{4}e^{-\frac{(x_1-2)^2 + x_2^2}{8}}\right)$$

This $U$ is the energy function of the mixture of Gaussians $\frac{1}{2}\mathcal{N}([-2,0], I) + \frac{1}{2}\mathcal{N}([2,0], 4I)$, with normalizing constant $Z=2\pi\approx6.28$, so $\log Z\approx1.8379$.
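As a quick numerical check of the claim $Z=2\pi$, the following sketch integrates $e^{-U}$ with a plain Riemann sum on a grid. The helper name `unnormalized_density`, the grid range and the resolution are assumptions made for this illustration only.

```python
import numpy as np

def unnormalized_density(x1, x2):
    # exp(-U(x1, x2)) for the energy U defined above
    return (0.5 * np.exp(-((x1 + 2) ** 2 + x2 ** 2) / 2)
            + 0.5 * 0.25 * np.exp(-((x1 - 2) ** 2 + x2 ** 2) / 8))

# Riemann sum on a wide grid (range and spacing are arbitrary choices)
grid = np.linspace(-20.0, 20.0, 801)
dx = grid[1] - grid[0]
g1, g2 = np.meshgrid(grid, grid)
Z_hat = unnormalized_density(g1, g2).sum() * dx * dx
log_Z_hat = np.log(Z_hat)
print("Z estimate: %.4f / log Z estimate: %.4f" % (Z_hat, log_Z_hat))
```

This brute-force approach only works because the example is two-dimensional; it is meant as a reference value for the estimators discussed in the rest of the note.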


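import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal as mvn

def U(x):
  # energy of the two-component mixture; x has shape (n, 2)
  x1 = x[:,0]
  x2 = x[:,1]
  d2 = x2 ** 2
  return - np.log(np.exp(-((x1+2) ** 2 + d2)/2)/2 + np.exp(-((x1-2) ** 2 + d2)/8)/4/2)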

Let's visualize the density $p(x)\propto e^{-U(x)}$ (up to the normalizing constant):


xx = np.linspace(-5,5,200)
yy = np.linspace(-5,5,200)
X = np.meshgrid(xx,yy)
X = np.concatenate([X[0][:,:,None], X[1][:,:,None]], 2).reshape(-1,2)
unnormalized_density = np.exp(-U(X)).reshape(200,200)
plt.imshow(unnormalized_density)
plt.axis('off')


As a sanity check, let's also visualize the density of the mixture of Gaussians.


N1, N2 = mvn([-2,0], 1), mvn([2,0], 4)
density = (np.exp(N1.logpdf(X))/2 + np.exp(N2.logpdf(X))/2).reshape(200,200)
plt.imshow(density)
plt.axis('off')
print(np.allclose(unnormalized_density / density - 2*np.pi, 0))

 True


Now if we estimate the log-partition function by estimating the variational lower bound, we get


q = mvn([0,0],5)
xs = q.rvs(10000*5)
elbo = - U(xs) - q.logpdf(xs)
plt.hist(elbo, range(-5,10))
print("Estimate:  %.4f  / Ground truth:  %.4f" % (elbo.mean(), np.log(2*np.pi)))
print("Empirical variance: %.4f" % elbo.var())

Estimate:  1.4595  / Ground truth:  1.8379
Empirical variance: 0.9921


The lower bound can be tightened via importance sampling:

$$\log \int e^{-U(x)} dx \geq \mathbb{E}_{q^K}\left[\log\left(\frac{1}{K}\sum_{j=1}^K \frac{e^{-U(x_j)}}{q(x_j)}\right)\right]$$

The bound becomes tighter as $K$ increases, owing to the concentration of the average inside the $\log$ function: when the random variable is more deterministic, using a local linear approximation near its mean is more accurate as there's less "mass" outside of some neighborhood of the mean.

Let's try $K=5$:


k = 5
xs = q.rvs(10000*k)
elbo = - U(xs) - q.logpdf(xs)
iwlb = elbo.reshape(10000,k)
iwlb = np.log(np.exp(iwlb).mean(1))
plt.hist(iwlb, range(-5,10))
print("Estimate:  %.4f  / Ground truth:  %.4f" % (iwlb.mean(), np.log(2*np.pi)))
print("Empirical variance: %.4f" % iwlb.var())

Estimate:  1.7616  / Ground truth:  1.8379
Empirical variance: 0.1544


We see that both the bias and variance decrease.
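To see the trend over several values of $K$, here is a minimal NumPy-only sketch that repeats the experiment for $K \in \{1, 5, 50\}$. It re-implements the log-density of $q=\mathcal{N}(0, 5I)$ by hand, and the names `log_weight` and `iwae_samples`, the seed and the sample size are assumptions for illustration rather than part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 5.0  # variance of the proposal q = N(0, 5 I) used in this note

def log_weight(x):
    # log(exp(-U(x)) / q(x)) for x of shape (..., 2)
    x1, x2 = x[..., 0], x[..., 1]
    neg_U = np.log(0.5 * np.exp(-((x1 + 2) ** 2 + x2 ** 2) / 2)
                   + 0.125 * np.exp(-((x1 - 2) ** 2 + x2 ** 2) / 8))
    log_q = -(x1 ** 2 + x2 ** 2) / (2 * sigma2) - np.log(2 * np.pi * sigma2)
    return neg_U - log_q

def iwae_samples(k, n=10000):
    # n independent draws of log((1/k) * sum_j w_j), computed stably
    x = rng.normal(scale=np.sqrt(sigma2), size=(n, k, 2))
    lw = log_weight(x)
    m = lw.max(axis=1, keepdims=True)  # log-sum-exp trick
    return m[:, 0] + np.log(np.exp(lw - m).mean(axis=1))

means = {}
for k in [1, 5, 50]:
    est = iwae_samples(k)
    means[k] = est.mean()
    print("K = %3d: estimate %.4f, variance %.4f" % (k, est.mean(), est.var()))
print("log Z = %.4f" % np.log(2 * np.pi))
```

The estimates should approach $\log Z$ from below as $K$ grows, but for any finite $K$ some bias remains, which is what motivates SUMO.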

Finally, we use the Stochastically Unbiased Marginalization Objective (SUMO), which applies the Russian Roulette estimator to a telescoping series of importance-weighted bounds. Let $\text{IWAE}_K = \log\left(\frac{1}{K}\sum_{j=1}^K \frac{e^{-U(x_j)}}{q(x_j)}\right)$ be the importance-weighted estimate with $K$ samples, and let $\Delta_K = \text{IWAE}_{K+1} - \text{IWAE}_K$ be the difference (which can be thought of as some form of correction). The SUMO estimator is defined as

$$\text{SUMO} = \text{IWAE}_1 + \sum_{k=1}^K \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)}$$

where $K\sim P(K)=\mathbb{P}(\mathcal{K}=K)$. To see why this is an unbiased estimator,

\begin{align*}
\mathbb{E}[\text{SUMO}] &= \mathbb{E}\left[\text{IWAE}_1 + \sum_{k=1}^K \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \right] \\
&= \mathbb{E}_{x_1, x_2, \ldots}\left[\text{IWAE}_1 + \mathbb{E}_{K}\left[\sum_{k=1}^K \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \right]\right]
\end{align*}

The inner expectation can be further expanded as

\begin{align*}
\mathbb{E}_{K}\left[\sum_{k=1}^K \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \right] &= \sum_{K=1}^\infty P(K)\sum_{k=1}^K \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \\
&= \sum_{k=1}^\infty \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \sum_{K=k}^\infty P(K) \\
&= \sum_{k=1}^\infty \frac{\Delta_k}{\mathbb{P}(\mathcal{K}\geq k)} \mathbb{P}(\mathcal{K}\geq k) \\
&= \sum_{k=1}^\infty\Delta_k \\
&= \text{IWAE}_{2} - \text{IWAE}_1 + \text{IWAE}_{3} - \text{IWAE}_2 + \cdots = \lim_{k\rightarrow\infty}\text{IWAE}_{k}-\text{IWAE}_1
\end{align*}

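Substituting back, the $\text{IWAE}_1$ terms cancel, which shows $\mathbb{E}[\text{SUMO}] = \mathbb{E}[\text{IWAE}_\infty] = \log Z$.

(N.B.) Some care needs to be taken when taking the limit. See the paper for a more formal derivation.

We choose $P(K)$ such that $\mathbb{P}(\mathcal{K}\geq K)=\frac{1}{K}$. To sample such a $K$, we can use the inverse CDF method: $K=\lceil\frac{u}{1-u}\rceil$, where $u$ is sampled uniformly from $[0,1]$. Indeed, $\mathbb{P}\left(\lceil\frac{u}{1-u}\rceil \geq k\right) = \mathbb{P}\left(u > \frac{k-1}{k}\right) = \frac{1}{k}$.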
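The two ingredients above can be checked in isolation before applying them to the IWAE bounds. The sketch below first verifies empirically that the ceiling form $\lceil u/(1-u) \rceil$ has tail probabilities $\mathbb{P}(\mathcal{K}\geq k)=1/k$, and then runs the Russian Roulette estimator on a toy deterministic series $\Delta_k = 2^{-k}$, whose sum is exactly 1. The helper name `sample_K`, the toy series, the seed and the cap `k_max` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_K(n):
    # Inverse-CDF sampling: K = ceil(u / (1 - u)) gives P(K >= k) = 1/k.
    u = rng.random(n)
    # np.maximum guards the measure-zero case u == 0, which would give K = 0.
    return np.maximum(np.ceil(u / (1.0 - u)), 1).astype(np.int64)

K = sample_K(200000)
tail = [(K >= k).mean() for k in range(1, 6)]
print("empirical P(K >= k), k = 1..5:", np.round(tail, 3))

# Toy series (illustration only): Delta_k = 2^{-k}, which sums to 1.
k_max = 60  # terms beyond this index are far below float precision
ks = np.arange(1, k_max + 1)
# partial[K-1] = sum_{k <= K} Delta_k / P(K >= k) = sum_{k <= K} k * 2^{-k}
partial = np.cumsum(ks * 0.5 ** ks)
estimates = partial[np.minimum(K, k_max) - 1]
print("Russian roulette mean: %.4f (exact sum: 1)" % estimates.mean())
```

In this toy case each individual estimate is a truncated, reweighted sum that is usually far from 1, yet the average recovers the full infinite sum. SUMO relies on the same mechanism, with the random differences $\Delta_k$ in place of $2^{-k}$.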

Now putting things all together, we can estimate the log-partition using SUMO.


count = 0
bs = 10
iwlb = list()
while count <= 1000000:
  # sample the truncation level k with P(K >= k) = 1/k
  u = np.random.rand(1)
  k = np.ceil(u/(1-u)).astype(int)[0]
  xs = q.rvs(bs*(k+1))
  elbo = - U(xs) - q.logpdf(xs)
  iwlb_ = elbo.reshape(bs, k+1)
  # column j holds IWAE_{j+1}, computed from the first j+1 samples of each row
  iwlb_ = np.log(np.cumsum(np.exp(iwlb_), 1) / np.arange(1,k+2))
  # IWAE_1 + sum_j Delta_j / P(K >= j), where 1 / P(K >= j) = j
  iwlb_ = iwlb_[:,0] + ((iwlb_[:,1:k+1] - iwlb_[:,0:k]) * np.arange(1,k+1)).sum(1)
  count += bs * (k+1)
  iwlb.append(iwlb_)
iwlb = np.concatenate(iwlb)
plt.hist(iwlb, range(-5,10))
print("Estimate:  %.4f  / Ground truth:  %.4f" % (iwlb.mean(), np.log(2*np.pi)))
print("Empirical variance: %.4f" % iwlb.var())

Estimate:  1.8359  / Ground truth:  1.8379
Empirical variance: 4.1794


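The estimate is now much closer to the ground truth, but its variance is considerably larger. Note that we did not tune $q$ here. The paper proposes training $q$ to minimize the variance of the $\text{SUMO}$ estimator, which might be an interesting trick to look at next.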
