Adaptive Primal-Dual

This tutorial compares the classic Chambolle-Pock Primal-Dual algorithm with the Adaptive Primal-Dual Hybrid Gradient of Goldstein and co-authors.

By adaptively changing the step sizes in the primal and dual directions, the adaptive algorithm converges faster. This is of great importance for some of the problems that the Primal-Dual algorithm can solve, especially those with an expensive proximal operator.
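The key idea is to monitor the primal and dual residuals after each iteration and grow whichever step size corresponds to the lagging residual. The following is only a sketch of the residual-balancing rule of Goldstein et al.; the parameter names mirror the solver printout further below (`alpha0`, `eta`, `s`, `delta`), but the exact bookkeeping inside pyproximal may differ:

```python
def rebalance(p_res, d_res, tau, mu, alpha, s=1.0, delta=1.5, eta=0.95):
    # Sketch of the residual-balancing update (assumed form).
    if p_res > s * delta * d_res:
        # primal residual dominates: enlarge the primal step, shrink the dual one
        tau, mu = tau / (1 - alpha), mu * (1 - alpha)
        alpha *= eta  # damp future adaptations
    elif p_res < s * d_res / delta:
        # dual residual dominates: shrink the primal step, enlarge the dual one
        tau, mu = tau * (1 - alpha), mu / (1 - alpha)
        alpha *= eta
    return tau, mu, alpha
```

When the two residuals are balanced to within a factor `delta`, the steps are left untouched and the adaptation rate `alpha` is not decayed.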

For this example, we consider a simple denoising problem.

import numpy as np
import matplotlib.pyplot as plt
import pylops
from skimage.data import camera

import pyproximal


def callback(x, f, g, K, cost, xtrue, err):
    cost.append(f(x) + g(K.matvec(x)))
    err.append(np.linalg.norm(x - xtrue))

Let’s start by loading a sample image and adding some noise

# Load image
img = camera()
ny, nx = img.shape

# Add noise
sigman = 20
n = np.random.normal(0, sigman, img.shape)
noise_img = img + n

We can now define a pylops.Gradient operator as well as the different proximal operators to be passed to our solvers
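Together these encode the isotropic TV denoising objective (here assuming, as in pyproximal's `L2` with `sigma=lamda`, that the data term carries the factor $\lambda/2$):

```latex
\min_{x} \; \frac{\lambda}{2} \| x - b \|_2^2 + \| \nabla x \|_{2,1},
\qquad
\| \nabla x \|_{2,1} = \sum_{i} \sqrt{ (\nabla_1 x)_i^2 + (\nabla_2 x)_i^2 }
```

where $b$ is the noisy image and $\nabla_1, \nabla_2$ are the directional derivatives stacked by the gradient operator.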

# Gradient operator
sampling = 1.
Gop = pylops.Gradient(dims=(ny, nx), sampling=sampling, edge=False,
                      kind='forward', dtype='float64')
L = 8. / sampling ** 2 # maxeig(Gop^H Gop)

# L2 data term
lamda = .04
l2 = pyproximal.L2(b=noise_img.ravel(), sigma=lamda)

# L1 regularization (isotropic TV)
l1iso = pyproximal.L21(ndim=2)
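The proximal operator of the L21 norm is group soft-thresholding: at each pixel, the vector of directional gradients is shrunk by its Euclidean length. A minimal NumPy sketch of that operation, written purely for illustration (pyproximal applies this internally via the operator's `prox` method):

```python
import numpy as np

def prox_l21(v, ndim, thresh):
    # v stacks the ndim directional gradients; reshape to (ndim, npixels)
    g = v.reshape(ndim, -1)
    norms = np.sqrt(np.sum(g ** 2, axis=0))            # per-pixel gradient length
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return (g * scale).ravel()                         # shrink each pixel's gradient

# two pixels with gradients (3, 0) and (0, 4), threshold 1
out = prox_l21(np.array([3.0, 0.0, 0.0, 4.0]), ndim=2, thresh=1.0)
print(out)  # [2. 0. 0. 3.]
```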

To start, we solve our denoising problem with the original Primal-Dual algorithm

# Primal-dual
tau = 0.95 / np.sqrt(L)
mu = 0.95 / np.sqrt(L)

cost_fixed = []
err_fixed = []
iml12_fixed = \
    pyproximal.optimization.primaldual.PrimalDual(l2, l1iso, Gop,
                                                  tau=tau, mu=mu, theta=1.,
                                                  gfirst=False, niter=300, show=True,
                                                  callback=lambda x: callback(x, l2, l1iso,
                                                                              Gop, cost_fixed,
                                                                              img.ravel(),
                                                                              err_fixed))
iml12_fixed = iml12_fixed.reshape(img.shape)
Primal-dual: min_x f(Ax) + x^T z + g(x)
Proximal operator (f): <class 'pyproximal.proximal.L2.L2'>
Proximal operator (g): <class 'pyproximal.proximal.L21.L21'>
Linear operator (A): <class 'pylops.basicoperators.gradient.Gradient'>
Additional vector (z): None
tau = 0.33587572106361          mu = 0.33587572106361
theta = 1.00            niter = 300

   Itn       x[0]          f           g          z^x       J = f + g + z^x
     1   2.74009e+00   1.148e+08   1.332e+05   0.000e+00       1.149e+08
     2   5.33483e+00   1.118e+08   1.387e+05   0.000e+00       1.120e+08
     3   7.79620e+00   1.090e+08   1.220e+05   0.000e+00       1.091e+08
     4   1.01855e+01   1.062e+08   1.118e+05   0.000e+00       1.063e+08
     5   1.25593e+01   1.035e+08   1.111e+05   0.000e+00       1.036e+08
     6   1.49436e+01   1.008e+08   1.146e+05   0.000e+00       1.009e+08
     7   1.73372e+01   9.823e+07   1.191e+05   0.000e+00       9.835e+07
     8   1.97256e+01   9.571e+07   1.245e+05   0.000e+00       9.584e+07
     9   2.20949e+01   9.326e+07   1.309e+05   0.000e+00       9.339e+07
    10   2.44390e+01   9.088e+07   1.380e+05   0.000e+00       9.102e+07
    31   6.76474e+01   5.301e+07   2.890e+05   0.000e+00       5.330e+07
    61   1.11499e+02   2.520e+07   4.555e+05   0.000e+00       2.566e+07
    91   1.40329e+02   1.268e+07   5.691e+05   0.000e+00       1.325e+07
   121   1.60019e+02   7.031e+06   6.454e+05   0.000e+00       7.677e+06
   151   1.73258e+02   4.478e+06   6.965e+05   0.000e+00       5.175e+06
   181   1.82082e+02   3.321e+06   7.308e+05   0.000e+00       4.051e+06
   211   1.87945e+02   2.793e+06   7.538e+05   0.000e+00       3.547e+06
   241   1.91893e+02   2.551e+06   7.692e+05   0.000e+00       3.321e+06
   271   1.94521e+02   2.439e+06   7.796e+05   0.000e+00       3.219e+06
   292   1.95806e+02   2.399e+06   7.847e+05   0.000e+00       3.183e+06
   293   1.95859e+02   2.397e+06   7.849e+05   0.000e+00       3.182e+06
   294   1.95912e+02   2.396e+06   7.851e+05   0.000e+00       3.181e+06
   295   1.95963e+02   2.394e+06   7.853e+05   0.000e+00       3.180e+06
   296   1.96015e+02   2.393e+06   7.855e+05   0.000e+00       3.178e+06
   297   1.96065e+02   2.392e+06   7.857e+05   0.000e+00       3.177e+06
   298   1.96115e+02   2.390e+06   7.859e+05   0.000e+00       3.176e+06
   299   1.96165e+02   2.389e+06   7.861e+05   0.000e+00       3.175e+06
   300   1.96214e+02   2.388e+06   7.863e+05   0.000e+00       3.174e+06

Total time (s) = 13.79
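As a side note, the fixed steps chosen above are admissible because Chambolle-Pock requires $\tau \mu \|K\|^2 \le 1$, and the choice $\tau = \mu = 0.95/\sqrt{L}$ gives $\tau \mu L = 0.95^2$. A quick standalone check, reusing the values from the script:

```python
import numpy as np

L = 8.0                   # bound on maxeig(Gop^H Gop) for unit sampling
tau = mu = 0.95 / np.sqrt(L)
# convergence condition for fixed-step Chambolle-Pock: tau * mu * ||K||^2 <= 1
print(tau * mu * L)       # ~0.9025, safely below 1
```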

We do the same with the adaptive algorithm

cost_ada = []
err_ada = []
iml12_ada, steps = \
    pyproximal.optimization.primaldual.AdaptivePrimalDual(l2, l1iso, Gop,
                                                          tau=tau, mu=mu,
                                                          niter=45, show=True, tol=0.05,
                                                          callback=lambda x: callback(x, l2, l1iso,
                                                                                      Gop, cost_ada,
                                                                                      img.ravel(),
                                                                                      err_ada))
iml12_ada = iml12_ada.reshape(img.shape)
Adaptive Primal-dual: min_x f(Ax) + x^T z + g(x)
Proximal operator (f): <class 'pyproximal.proximal.L2.L2'>
Proximal operator (g): <class 'pyproximal.proximal.L21.L21'>
Linear operator (A): <class 'pylops.basicoperators.gradient.Gradient'>
Additional vector (z): None
tau0 = 3.358757e-01     mu0 = 3.358757e-01
alpha0 = 5.000000e-01   eta = 9.500000e-01
s = 1.000000e+00        delta = 1.500000e+00
niter = 45              tol = 5.000000e-02

   Itn       x[0]          f           g          z^x       J = f + g + z^x
     2   2.74009e+00   1.148e+08   1.332e+05   0.000e+00       1.149e+08
     3   7.86167e+00   1.090e+08   1.629e+05   0.000e+00       1.091e+08
     4   1.68808e+01   9.887e+07   2.034e+05   0.000e+00       9.908e+07
     5   3.18816e+01   8.315e+07   2.865e+05   0.000e+00       8.344e+07
     6   5.46224e+01   6.212e+07   4.085e+05   0.000e+00       6.253e+07
     7   8.48974e+01   3.915e+07   5.577e+05   0.000e+00       3.971e+07
     8   1.08716e+02   2.502e+07   6.667e+05   0.000e+00       2.568e+07
     9   1.27445e+02   1.631e+07   7.422e+05   0.000e+00       1.706e+07
    10   1.42173e+02   1.095e+07   7.936e+05   0.000e+00       1.175e+07
    13   1.67755e+02   4.739e+06   8.598e+05   0.000e+00       5.599e+06
    17   1.74749e+02   3.767e+06   8.488e+05   0.000e+00       4.616e+06
    21   1.79702e+02   3.316e+06   8.361e+05   0.000e+00       4.152e+06
    25   1.85302e+02   2.940e+06   8.460e+05   0.000e+00       3.786e+06
    29   1.89319e+02   2.705e+06   8.365e+05   0.000e+00       3.542e+06
    33   1.91736e+02   2.566e+06   8.189e+05   0.000e+00       3.384e+06
    37   1.93424e+02   2.480e+06   8.083e+05   0.000e+00       3.288e+06
    38   1.93811e+02   2.464e+06   8.067e+05   0.000e+00       3.271e+06
    39   1.94192e+02   2.450e+06   8.054e+05   0.000e+00       3.255e+06
    40   1.94567e+02   2.437e+06   8.045e+05   0.000e+00       3.242e+06
    41   1.94933e+02   2.426e+06   8.037e+05   0.000e+00       3.230e+06
    42   1.95286e+02   2.417e+06   8.032e+05   0.000e+00       3.220e+06
    43   1.95622e+02   2.408e+06   8.028e+05   0.000e+00       3.211e+06
    44   1.95937e+02   2.400e+06   8.024e+05   0.000e+00       3.203e+06
    45   1.96227e+02   2.394e+06   8.022e+05   0.000e+00       3.196e+06
    46   1.96492e+02   2.388e+06   8.019e+05   0.000e+00       3.190e+06

Total time (s) = 2.37

Let’s now compare the final results as well as the convergence curves of the two algorithms. Note how the adaptive Primal-Dual produces a better estimate of the clean image in far fewer iterations
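To put a number on "better estimate", one could also compare peak signal-to-noise ratios against the clean image. The `psnr` helper below is not part of the original script, just a small standard utility:

```python
import numpy as np

def psnr(x, xref, vmax=255.0):
    # peak signal-to-noise ratio in dB for images with values in [0, vmax]
    mse = np.mean((np.asarray(x, float) - np.asarray(xref, float)) ** 2)
    return 10.0 * np.log10(vmax ** 2 / mse)

# e.g. compare psnr(iml12_ada, img) and psnr(iml12_fixed, img) to psnr(noise_img, img)
```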

fig, axs = plt.subplots(1, 4, figsize=(16, 4))
axs[0].imshow(img, cmap='gray', vmin=0, vmax=255)
axs[0].set_title('Original')
axs[1].imshow(noise_img, cmap='gray', vmin=0, vmax=255)
axs[1].set_title('Noisy')
axs[2].imshow(iml12_fixed, cmap='gray', vmin=0, vmax=255)
axs[2].set_title('PD')
axs[3].imshow(iml12_ada, cmap='gray', vmin=0, vmax=255)
axs[3].set_title('Adaptive PD')

fig, axs = plt.subplots(2, 1, figsize=(12, 7))
axs[0].plot(cost_fixed, 'k', label='Fixed step')
axs[0].plot(cost_ada, 'r', label='Adaptive step')
axs[0].set_title('Functional')
axs[0].legend()
axs[1].plot(err_fixed, 'k', label='Fixed step')
axs[1].plot(err_ada, 'r', label='Adaptive step')
axs[1].set_title('MSE')
axs[1].legend()

fig, axs = plt.subplots(3, 1, figsize=(12, 7))
axs[0].plot(steps[0], 'k')
axs[0].set_title(r'$\tau^k$')
axs[1].plot(steps[1], 'k')
axs[1].set_title(r'$\mu^k$')
axs[2].plot(steps[2], 'k')
axs[2].set_title(r'$\alpha^k$')
(Figures, in order: Original / Noisy / PD / Adaptive PD images; Functional and MSE convergence curves; $\tau^k$, $\mu^k$, $\alpha^k$ step-size histories.)

Total running time of the script: (0 minutes 17.333 seconds)

Gallery generated by Sphinx-Gallery