# Optimization¶

## Introduction¶

Optimization provides an alternative approach to marginal inference.

In this section we refer to the program for which we would like to obtain the marginal distribution as the target program.

If we take a target program and add a guide distribution to each random choice, then we can define the guide program as the program you get when you sample from the guide distribution at each `sample` statement and ignore all `factor` statements.

If we endow this guide program with adjustable parameters, then we can optimize those parameters so as to minimize the distance between the joint distribution of the choices in the guide program and those in the target.

This general approach includes a number of well-known algorithms as special cases.

It is supported in WebPPL by a method for performing optimization, primitives for specifying parameters, and the ability to specify guides.

## Optimize¶

`Optimize`(options)
Arguments: options (object) – Optimization options. Nothing.

Optimizes the parameters of the guide program specified by the `model` option.

The following options are supported:

`model`

A function of zero arguments that specifies the target and guide programs.

This option must be present.

`steps`

The number of optimization steps to take.

Default: `1`

`optMethod`

The optimization method used. The following methods are available:

• `'sgd'`
• `'adagrad'`
• `'rmsprop'`
• `'adam'`

Each method takes a `stepSize` sub-option, see below for example usage. Additional method specific options are available, see the adnn optimization module for details.

Default: `'adam'`

`estimator`

Specifies the optimization objective and the method used to estimate its gradients. See Estimators.

Default: `ELBO`

`verbose`

Default: `true`

Example usage:

```Optimize({model: model, steps: 100});
Optimize({model: model, optMethod: {sgd: {stepSize: 0.5}}});
```

### Estimators¶

The following estimators are available:

`ELBO`

This is the evidence lower bound (ELBO). Optimizing this objective yields variational inference.

For best performance use `mapData()` in place of `map()` where possible when optimizing this objective. The conditional independence information this provides is used to reduce the variance of gradient estimates which can significantly improve performance, particularly in the presence of discrete random choices. Data sub-sampling is also supported through the use of `mapData()`.

The following options are supported:

`samples`

The number of samples to take for each gradient estimate.

Default: `1`

`avgBaselines`

Enable the “average baseline removal” variance reduction strategy.

Default: `true`

`avgBaselineDecay`

The decay rate used in the exponential moving average used to estimate baselines.

Default: `0.9`

Example usage:

```Optimize({model: model, estimator: 'ELBO'});
Optimize({model: model, estimator: {ELBO: {samples: 10}}});
```

## Parameters¶

`param`([options])

Retrieves the value of a parameter by name. If the parameter does not exist, it is created and initialized with a draw from a Gaussian distribution.

The following options are supported:

`dims`

When `dims` is given, `param` returns a tensor of dimension `dims`. In this case `dims` should be an array.

When `dims` is omitted, `param` returns a scalar.

`mu`

The mean of the Gaussian distribution from which the initial parameter value is drawn.

Default: `0`

`sigma`

The standard deviation of the Gaussian distribution from which the initial parameter value is drawn. Specify a standard deviation of `0` to deterministically initialize the parameter to `mu`.

Default: `0.1`

`name`

The name of the parameter to retrieve. If `name` is omitted a default name is automatically generated based on the current stack address, relative to the current coroutine.

Examples:

```param()
param({name: 'myparam'})
param({mu: 0, sigma: 0.01, name: 'myparam'})
param({dims: [10, 10]})
```
`modelParam`([options])

An analog of `param` used to create or retrieve a parameter that can be used directly in the model.

Optimizing the ELBO yields maximum likelihood estimation for model parameters. `modelParam` cannot be used with other inference strategies as it does not have an interpretation in the fully Bayesian setting. Attempting to do so will raise an exception.

`modelParam` supports the same options as `param`. See the documentation for param for details.