ADAM
class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)
Adam and AMSGRAD optimizer.
Adam
Kingma, Diederik & Ba, Jimmy. (2014).
Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.
Adam is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it is able to cope with non-stationary objective functions and noisy and/or sparse gradients.
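For intuition, a single Adam iteration can be sketched in NumPy as follows. This is a minimal sketch using this class's parameter names, not this class's internal implementation, which may differ in details such as where noise_factor enters; t counts iterations starting from 1.

    import numpy as np

    def adam_step(params, grad, m, v, t, lr=0.001, beta_1=0.9,
                  beta_2=0.99, noise_factor=1e-08):
        # Biased first- and second-moment estimates of the gradient.
        m = beta_1 * m + (1 - beta_1) * grad
        v = beta_2 * v + (1 - beta_2) * grad ** 2
        # Bias correction for the zero-initialized moment estimates.
        m_hat = m / (1 - beta_1 ** t)
        v_hat = v / (1 - beta_2 ** t)
        # Parameter update; noise_factor guards against division by zero.
        params = params - lr * m_hat / (np.sqrt(v_hat) + noise_factor)
        return params, m, v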
AMSGRAD
Reddi, Sashank J., Kale, Satyen & Kumar, Sanjiv. (2018).
On the Convergence of Adam and Beyond. International Conference on Learning Representations.
AMSGRAD (a variant of Adam) uses a ‘long-term memory’ of past gradients and thereby improves convergence properties.
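The modification is small: AMSGRAD keeps a running element-wise maximum of the second-moment estimate so the effective step size never grows. Again a sketch, not the class internals; following the original paper, the bias correction of v is dropped.

    import numpy as np

    def amsgrad_step(params, grad, m, v, v_max, t, lr=0.001, beta_1=0.9,
                     beta_2=0.99, noise_factor=1e-08):
        m = beta_1 * m + (1 - beta_1) * grad
        v = beta_2 * v + (1 - beta_2) * grad ** 2
        m_hat = m / (1 - beta_1 ** t)
        # The 'long-term memory': never let the denominator shrink.
        v_max = np.maximum(v_max, v)
        params = params - lr * m_hat / (np.sqrt(v_max) + noise_factor)
        return params, m, v, v_max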
Parameters
- maxiter (int) – Maximum number of iterations.
- tol (float) – Tolerance for termination.
- lr (float) – Value >= 0. Learning rate.
- beta_1 (float) – Value in range 0 to 1, generally close to 1.
- beta_2 (float) – Value in range 0 to 1, generally close to 1.
- noise_factor (float) – Value >= 0. Noise factor.
- eps (float) – Value >= 0. Epsilon to be used for finite differences if no analytic gradient method is given.
- amsgrad (bool) – True to use AMSGRAD, False if not.
- snapshot_dir (Optional[str]) – If not None, save the optimizer’s parameters after every step to the given directory.
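Given these parameters, a minimal construction sketch follows. The import path below assumes Qiskit Aqua’s optimizer package and may differ across Qiskit versions.

    from qiskit.aqua.components.optimizers import ADAM

    # Plain Adam with a tighter iteration budget and a larger learning rate.
    adam = ADAM(maxiter=2000, lr=0.01)

    # AMSGRAD variant that snapshots its parameters after every step.
    amsgrad = ADAM(amsgrad=True, snapshot_dir='/tmp/adam_snapshots')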
Attributes
bounds_support_level
Returns bounds support level
gradient_support_level
Returns gradient support level
initial_point_support_level
Returns initial point support level
is_bounds_ignored
Returns is bounds ignored
is_bounds_required
Returns is bounds required
is_bounds_supported
Returns is bounds supported
is_gradient_ignored
Returns is gradient ignored
is_gradient_required
Returns is gradient required
is_gradient_supported
Returns is gradient supported
is_initial_point_ignored
Returns is initial point ignored
is_initial_point_required
Returns is initial point required
is_initial_point_supported
Returns is initial point supported
setting
Return setting
Methods
get_support_level
ADAM.get_support_level()
Return support level dictionary
gradient_num_diff
static ADAM.gradient_num_diff(x_center, f, epsilon, max_evals_grouped=1)
Compute the gradient numerically, in parallel, around the point x_center.
Parameters
- x_center (ndarray) – point around which the gradient is computed.
- f (func) – the function whose gradient is to be computed.
- epsilon (float) – the epsilon used in the numeric differentiation.
- max_evals_grouped (int) – maximum number of function evaluations grouped into a single batch.
Returns
the gradient computed
Return type
numpy.ndarray
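For example, the finite-difference gradient of a simple quadratic might be computed as follows. This is an illustrative sketch, assuming ADAM is imported as in the construction sketch above; the quadratic and the epsilon value are arbitrary.

    import numpy as np

    def f(x):
        return np.sum(x ** 2)

    grad = ADAM.gradient_num_diff(np.array([1.0, -2.0]), f, epsilon=1e-06)
    # grad approximates the analytic gradient 2*x, i.e. roughly [2.0, -4.0].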
load_params
ADAM.load_params(load_dir)
Load the optimizer’s parameters from load_dir.
minimize
ADAM.minimize(objective_function, initial_point, gradient_function)
Run the minimization.
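A sketch of calling minimize with an analytic gradient, assuming adam is an ADAM instance as constructed above and that minimize returns the same (point, value, nfev) tuple as optimize below; the objective is illustrative.

    import numpy as np

    def objective(x):
        return np.sum(x ** 2)

    def gradient(x):
        return 2 * x

    point, value, nfev = adam.minimize(objective, np.array([1.0, -2.0]), gradient)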
optimize
ADAM.optimize(num_vars, objective_function, gradient_function=None, variable_bounds=None, initial_point=None)
Perform optimization.
Parameters
- num_vars (int) – number of parameters to be optimized.
- objective_function (callable) – handle to a function that computes the objective function.
- gradient_function (callable) – handle to a function that computes the gradient of the objective function, or None if not available.
- variable_bounds (list[(float, float)]) – deprecated
- initial_point (numpy.ndarray[float]) – initial point.
Returns
a tuple (point, value, nfev) where
point: a 1D numpy.ndarray[float] containing the solution
value: a float with the objective function value
nfev: the number of objective function calls made, or None if not available
Return type
tuple(numpy.ndarray, float, int)
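An end-to-end sketch, with adam constructed as in the sketch near the top of this page; the objective is illustrative. With gradient_function=None, the gradient is approximated by finite differences controlled by eps.

    import numpy as np

    def objective(x):
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    point, value, nfev = adam.optimize(
        num_vars=2,
        objective_function=objective,
        initial_point=np.array([0.0, 0.0]),
    )
    # point should approach [1.0, -2.0] as maxiter grows.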
print_options
ADAM.print_options()
Print algorithm-specific options.
save_params
ADAM.save_params(snapshot_dir)
Save the optimizer’s parameters to snapshot_dir.
set_max_evals_grouped
ADAM.set_max_evals_grouped(limit)
Set the maximum number of function evaluations that may be grouped into a single batch.
set_options
ADAM.set_options(**kwargs)
Sets or updates values in the options dictionary.
The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.
Parameters
kwargs (dict) – options, given as name=value.
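For instance, a sketch where the option names follow the constructor parameters:

    adam.set_options(maxiter=500, tol=1e-08)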
wrap_function
static ADAM.wrap_function(function, args)
Wrap the function to implicitly inject the args at the call of the function.
Parameters
- function (func) – the target function
- args (tuple) – the args to be injected
Returns
the wrapped function
Return type
callable
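A sketch of the wrapping behaviour; the names are illustrative.

    def loss(x, scale):
        return scale * sum(x)

    wrapped = ADAM.wrap_function(loss, (2.0,))
    result = wrapped([1.0, 2.0, 3.0])  # calls loss([1.0, 2.0, 3.0], 2.0) -> 12.0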