ADAM
class ADAM(maxiter=10000, tol=1e-06, lr=0.001, beta_1=0.9, beta_2=0.99, noise_factor=1e-08, eps=1e-10, amsgrad=False, snapshot_dir=None)
Adam and AMSGRAD optimizer.
Adam
Kingma, Diederik & Ba, Jimmy. (2014).
Adam: A Method for Stochastic Optimization. International Conference on Learning Representations.
Adam is a gradient-based optimization algorithm that relies on adaptive estimates of lower-order moments. The algorithm requires little memory and is invariant to diagonal rescaling of the gradients. Furthermore, it is able to cope with non-stationary objective functions and noisy and/or sparse gradients.
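For intuition, a single Adam iteration can be sketched in NumPy as follows. This is a minimal sketch using this class's parameter names, not this class's internal implementation, which may differ in details such as where noise_factor enters; t counts iterations starting from 1.

    import numpy as np

    def adam_step(params, grad, m, v, t, lr=0.001, beta_1=0.9,
                  beta_2=0.99, noise_factor=1e-08):
        # Biased first- and second-moment estimates of the gradient.
        m = beta_1 * m + (1 - beta_1) * grad
        v = beta_2 * v + (1 - beta_2) * grad ** 2
        # Bias correction for the zero-initialized moment estimates.
        m_hat = m / (1 - beta_1 ** t)
        v_hat = v / (1 - beta_2 ** t)
        # Parameter update; noise_factor guards against division by zero.
        params = params - lr * m_hat / (np.sqrt(v_hat) + noise_factor)
        return params, m, v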
AMSGRAD
Reddi, Sashank J., Kale, Satyen & Kumar, Sanjiv. (2018).
On the Convergence of Adam and Beyond. International Conference on Learning Representations.
AMSGRAD (a variant of Adam) uses a ‘long-term memory’ of past gradients and thereby improves convergence properties.
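The modification is small: AMSGRAD keeps a running element-wise maximum of the second-moment estimate so the effective step size never grows. Again a sketch, not the class internals; following the original paper, the bias correction of v is dropped.

    import numpy as np

    def amsgrad_step(params, grad, m, v, v_max, t, lr=0.001, beta_1=0.9,
                     beta_2=0.99, noise_factor=1e-08):
        m = beta_1 * m + (1 - beta_1) * grad
        v = beta_2 * v + (1 - beta_2) * grad ** 2
        m_hat = m / (1 - beta_1 ** t)
        # The 'long-term memory': never let the denominator shrink.
        v_max = np.maximum(v_max, v)
        params = params - lr * m_hat / (np.sqrt(v_max) + noise_factor)
        return params, m, v, v_max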
Parameters
- maxiter (int) – Maximum number of iterations.
- tol (float) – Tolerance for termination.
- lr (float) – Value >= 0. Learning rate.
- beta_1 (float) – Value in range 0 to 1, generally close to 1.
- beta_2 (float) – Value in range 0 to 1, generally close to 1.
- noise_factor (float) – Value >= 0. Noise factor.
- eps (float) – Value >= 0. Epsilon to be used for finite differences if no analytic gradient method is given.
- amsgrad (bool) – True to use AMSGRAD, False if not.
- snapshot_dir (Optional[str]) – If not None, save the optimizer’s parameters after every step to the given directory.
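Given these parameters, a minimal construction sketch follows. The import path below assumes Qiskit Aqua’s optimizer package and may differ across Qiskit versions.

    from qiskit.aqua.components.optimizers import ADAM

    # Plain Adam with a tighter iteration budget and a larger learning rate.
    adam = ADAM(maxiter=2000, lr=0.01)

    # AMSGRAD variant that snapshots its parameters after every step.
    amsgrad = ADAM(amsgrad=True, snapshot_dir='/tmp/adam_snapshots')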
Attributes
bounds_support_level
Returns bounds support level
gradient_support_level
Returns gradient support level
initial_point_support_level
Returns initial point support level
is_bounds_ignored
Returns is bounds ignored
is_bounds_required
Returns is bounds required
is_bounds_supported
Returns is bounds supported
is_gradient_ignored
Returns is gradient ignored
is_gradient_required
Returns is gradient required
is_gradient_supported
Returns is gradient supported
is_initial_point_ignored
Returns is initial point ignored
is_initial_point_required
Returns is initial point required
is_initial_point_supported
Returns is initial point supported
setting
Return setting
Methods
get_support_level
ADAM.get_support_level()
Return support level dictionary
gradient_num_diff
static ADAM.gradient_num_diff(x_center, f, epsilon, max_evals_grouped=1)
Compute the gradient numerically, in parallel, around the point x_center.
Parameters
- x_center (ndarray) – point around which the gradient is computed.
- f (func) – the function whose gradient is to be computed.
- epsilon (float) – the epsilon used in the numeric differentiation.
- max_evals_grouped (int) – maximum number of function evaluations grouped into a single batch.
Returns
the gradient computed
Return type
numpy.ndarray
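For example, the finite-difference gradient of a simple quadratic might be computed as follows. This is an illustrative sketch, assuming ADAM is imported as in the construction sketch above; the quadratic and the epsilon value are arbitrary.

    import numpy as np

    def f(x):
        return np.sum(x ** 2)

    grad = ADAM.gradient_num_diff(np.array([1.0, -2.0]), f, epsilon=1e-06)
    # grad approximates the analytic gradient 2*x, i.e. roughly [2.0, -4.0].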
load_params
ADAM.load_params(load_dir)
Load the optimizer’s parameters from load_dir.
minimize
ADAM.minimize(objective_function, initial_point, gradient_function)
Run the minimization.
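A sketch of calling minimize with an analytic gradient, assuming adam is an ADAM instance as constructed above and that minimize returns the same (point, value, nfev) tuple as optimize below; the objective is illustrative.

    import numpy as np

    def objective(x):
        return np.sum(x ** 2)

    def gradient(x):
        return 2 * x

    point, value, nfev = adam.minimize(objective, np.array([1.0, -2.0]), gradient)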
optimize
ADAM.optimize(num_vars, objective_function, gradient_function=None, variable_bounds=None, initial_point=None)
Perform optimization.
Parameters
- num_vars (int) – number of parameters to be optimized.
- objective_function (callable) – handle to a function that computes the objective function.
- gradient_function (callable) – handle to a function that computes the gradient of the objective function, or None if not available.
- variable_bounds (list[(float, float)]) – deprecated
- initial_point (numpy.ndarray[float]) – initial point.
Returns
a tuple (point, value, nfev) where
point: a 1D numpy.ndarray[float] containing the solution
value: a float with the objective function value
nfev: the number of objective function calls made, or None if not available
Return type
tuple(numpy.ndarray, float, int)
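An end-to-end sketch, with adam constructed as in the sketch near the top of this page; the objective is illustrative. With gradient_function=None, the gradient is approximated by finite differences controlled by eps.

    import numpy as np

    def objective(x):
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    point, value, nfev = adam.optimize(
        num_vars=2,
        objective_function=objective,
        initial_point=np.array([0.0, 0.0]),
    )
    # point should approach [1.0, -2.0] as maxiter grows.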
print_options
ADAM.print_options()
Print algorithm-specific options.
save_params
ADAM.save_params(snapshot_dir)
Save the optimizer’s parameters to snapshot_dir.
set_max_evals_grouped
ADAM.set_max_evals_grouped(limit)
Set the maximum number of function evaluations that may be grouped into a single batch.
set_options
ADAM.set_options(**kwargs)
Sets or updates values in the options dictionary.
The options dictionary may be used internally by a given optimizer to pass additional optional values for the underlying optimizer/optimization function used. The options dictionary may be initially populated with a set of key/values when the given optimizer is constructed.
Parameters
kwargs (dict) – options, given as name=value.
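For instance, a sketch where the option names follow the constructor parameters:

    adam.set_options(maxiter=500, tol=1e-08)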
wrap_function
static ADAM.wrap_function(function, args)
Wrap the function to implicitly inject the args at the call of the function.
Parameters
- function (func) – the target function
- args (tuple) – the args to be injected
Returns
the wrapped function
Return type
callable
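A sketch of the wrapping behaviour; the names are illustrative.

    def loss(x, scale):
        return scale * sum(x)

    wrapped = ADAM.wrap_function(loss, (2.0,))
    result = wrapped([1.0, 2.0, 3.0])  # calls loss([1.0, 2.0, 3.0], 2.0) -> 12.0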