renom.optimizer

class renom.optimizer.Sgd(lr=0.1, momentum=0.4, nesterov=True)

Bases: renom.optimizer.Optimizer

Stochastic Gradient Descent.

Parameters:
  • lr ( float ) – Learning rate.
  • momentum ( float ) – Momentum coefficient of optimization.
  • nesterov ( bool ) – If true, Nesterov's accelerated gradient is applied.

Example

>>> import numpy as np
>>> import renom as rm
>>> x = rm.Variable(np.random.rand(2, 3))
>>> x
Variable([[ 0.93283856,  0.44494787,  0.47652033],
          [ 0.04769089,  0.16719061,  0.52063918]], dtype=float32)
>>> a = 2
>>> opt = rm.Sgd(lr=0.1)    # Stochastic gradient descent algorithm
>>> y = rm.sum(a*x)
>>> dx = y.grad(detach_graph=False).get(x)
>>> dx
RMul([[ 2.,  2.,  2.],
      [ 2.,  2.,  2.]], dtype=float32)
>>> y.grad(detach_graph=False).update(opt)
>>> x
Variable([[ 0.73283857,  0.24494787,  0.27652031],
          [-0.1523091 , -0.03280939,  0.32063919]], dtype=float32)
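
For reference, the classical momentum update that the momentum and nesterov arguments refer to can be sketched in plain NumPy as below. This is the textbook rule, not a transcription of ReNom's internals, and the Nesterov variant is only indicated in a comment.

>>> import numpy as np
>>> lr, momentum = 0.1, 0.4
>>> w = np.array([1.0, 1.0])                  # parameters
>>> v = np.zeros_like(w)                      # velocity buffer
>>> grad = np.array([2.0, 2.0])               # gradient of the loss at w
>>> v = momentum * v - lr * grad              # accumulate velocity
>>> w = w + v                                 # classical momentum step
>>> # Nesterov's variant (nesterov=True) evaluates the gradient at the
>>> # look-ahead point w + momentum * v before taking the step.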
class renom.optimizer.ClampedSgd(lr=0.1, momentum=0.4, minimum=-10000.0, maximum=10000.0)

Bases: renom.optimizer.Sgd
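
ClampedSgd is undocumented here; judging from its name and arguments it presumably performs the same update as Sgd while keeping values within [minimum, maximum] (this is an assumption, not stated in the docstring). A minimal usage sketch mirroring the Sgd example above:

>>> import numpy as np
>>> import renom as rm
>>> from renom.optimizer import ClampedSgd
>>> x = rm.Variable(np.random.rand(2, 3))
>>> opt = ClampedSgd(lr=0.1, minimum=-1.0, maximum=1.0)   # bounds assumed to clamp the update
>>> y = rm.sum(2 * x)
>>> y.grad(detach_graph=False).update(opt)                # applies the clamped SGD step to x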

class renom.optimizer.Adagrad(lr=0.01, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adaptive gradient algorithm. [Adagrad]

Parameters:
  • lr ( float ) – Learning rate.
  • epsilon ( float ) – Small number in the equation for avoiding zero division.
[Adagrad] Duchi, J., Hazan, E., & Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121–2159.
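
For reference, a plain NumPy sketch of the standard Adagrad rule from the cited paper (accumulate squared gradients, then scale the step per parameter); illustrative only, not ReNom's internal code.

>>> import numpy as np
>>> lr, epsilon = 0.01, 1e-8
>>> w = np.array([1.0, 1.0])                     # parameters
>>> G = np.zeros_like(w)                         # running sum of squared gradients
>>> grad = np.array([0.5, -0.25])                # gradient of the loss at w
>>> G = G + grad ** 2
>>> w = w - lr * grad / (np.sqrt(G) + epsilon)   # per-parameter adaptive step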
class renom.optimizer.Adadelta(dr=0.95, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adaptive learning rate method. [Adadelta]

Parameters:
  • dr ( float ) – Decay rate.
  • epsilon ( float ) – Small number in the equation for avoiding zero division.
[Adadelta] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method (2012). https://arxiv.org/abs/1212.5701
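
For reference, a plain NumPy sketch of the standard Adadelta rule with decay rate dr (running averages of squared gradients and of squared updates); illustrative only, not ReNom's internal code.

>>> import numpy as np
>>> dr, epsilon = 0.95, 1e-8
>>> w = np.array([1.0, 1.0])                  # parameters
>>> Eg2 = np.zeros_like(w)                    # running average of squared gradients
>>> Edx2 = np.zeros_like(w)                   # running average of squared updates
>>> grad = np.array([0.5, -0.25])             # gradient of the loss at w
>>> Eg2 = dr * Eg2 + (1 - dr) * grad ** 2
>>> dx = -np.sqrt(Edx2 + epsilon) / np.sqrt(Eg2 + epsilon) * grad
>>> Edx2 = dr * Edx2 + (1 - dr) * dx ** 2
>>> w = w + dx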
class renom.optimizer.Rmsprop(lr=0.001, g=0.9, epsilon=1e-08, running_average=1)

Bases: renom.optimizer.Optimizer

Rmsprop, described by the following formula. [Rmsprop]

\begin{split}
m_{t+1} &= g\,m_{t} + (1-g)\nabla E^2 \\
r_{t} &= \frac{lr}{\sqrt{m_{t+1}}+\epsilon} \\
w_{t+1} &= w_{t} - r_{t}\nabla E
\end{split}
Parameters:
  • lr ( float ) – Learning rate.
  • g ( float ) – Decay rate for the moving average of squared gradients (see the formula above).
  • epsilon ( float ) – Small number in the equation for avoiding zero division.
[Rmsprop] Nitish Srivastava, Kevin Swersky, Geoffrey Hinton. Neural Networks for Machine Learning.
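
A plain NumPy transcription of the formula above for a single step (the running_average argument is not documented here and is omitted); illustrative only, not ReNom's internal code.

>>> import numpy as np
>>> lr, g, epsilon = 0.001, 0.9, 1e-8
>>> w = np.array([1.0, 1.0])                  # parameters w_t
>>> m = np.zeros_like(w)                      # m_t: moving average of squared gradients
>>> grad = np.array([0.5, -0.25])             # gradient of the loss at w_t
>>> m = g * m + (1 - g) * grad ** 2           # m_{t+1}
>>> r = lr / (np.sqrt(m) + epsilon)           # r_t
>>> w = w - r * grad                          # w_{t+1}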
class renom.optimizer.Adam(lr=0.001, g=0.999, b=0.9, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adaptive moment estimation, described by the following formula. [Adam]

\begin{split}
m_{t+1} &= b\,m_t + \nabla E \\
n_{t+1} &= g\,n_t + \nabla E^2 \\
\hat{m}_{t+1} &= \frac{m_{t+1}}{1-b^{t+1}} \\
\hat{n}_{t+1} &= \frac{n_{t+1}}{1-g^{t+1}} \\
w_{t+1} &= w_{t} - \frac{\alpha \hat{m}_{t+1}}{\sqrt{\hat{n}_{t+1}}+\epsilon}
\end{split}
Parameters:
  • lr ( float ) – Learning rate.
  • g ( float ) – Decay coefficient for the second moment estimate n (see the formula above).
  • b ( float ) – Decay coefficient for the first moment estimate m (see the formula above).
  • epsilon ( float ) – Small number in the equation for avoiding zero division.
[Adam] Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization (2014). https://arxiv.org/pdf/1412.6980.pdf
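
A plain NumPy transcription of the formula above for the first step, with α taken to be lr; illustrative only, not ReNom's internal code.

>>> import numpy as np
>>> lr, g, b, epsilon = 0.001, 0.999, 0.9, 1e-8
>>> w = np.array([1.0, 1.0])                  # parameters w_t
>>> m = np.zeros_like(w)                      # first moment estimate m_t
>>> n = np.zeros_like(w)                      # second moment estimate n_t
>>> grad = np.array([0.5, -0.25])             # gradient of the loss at w_t
>>> t = 0                                     # step counter
>>> m = b * m + grad                          # m_{t+1}
>>> n = g * n + grad ** 2                     # n_{t+1}
>>> m_hat = m / (1 - b ** (t + 1))            # bias-corrected first moment
>>> n_hat = n / (1 - g ** (t + 1))            # bias-corrected second moment
>>> w = w - lr * m_hat / (np.sqrt(n_hat) + epsilon)   # w_{t+1}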