renom.optimizer
class renom.optimizer.Sgd(lr=0.1, momentum=0.4, nesterov=True)

Bases: renom.optimizer.Optimizer

Stochastic Gradient Descent.

Parameters:
    lr (float) – Learning rate.
    momentum (float) – Momentum coefficient of the update.
    nesterov (bool) – If True, Nesterov's accelerated gradient is used.

Example:
>>> import numpy as np
>>> import renom as rm
>>> x = rm.Variable(np.random.rand(2, 3))
>>> x
Variable([[ 0.93283856,  0.44494787,  0.47652033],
          [ 0.04769089,  0.16719061,  0.52063918]], dtype=float32)
>>> a = 2
>>> opt = rm.Sgd(lr=0.1)    # Stochastic gradient descent algorithm
>>> y = rm.sum(a*x)
>>> dx = y.grad(detach_graph=False).get(x)
>>> dx
RMul([[ 2.,  2.,  2.],
      [ 2.,  2.,  2.]], dtype=float32)
>>> y.grad(detach_graph=False).update(opt)
>>> x
Variable([[ 0.73283857,  0.24494787,  0.27652031],
          [-0.1523091 , -0.03280939,  0.32063919]], dtype=float32)
class renom.optimizer.ClampedSgd(lr=0.1, momentum=0.4, minimum=-10000.0, maximum=10000.0)

Bases: renom.optimizer.Sgd
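A minimal usage sketch, assuming ClampedSgd is passed to Grad.update in the same way as Sgd in the example above; the bounds chosen here are illustrative only (the defaults are -10000.0 and 10000.0):

>>> import numpy as np
>>> import renom as rm
>>> x = rm.Variable(np.random.rand(2, 3))
>>> opt = rm.ClampedSgd(lr=0.1, minimum=-1.0, maximum=1.0)
>>> y = rm.sum(2 * x)
>>> y.grad(detach_graph=False).update(opt)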
class renom.optimizer.Adagrad(lr=0.01, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adaptive gradient algorithm. [Adagrad]

Parameters:
    lr (float) – Learning rate.
    epsilon (float) – Small value added for numerical stability.

[Adagrad] Duchi, J., Hazan, E., & Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research, 12, 2121–2159.
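A minimal usage sketch, assuming Adagrad is passed to Grad.update exactly like Sgd in the example above:

>>> import numpy as np
>>> import renom as rm
>>> x = rm.Variable(np.random.rand(2, 3))
>>> opt = rm.Adagrad(lr=0.01)
>>> y = rm.sum(2 * x)
>>> y.grad(detach_graph=False).update(opt)    # x is updated in place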
class renom.optimizer.Adadelta(dr=0.95, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adadelta algorithm. [Adadelta]

Parameters:
    dr (float) – Decay rate.
    epsilon (float) – Small value added for numerical stability.

[Adadelta] Zeiler, M. D. ADADELTA: An Adaptive Learning Rate Method (2012). https://arxiv.org/abs/1212.5701
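A minimal usage sketch, assuming the same Grad.update interface as in the Sgd example above (dr is the decay rate from the signature):

>>> import numpy as np
>>> import renom as rm
>>> x = rm.Variable(np.random.rand(2, 3))
>>> opt = rm.Adadelta(dr=0.95)
>>> y = rm.sum(2 * x)
>>> y.grad(detach_graph=False).update(opt)    # x is updated in place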
class renom.optimizer.Rmsprop(lr=0.001, g=0.9, epsilon=1e-08, running_average=1)

Bases: renom.optimizer.Optimizer

Rmsprop, described by the following formula. [Rmsprop]

\begin{split}
m_{t+1} &= g m_{t} + (1-g)\nabla E^2 \\
r_{t} &= \frac{lr}{\sqrt{m_{t+1}} + \epsilon} \\
w_{t+1} &= w_{t} - r_{t}\nabla E
\end{split}

Parameters:
    lr (float) – Learning rate.
    g (float) – Decay rate of the running average of squared gradients.
    epsilon (float) – Small value added for numerical stability.
    running_average (float)

[Rmsprop] Srivastava, N., Swersky, K., & Hinton, G. Neural Networks for Machine Learning.
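To make the formula concrete, here is a plain NumPy transcription of the update above. This is an illustrative sketch only, not ReNom's internal implementation; the function and argument names are chosen to mirror the symbols in the formula.

import numpy as np

def rmsprop_step(w, grad, m, lr=0.001, g=0.9, epsilon=1e-8):
    # running average of squared gradients: m_{t+1} = g*m_t + (1-g)*grad**2
    m_next = g * m + (1 - g) * grad ** 2
    # per-element step size: r_t = lr / (sqrt(m_{t+1}) + epsilon)
    r = lr / (np.sqrt(m_next) + epsilon)
    # parameter update: w_{t+1} = w_t - r_t * grad
    return w - r * grad, m_next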
class renom.optimizer.Adam(lr=0.001, g=0.999, b=0.9, epsilon=1e-08)

Bases: renom.optimizer.Optimizer

Adaptive moment estimation, described by the following formula. [Adam]

\begin{split}
m_{t+1} &= b m_t + \nabla E \\
n_{t+1} &= g n_t + \nabla E^2 \\
\hat{m}_{t+1} &= \frac{m_{t+1}}{1-b^{t+1}} \\
\hat{n}_{t+1} &= \frac{n_{t+1}}{1-g^{t+1}} \\
w_{t+1} &= w_{t} - \frac{\alpha \hat{m}_{t+1}}{\sqrt{\hat{n}_{t+1}} + \epsilon}
\end{split}

Parameters:
    lr (float) – Learning rate (α in the formula).
    g (float) – Decay rate of the second moment estimate n.
    b (float) – Decay rate of the first moment estimate m.
    epsilon (float) – Small value added for numerical stability.

[Adam] Kingma, D. P., & Ba, J. Adam: A Method for Stochastic Optimization (2014). https://arxiv.org/pdf/1412.6980.pdf
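As with Rmsprop, the update can be transcribed directly into NumPy. This is an illustrative sketch of the documented formula (with α taken as lr and t starting at 0), not ReNom's internal implementation.

import numpy as np

def adam_step(w, grad, m, n, t, lr=0.001, g=0.999, b=0.9, epsilon=1e-8):
    # first- and second-moment accumulators
    m_next = b * m + grad                  # m_{t+1} = b*m_t + grad
    n_next = g * n + grad ** 2             # n_{t+1} = g*n_t + grad**2
    # bias-corrected estimates
    m_hat = m_next / (1 - b ** (t + 1))
    n_hat = n_next / (1 - g ** (t + 1))
    # parameter update
    w_next = w - lr * m_hat / (np.sqrt(n_hat) + epsilon)
    return w_next, m_next, n_next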