renom_rl.discrete.a2c

class A2C ( env , network , loss_func=None , optimizer=None , gamma=0.99 , num_worker=8 , advantage=5 , value_coef=0.5 , entropy_coef=0.01 , node_selector=None , test_node_selector=None , gradient_clipping=None , logger=None )

Bases: renom_rl.AgentBase

A2C class
This class provides an Advantage Actor-Critic (A2C) reinforcement learning agent, including its training method. The agent runs on a single thread.
Parameters:
  • env ( BaseEnv ) – Environment. This must be a child class of BaseEnv .
  • network ( Model ) – Actor Critic Model.
  • num_worker ( int ) – Number of parallel actor/environment workers.
  • advantage ( int ) – Number of advantage (n-step) steps.
  • node_selector ( DiscreteNodeChooser ) – Node selector used during training.
  • test_node_selector ( DiscreteNodeChooser ) – Node selector used during test.
  • loss_func ( function ) – Loss function for training the network. Default is MeanSquaredError() .
  • optimizer – Optimizer for training the network. Default is Rmsprop(lr=0.00025, g=0.95) .
  • entropy_coef ( float ) – Coefficient of the actor’s output entropy.
  • value_coef ( float ) – Coefficient of the value loss.
  • gamma ( float ) – Discount rate. A sketch of how gamma, value_coef, and entropy_coef enter the A2C objective is given below, after this list.
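
The coefficients above combine in the standard A2C objective: advantage-step returns are discounted with gamma, the critic (value) loss is weighted by value_coef, and the policy entropy bonus by entropy_coef. The following sketch is purely illustrative of that convention; the function name and details are hypothetical and do not reflect this class's internal code.

>>> import numpy as np
>>>
>>> def a2c_loss_sketch(log_probs, values, returns, entropy,
...                     value_coef=0.5, entropy_coef=0.01):
...     # Hypothetical illustration of the standard A2C loss; not this class's internals.
...     advantage = returns - values                   # advantage estimate from n-step returns
...     policy_loss = -np.mean(log_probs * advantage)  # actor (policy gradient) term
...     value_loss = np.mean((returns - values) ** 2)  # critic (value regression) term
...     # Entropy is subtracted, so a larger entropy_coef encourages exploration.
...     return policy_loss + value_coef * value_loss - entropy_coef * entropy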

Example

>>> import numpy as np
>>> import renom as rm
>>> from renom_rl.discrete.a2c import A2C
>>> from renom_rl.environ.openai import CartPole00
>>>
>>> class ActorCritic(rm.Model):
...     def __init__(self):
...         # Shared feature layers
...         self.l1 = rm.Dense(32)
...         self.l2 = rm.Dense(32)
...         # Policy head (2 discrete actions) and value head
...         self.l3 = rm.Dense(2)
...         self.l4 = rm.Dense(1)
...
...     def forward(self, x):
...         h1 = self.l1(x)
...         h2 = self.l2(h1)
...         act = rm.softmax(self.l3(h2))  # action probabilities
...         val = self.l4(h2)              # state value estimate
...         return act, val
...
>>> model = ActorCritic()
>>> env = CartPole00()
>>> a2c = A2C(env, model)
>>> a2c.fit(epoch=1, epoch_step=10000)
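
The constructor arguments can also be passed explicitly. The call below is a hypothetical sketch that simply restates the documented defaults, assuming those defaults correspond to rm.MeanSquaredError() and rm.Rmsprop from renom:

>>> a2c = A2C(env, model,
...           loss_func=rm.MeanSquaredError(),
...           optimizer=rm.Rmsprop(lr=0.00025, g=0.95),
...           gamma=0.99, num_worker=8, advantage=5,
...           value_coef=0.5, entropy_coef=0.01)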

References

A. V. Clemente, H. N. Castejon, and A. Chandra.
Efficient Parallel Methods for Deep Reinforcement Learning.

fit ( epoch=1 , epoch_step=250000 , test_step=None )

This method executes the training loop of the actor-critic agent. A test run is performed after each epoch (see the usage example after the parameter list).

Parameters:
  • epoch ( int ) – Number of epochs for training.
  • epoch_step ( int ) – Number of steps in one epoch.
  • test_step ( int ) – Number of steps in the test run after each epoch.
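
For example, a hypothetical run of two epochs of 5,000 training steps each, with a 1,000-step test after every epoch (the values are arbitrary, not recommended settings):

>>> a2c.fit(epoch=2, epoch_step=5000, test_step=1000)
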
test ( test_step=None , **kwargs )

Test the trained actor agent.

Parameters: test_step ( int, None ) – Number of steps (not episodes) for the test. If None is given, this method executes only 1 episode.
Returns: Sum of rewards.
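
For example (the step count is arbitrary):

>>> total_reward = a2c.test(test_step=500)
>>> total_reward = a2c.test()  # test_step=None runs a single episode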