renom_rl.environ.env

class BaseEnv(action_shape=None, state_shape=None)

Bases: object

Base class for environments. The methods step and reset must be overridden. (sample must also be overridden for algorithms that require it.)

Users can override the optional methods, from terminate through test_terminate, if necessary.

If test_start, test_epoch_step, or test_close is not defined, it returns the same value as start, epoch_step, or close, respectively.

Note that these methods are called without arguments, so they cannot define parameters (step is the exception).
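The fallback from the test_* hooks to their training counterparts can be pictured with the sketch below. HookFallbackSketch and LoggingEnv are hypothetical names used only for illustration; this is one plausible arrangement under that assumption, not the renom_rl source.

```python
class HookFallbackSketch:
    """Hypothetical stand-in class; not the renom_rl BaseEnv source."""

    def start(self):
        return None

    def epoch_step(self):
        return None

    def close(self):
        return None

    # Each test_* hook delegates to its training counterpart
    # unless a subclass overrides it.
    def test_start(self):
        return self.start()

    def test_epoch_step(self):
        return self.epoch_step()

    def test_close(self):
        return self.close()


class LoggingEnv(HookFallbackSketch):
    """Overrides start, close and test_close, but not test_start."""

    def __init__(self):
        self.log = []

    def start(self):
        self.log.append("start")

    def close(self):
        self.log.append("close")

    def test_close(self):
        self.log.append("test_close")


env = LoggingEnv()
env.test_start()  # not overridden, so it falls back to start
env.test_close()  # overridden, so close is not called
```

In this sketch, overriding only start is enough to get the same behaviour during testing, while overriding test_close separates the two.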

Example

>>> import numpy as np
>>> from renom_rl import BaseEnv
>>> class CustomEnv(BaseEnv):
...     def __init__(self):
...         # Store the shapes as attributes so they are visible on the instance.
...         self.action_shape = (5, )
...         self.state_shape = (86, 86)
...
...     def step(self, action):
...         # func is a placeholder for the user's own environment logic.
...         state, reward, terminal = func(action)
...         return state, reward, terminal
...
...     def sample(self):
...         # Return a random action index in [0, 5).
...         return np.random.randint(0, 5)
...
...     def reset(self):
...         initial_state = func.reset()
...         return initial_state
...
step(action)

This method must be overridden. It defines how the environment responds when an action is taken. It must accept a single action and return the next state, the reward, and the terminal flag.

Parameters: action (ndarray, int, float) – Action value.
Returns:
  • state (ndarray) - The environment's next state. Its shape must be the same as BaseEnv.state_shape.
  • reward (float) - Reward from the transition.
  • terminal (bool) - Terminal flag.
sample()

This method must be overridden for algorithms that use random action sampling (for example, DQN and DDQN). It must return a random action.

Returns: Sampled action (int, ndarray). Its shape must be the same as BaseEnv.action_shape.
reset()

This method must be overridden.

Returns: Initial state (ndarray). Its shape must be the same as BaseEnv.state_shape.
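As an illustration of the three methods described above (step, sample, reset), here is a minimal sketch of a toy environment. The BaseEnv class defined in the block is a stand-in so that the code runs without renom_rl installed, and LineWalkEnv with its 5-cell task is hypothetical. Pairing action_shape=(2, ) with an integer action in [0, 2) follows the convention of the class example near the top of this page and is an assumption here.

```python
import numpy as np


class BaseEnv:
    """Minimal stand-in so the sketch runs without renom_rl installed."""

    def __init__(self, action_shape=None, state_shape=None):
        self.action_shape = action_shape
        self.state_shape = state_shape


class LineWalkEnv(BaseEnv):
    """Hypothetical toy task: walk along 5 cells and reach the last one."""

    def __init__(self):
        super().__init__(action_shape=(2, ), state_shape=(5, ))
        self.position = 0

    def _state(self):
        # One-hot encoding of the current cell; its shape equals state_shape.
        state = np.zeros(self.state_shape, dtype=np.float32)
        state[self.position] = 1.0
        return state

    def reset(self):
        # Return the initial state.
        self.position = 0
        return self._state()

    def sample(self):
        # Random action: 0 moves left, 1 moves right.
        return int(np.random.randint(0, 2))

    def step(self, action):
        # Move one cell, clipped to the range [0, 4].
        move = 1 if int(action) == 1 else -1
        self.position = min(max(self.position + move, 0), 4)
        terminal = self.position == 4
        reward = 1.0 if terminal else 0.0
        return self._state(), reward, terminal


env = LineWalkEnv()
state = env.reset()
for _ in range(4):
    state, reward, terminal = env.step(1)
```

After four steps to the right the walker sits on the last cell, so the final call returns a terminal flag of True together with a reward of 1.0.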
terminate()

This is optional. In some cases, users want to terminate learning under certain conditions. By overriding this method, users will be able to terminate the learning process once a certain condition is met. Return True to terminate the fit process. The return value is False by default.

Returns: Terminates the fit process if True. (bool)
terminate_epoch()

This is optional. In some cases, users want to terminate learning under certain conditions. By overriding this method, users will be able to terminate the learning process for a single epoch once a certain condition is met. Return True to terminate the epoch. The return value is False by default.

Returns: Terminates the epoch if True. (bool)
stop_epoch()

This is optional. In some cases, users want to terminate learning under certain conditions. By overriding this method, users will be able to stop the learning process for a single epoch and start the testing process once a certain condition is met. Return True to stop the epoch step. The return value is False by default.

Returns: Stops the epoch step and starts testing if True. (bool)
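Because these hooks take no arguments, any condition they test has to be tracked on the environment itself, for instance inside step. The sketch below shows one way to do that. The BaseEnv class in the block is a stand-in so that the code runs without renom_rl, and BudgetedEnv, max_total_steps and reward_goal are hypothetical names chosen for illustration.

```python
import numpy as np


class BaseEnv:
    """Minimal stand-in so the sketch runs without renom_rl installed."""

    def __init__(self, action_shape=None, state_shape=None):
        self.action_shape = action_shape
        self.state_shape = state_shape

    def terminate(self):
        return False

    def stop_epoch(self):
        return False


class BudgetedEnv(BaseEnv):
    """Hypothetical environment with a step budget and a reward goal."""

    def __init__(self, max_total_steps=1000, reward_goal=10.0):
        super().__init__(action_shape=(2, ), state_shape=(1, ))
        self.max_total_steps = max_total_steps
        self.reward_goal = reward_goal
        self.total_steps = 0
        self.episode_reward = 0.0

    def reset(self):
        self.episode_reward = 0.0
        return np.zeros(self.state_shape)

    def step(self, action):
        # Track the quantities that the argument-free hooks will inspect.
        self.total_steps += 1
        reward = 1.0
        self.episode_reward += reward
        return np.zeros(self.state_shape), reward, False

    def terminate(self):
        # End the whole fit process once the step budget is used up.
        return self.total_steps >= self.max_total_steps

    def stop_epoch(self):
        # Stop the current epoch once the reward goal is reached.
        return self.episode_reward >= self.reward_goal


env = BudgetedEnv(max_total_steps=3, reward_goal=2.0)
env.reset()
env.step(0)
before = (env.terminate(), env.stop_epoch())
env.step(0)
env.step(0)
after = (env.terminate(), env.stop_epoch())
```

Here both hooks return False after the first step and True after the third, since the step budget and the reward goal have both been reached by then.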
start()

This is optional. This method is called when the fit function starts. It executes after reset.

epoch()

This is optional. This method is called when the epoch is updated. It executes after reset.

epoch_step()

This is optional. This method is called at every step. It executes after step.

close()

This is optional. This method is called when fit is closed.

test_start()

This is optional. This method is called when the test starts. It executes after reset.

test_epoch_step()

This is optional. This method is called at every test step. It executes after step.

test_close()

This is optional. This method is called when the test is done.

test_terminate()

This is optional. In some cases, users want to terminate testing under certain conditions. By overriding this method, users will be able to terminate the test process once a certain condition is met. The return value is False by default.
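The relative order of the training hooks stated on this page (start and epoch after reset, epoch_step after step, close at the end of fit) can be made concrete with a small driver. Both RecordingEnv and run_fit_sketch are hypothetical and only mirror the ordering described here. The real fit loop in renom_rl may differ in details such as where testing is run, which this sketch leaves out.

```python
class RecordingEnv:
    """Hypothetical environment that records the order of hook calls."""

    def __init__(self):
        self.calls = []
        self.steps = 0

    def reset(self):
        self.calls.append("reset")
        self.steps = 0
        return 0

    def step(self, action):
        self.calls.append("step")
        self.steps += 1
        return 0, 0.0, False

    def start(self):
        self.calls.append("start")

    def epoch(self):
        self.calls.append("epoch")

    def epoch_step(self):
        self.calls.append("epoch_step")

    def close(self):
        self.calls.append("close")

    def terminate(self):
        return False

    def terminate_epoch(self):
        # End each epoch after two steps.
        return self.steps >= 2

    def stop_epoch(self):
        return False


def run_fit_sketch(env, epochs):
    """Assumed call order only; not the actual renom_rl fit loop."""
    env.reset()
    env.start()  # documented: executes after reset
    for _ in range(epochs):
        env.reset()
        env.epoch()  # documented: executes after reset
        while True:
            env.step(0)
            env.epoch_step()  # documented: executes after step
            if env.terminate() or env.terminate_epoch() or env.stop_epoch():
                break
        if env.terminate():
            break
    env.close()  # documented: called when fit is closed


env = RecordingEnv()
run_fit_sketch(env, epochs=1)
```

With one epoch, env.calls ends up as reset, start, reset, epoch, then two step and epoch_step pairs, and finally close.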