# renom_rl.environ.env

class  BaseEnv  ( action_shape=None , state_shape=None )

Base class for environments. The methods step and reset must be overridden. ( sample is also required by some algorithms.)

Users can override the optional methods listed below, from terminate through test_terminate, if necessary.

If test_start , test_epoch_step , and test_close are not defined, they behave the same as start , epoch_step , and close , respectively.

Note that these methods are called without arguments, so they cannot accept any parameters (except step ).

Example

>>> import numpy as np
>>> from renom_rl.environ.env import BaseEnv
>>> class CustomEnv(BaseEnv):
...     def __init__(self):
...         # Store the shapes on the instance so they can be read later.
...         self.action_shape = (5, )
...         self.state_shape = (86, 86)
...
...     def step(self, action):
...         # `func` is a placeholder for the user's own simulator.
...         state, reward, terminal = func(action)
...         return state, reward, terminal
...
...     def sample(self):
...         # Return a random action index in [0, 5).
...         return np.random.randint(0, 5)
...
...     def reset(self):
...         initial_state = func.reset()
...         return initial_state
...

 step  ( action )

This method must be overridden. It defines how the environment responds when an action is taken. It must accept a single action and return the next state, the reward, and the terminal flag.

 Parameters: action ( ndarray , int , float ) – Action value.
 Returns: state ( ndarray ) – Environment’s next state. Shape must be the same as BaseEnv.state_shape . reward ( float ) – Reward from the transition. terminal ( bool ) – Terminal flag.
 sample  ( )

This method must be overridden for algorithms that draw random actions.

Examples: DQN, DDQN

This method must return a random action.

 Returns: Sampled action. Shape must be the same as BaseEnv.action_shape . ( int , ndarray )
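
As a rough sketch, sample might look like this for a discrete and a continuous action space. The BaseEnv class is a hypothetical stand-in so the snippet runs on its own. The reading of action_shape = (5, ) as five discrete actions follows the class example, and the [-1, 1) bounds for the continuous case are an assumption.

```python
import numpy as np


class BaseEnv:
    """Hypothetical stand-in for renom_rl's BaseEnv so this sketch runs alone."""

    def __init__(self, action_shape=None, state_shape=None):
        self.action_shape = action_shape
        self.state_shape = state_shape


class DiscreteEnv(BaseEnv):
    def __init__(self):
        super().__init__(action_shape=(5,), state_shape=(86, 86))

    def sample(self):
        # One of the 5 action indices, chosen uniformly at random.
        return int(np.random.randint(0, self.action_shape[0]))


class ContinuousEnv(BaseEnv):
    def __init__(self):
        super().__init__(action_shape=(2,), state_shape=(3,))

    def sample(self):
        # Uniform values in [-1, 1); the bounds are an assumption of this sketch.
        return np.random.uniform(-1.0, 1.0, size=self.action_shape)


discrete_action = DiscreteEnv().sample()
continuous_action = ContinuousEnv().sample()
```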
 reset  ( )

This method must be overridden. It resets the environment and returns the initial state.

 Returns: Initial state. Shape must be the same as BaseEnv.state_shape . ( ndarray )
 terminate  ( )

This is optional. In some cases, users want to terminate learning under certain conditions. By overriding this method, users can terminate the learning process once a certain condition is met. Return True to terminate the fit process. The return value is False by default.

 Returns: Terminates the fit process if True. ( bool )
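
One possible use is to end training once recent rewards are high enough. The sketch below is illustrative only: BaseEnv is a hypothetical stand-in, the reward rule in step is a placeholder, and the loop at the bottom merely imitates a fit process. It is not renom_rl's actual training loop.

```python
class BaseEnv:
    """Hypothetical stand-in for renom_rl's BaseEnv so this sketch runs alone."""

    def terminate(self):
        return False  # documented default


class EarlyStopEnv(BaseEnv):
    """Ends training once the mean of the last `window` rewards reaches `target`."""

    def __init__(self, target=0.9, window=3):
        self.target = target
        self.window = window
        self.rewards = []

    def step(self, action):
        # Placeholder dynamics: the reward is the action value; state omitted.
        reward = float(action)
        self.rewards.append(reward)
        return None, reward, False

    def terminate(self):
        recent = self.rewards[-self.window:]
        return len(recent) == self.window and sum(recent) / self.window >= self.target


# Hypothetical driver loop standing in for an agent's fit process.
env = EarlyStopEnv()
steps_run = 0
for action in [0, 1, 1, 1, 1, 1]:
    env.step(action)
    steps_run += 1
    if env.terminate():
        break
```

Here the loop stops as soon as three consecutive rewards of 1 have been recorded, leaving the remaining actions unused.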
 terminate_epoch  ( )

This is optional. In some cases, users want to terminate learning under certain conditions. By overriding this method, users can terminate the learning process for a single epoch once a certain condition is met. Return True to terminate the epoch. The return value is False by default.

 Returns: Terminates the epoch if True. ( bool )
 stop_epoch  ( )

This is optional. In some cases, users want to stop learning under certain conditions. By overriding this method, users can stop the learning process for a single epoch and start the testing process once a certain condition is met. Return True to stop the epoch step. The return value is False by default.

 Returns: Stops the epoch step and starts testing if True. ( bool )
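
A step budget is one condition that stop_epoch could check. In this sketch, BaseEnv is a hypothetical stand-in, BudgetEnv is invented for illustration, and the driver loop only approximates the order in which a fit process might call the hooks. The real call order may differ.

```python
class BaseEnv:
    """Hypothetical stand-in for renom_rl's BaseEnv so this sketch runs alone."""

    def epoch(self):
        pass

    def epoch_step(self):
        pass

    def stop_epoch(self):
        return False  # documented default


class BudgetEnv(BaseEnv):
    """Stops each training epoch after `budget` steps."""

    def __init__(self, budget=3):
        self.budget = budget
        self.steps_this_epoch = 0

    def epoch(self):
        # A new epoch begins: clear the per-epoch counter.
        self.steps_this_epoch = 0

    def epoch_step(self):
        # Called after every step: count it.
        self.steps_this_epoch += 1

    def stop_epoch(self):
        return self.steps_this_epoch >= self.budget


# Hypothetical driver loop; the real call order inside fit may differ.
env = BudgetEnv(budget=3)
steps_per_epoch = []
for _ in range(2):          # two epochs
    env.epoch()
    for _ in range(10):     # at most 10 steps per epoch
        env.epoch_step()
        if env.stop_epoch():
            break
    steps_per_epoch.append(env.steps_this_epoch)
```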
 start  ( )

This is optional. This method is called when the fit function starts. It executes after reset .

 epoch  ( )

This is optional. This method is called each time the epoch is updated. It executes after reset .

 epoch_step  ( )

This is optional. This method is called at every training step. It executes after step .

 close  ( )

This is optional. This method is called when the fit process finishes.

 test_start  ( )

This is optional. This method is called when testing starts. It executes after reset .

 test_epoch_step  ( )

This is optional. This method is called at every test step. It executes after step .

 test_close  ( )

This is optional. This method is called when testing is done.
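
The fallback described in the class summary, where an undefined test_start , test_epoch_step , or test_close behaves like start , epoch_step , or close , can be pictured as follows. The BaseEnv below is a hypothetical stand-in that mimics that documented behaviour. It is not the library's source, and LoggingEnv is invented for illustration.

```python
class BaseEnv:
    """Hypothetical stand-in that mimics the documented fallback behaviour."""

    def start(self):
        pass

    def epoch_step(self):
        pass

    def close(self):
        pass

    # Unless overridden, each test hook delegates to its training counterpart.
    def test_start(self):
        return self.start()

    def test_epoch_step(self):
        return self.epoch_step()

    def test_close(self):
        return self.close()


class LoggingEnv(BaseEnv):
    def __init__(self):
        self.log = []

    def start(self):
        self.log.append("start")

    def close(self):
        self.log.append("close")

    def test_close(self):
        # Overridden, so the fallback to close() is not used.
        self.log.append("test_close")


env = LoggingEnv()
env.test_start()  # not overridden: falls back to start()
env.test_close()  # overridden: uses the test-specific version
```

Overriding only the test hooks that need different behaviour keeps training and testing side effects, such as logging or rendering, in one place.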

 test_terminate  ( )

This is optional. In some cases, users want to terminate testing under certain conditions. By overriding this method, users can terminate the test process once a certain condition is met. Return True to terminate the test process. The return value is False by default.