renom_rl.environ.env

class BaseEnv(action_shape=None, state_shape=None)
Bases: object
Base class for environments. The methods step and reset must be overridden; sample must also be overridden for algorithms that require it. Users can override the optional methods from terminate through test_terminate if necessary.

If test_start, test_epoch_step, and test_close are not defined, they return the same values as start, epoch_step, and close, respectively. Note that these hook methods are called without arguments, so overrides cannot define extra parameters (step is the exception).

Example
>>> import numpy as np
>>> from renom_rl.environ.env import BaseEnv
>>> class CustomEnv(BaseEnv):
...     def __init__(self):
...         # Pass the shapes to the BaseEnv constructor.
...         super(CustomEnv, self).__init__(action_shape=(5,), state_shape=(86, 86))
...
...     def step(self, action):
...         # func is a placeholder for the user's own environment logic.
...         state, reward, terminal = func(action)
...         return state, reward, terminal
...
...     def sample(self):
...         # Return a random action index in [0, 5).
...         return np.random.randint(0, 5)
...
...     def reset(self):
...         initial_state = func.reset()
...         return initial_state
...

step(action)
This method must be overridden. It defines how the environment responds when an action is taken: it must accept a single action and return the next state, the reward, and the terminal flag.
Parameters: action (ndarray, int, float) – action value.
Returns: next state, reward, and terminal flag.
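
The contract described for step can be sketched with a toy environment. This is only an illustration under stated assumptions: CounterEnv, its goal parameter, and the reward rule are hypothetical, the class deliberately does not subclass BaseEnv so that the snippet runs without ReNomRL installed, and tuples stand in for the ndarray states a real environment would return.

```python
class CounterEnv:
    """Hypothetical 1-D environment: the agent moves a counter toward a goal."""

    def __init__(self, goal=3):
        self.action_shape = (1,)
        self.state_shape = (1,)
        self.goal = goal
        self.position = 0

    def reset(self):
        # Return the initial state.
        self.position = 0
        return (self.position,)

    def step(self, action):
        # action: 0 = stay, 1 = move right (an assumption of this sketch)
        self.position += int(action)
        terminal = self.position >= self.goal
        reward = 1.0 if terminal else 0.0
        # Always return the triple (next state, reward, terminal).
        return (self.position,), reward, terminal


env = CounterEnv(goal=2)
state = env.reset()
state, reward, terminal = env.step(1)
state, reward, terminal = env.step(1)
```

After the second call the counter has reached the goal, so the episode is flagged as terminal and the reward is paid out.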

sample()
This method must be overridden for algorithms that sample random actions (for example, DQN and DDQN). It must return a random action.
Returns: Sampled action (int, ndarray). Its shape must match BaseEnv.action_shape.
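
A minimal sketch of sample for a discrete action space follows. The five-action space mirrors the class example; DiscreteActionEnv and its n_actions attribute are hypothetical names, and the standard-library random module is used here only to keep the snippet free of dependencies.

```python
import random


class DiscreteActionEnv:
    """Hypothetical environment with five discrete actions (0..4)."""

    n_actions = 5

    def sample(self):
        # Return a random *action*, not a state.
        return random.randrange(self.n_actions)


env = DiscreteActionEnv()
samples = [env.sample() for _ in range(100)]
```

Every value in samples is an integer action index between 0 and 4 inclusive.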

reset()
This method must be overridden.
Returns: Initial state (ndarray). Its shape must match BaseEnv.state_shape.

terminate()
This is optional. Override this method to end the fit process once a certain condition is met: return True to terminate. The default return value is False.
Returns: True to terminate the fit process (bool).

terminate_epoch()
This is optional. Override this method to end the learning process for the current epoch once a certain condition is met: return True to terminate the epoch. The default return value is False.
Returns: True to terminate the epoch (bool).

stop_epoch()
This is optional. Override this method to stop the learning steps of the current epoch and start the testing process once a certain condition is met: return True to stop the epoch steps. The default return value is False.
Returns: True to stop the epoch steps and start testing (bool).
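
The difference between terminate, terminate_epoch, and stop_epoch can be sketched with a small driver loop. Everything here is an assumption made for illustration: ThresholdEnv, its thresholds, and the run function are hypothetical, and run is not ReNomRL's fit loop. It encodes one reading of the descriptions above, in which terminate ends the whole run, terminate_epoch ends the epoch without testing, and stop_epoch ends the epoch's steps and proceeds to testing.

```python
class ThresholdEnv:
    """Hypothetical environment that tracks reward to drive the stop hooks."""

    def __init__(self):
        self.total_reward = 0.0
        self.steps_in_epoch = 0

    def reset(self):
        self.steps_in_epoch = 0
        return 0

    def step(self, action):
        self.total_reward += 1.0
        self.steps_in_epoch += 1
        return 0, 1.0, False

    def terminate(self):
        # End the whole run once enough reward has been collected.
        return self.total_reward >= 5.0

    def terminate_epoch(self):
        return False  # same as the documented default

    def stop_epoch(self):
        # Cut the epoch short after two steps and move on to testing.
        return self.steps_in_epoch >= 2


def run(env, epochs=10, steps_per_epoch=4):
    """Hypothetical driver, NOT ReNomRL's fit implementation."""
    log = []
    for epoch in range(epochs):
        env.reset()
        run_test = True
        for _ in range(steps_per_epoch):
            env.step(0)
            if env.terminate():
                log.append("terminate")
                return log
            if env.terminate_epoch():
                run_test = False  # assumed: skip testing for this epoch
                break
            if env.stop_epoch():
                break
        if run_test:
            log.append("test after epoch %d" % epoch)
    return log


log = run(ThresholdEnv())
```

With these thresholds the first two epochs are cut short by stop_epoch and each is followed by a test phase, and the run ends during the third epoch when terminate returns True.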

start()
This is optional. This method is called when the fit function starts. It executes after reset.

epoch()
This is optional. This method is called when the epoch is updated. It executes after reset.

epoch_step()
This is optional. This method is called every step. It executes after step.

close()
This is optional. This method is called when fit is closed.
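
The call order described for start, epoch, epoch_step, and close can be sketched by recording each hook as it fires. LoggingEnv and fit_sketch are hypothetical names; fit_sketch is a stand-in driver written for this illustration and the real fit loop may order or repeat the calls differently.

```python
class LoggingEnv:
    """Hypothetical environment that records the order of hook calls."""

    def __init__(self):
        self.calls = []

    def reset(self):
        self.calls.append("reset")
        return 0

    def step(self, action):
        self.calls.append("step")
        return 0, 0.0, False

    def start(self):
        self.calls.append("start")

    def epoch(self):
        self.calls.append("epoch")

    def epoch_step(self):
        self.calls.append("epoch_step")

    def close(self):
        self.calls.append("close")


def fit_sketch(env, epochs=1, steps=2):
    """Hypothetical driver; the real fit loop may differ."""
    env.reset()
    env.start()  # once, after the initial reset
    for _ in range(epochs):
        env.reset()
        env.epoch()  # once per epoch, after reset
        for _ in range(steps):
            env.step(0)
            env.epoch_step()  # after every step
    env.close()  # once, when fitting finishes


env = LoggingEnv()
fit_sketch(env)
```

Inspecting env.calls afterwards shows each hook firing immediately after the method it is documented to follow.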

test_start()
This is optional. This method is called when the test is starting. It executes after reset.

test_epoch_step()
This is optional. This method is called every test step. It executes after step.

test_close()
This is optional. This method is called when the test is done.
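
The fallback from the test hooks to their training counterparts could look roughly like the following. This is a guess at the idea, not the library's code: HookBase is a hypothetical stand-in for BaseEnv, and the string return values exist only to make the delegation visible.

```python
class HookBase:
    """Hypothetical stand-in for BaseEnv showing only the fallback idea."""

    def start(self):
        return None

    def epoch_step(self):
        return None

    def close(self):
        return None

    # Test hooks fall back to their training counterparts unless overridden.
    def test_start(self):
        return self.start()

    def test_epoch_step(self):
        return self.epoch_step()

    def test_close(self):
        return self.close()


class MyEnv(HookBase):
    def start(self):
        return "render window opened"

    def test_close(self):
        return "test window closed"  # overrides the fallback


env = MyEnv()
```

Here test_start reuses the overridden start, test_close uses its own override, and test_epoch_step falls through to the default epoch_step.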

test_terminate()
This is optional. Override this method to end the test process once a certain condition is met. The default return value is False.