How to Use ¶
This is a quick tutorial on how to use ReNom RL, using DQN as an example.
ReNom RL has multiple built-in algorithms, such as DQN and A3C. Implementing reinforcement learning with ReNom RL requires the following 3 steps: preparing the environment, building the network, and running the algorithm.
1. Preparing the Environment ¶
To quickly apply an environment, its structure must fit the BaseEnv module. In this section, we introduce 2 ways of preparing the environment: using a pre-prepared environment, and implementing an environment from scratch.
1.1 Using Pre-prepared Environment ¶
We provide environment models that use OpenAI Gym. For example, to use the CartPole environment for a test, call it as shown below.
from renom_rl.environ.openai import CartPole00

env = CartPole00()
1.2 Implementing Environment from Scratch ¶
When creating an original environment, the class must inherit from BaseEnv, with its attributes and methods overridden as follows:
- action_shape: the shape of action
- state_shape: the shape of state
- reset(): Resets the environment and returns the initial state
- step(): Takes a step with the given action and returns state, reward, terminal
- sample(): Returns a random action (required for DQN, DDQN)
For example, when creating an original environment called CustomEnv(), the implementation can be done as shown below:
import gym
from renom_rl.environ import BaseEnv

class CustomEnv(BaseEnv):

    def __init__(self, env):
        self.action_shape = (2,)
        self.state_shape = (4,)
        self.env = env
        self.step_continue = 0
        self.reward = 0

    def reset(self):
        # Reset the step counter together with the environment
        self.step_continue = 0
        return self.env.reset()

    def sample(self):
        # Random action from the wrapped environment
        return self.env.action_space.sample()

    def step(self, action):
        state, _, terminal, _ = self.env.step(int(action))
        self.step_continue += 1
        reward = 0
        if terminal:
            # Reward 1 for surviving 200 steps, otherwise -1
            reward = 1 if self.step_continue >= 200 else -1
        self.reward = reward
        return state, reward, terminal

new_env = CustomEnv(gym.make("CartPole-v0"))
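The reset/step/sample contract above is all an algorithm relies on. As an illustration of how an agent loop drives any environment that follows this contract, here is a minimal, self-contained sketch; the DummyEnv below is a hypothetical stand-in, not part of ReNom RL:

```python
import random

class DummyEnv:
    """Hypothetical stand-in following the same reset/step/sample contract."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]      # initial state

    def sample(self):
        return random.randint(0, 1)       # random action in {0, 1}

    def step(self, action):
        self.t += 1
        terminal = self.t >= self.max_steps
        reward = 1 if terminal else 0     # reward only on the terminal step
        return [float(self.t)] * 4, reward, terminal

# The same loop an algorithm runs internally: reset, then step until terminal.
env = DummyEnv()
state = env.reset()
total_reward = 0
terminal = False
while not terminal:
    state, reward, terminal = env.step(env.sample())
    total_reward += reward
print(total_reward)  # 1
```

Any class exposing these three methods (plus action_shape and state_shape) can be plugged into the built-in algorithms in the same way.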
Next, we use ReNom DL to build a network. The network structure can vary depending on the problem. For DQN, define the network as shown below:
import renom as rm

q_network = rm.Sequential([
    rm.Dense(30, ignore_bias=True),
    rm.Relu(),
    rm.Dense(30, ignore_bias=True),
    rm.Relu(),
    rm.Dense(2, ignore_bias=True)
])
After preparing the environment and the network, we now implement DQN.
from renom_rl.discrete.dqn import DQN

algorithm = DQN(new_env, q_network)
After creating an instance, we can run the algorithm as shown below:
result = algorithm.fit()
Note that at the end of each epoch, testing is run.
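That epoch structure can be pictured as the following skeleton: train for a fixed number of steps, then run a test phase. The function names here are illustrative placeholders, not ReNom RL internals:

```python
def run(epochs, train_steps_per_epoch, train_step, test_episode):
    """Illustrative skeleton: each epoch trains, then tests once."""
    history = []
    for epoch in range(epochs):
        for _ in range(train_steps_per_epoch):
            train_step()                  # one environment/training step
        history.append(test_episode())    # testing at the end of each epoch
    return history

# Dummy callbacks standing in for the real training/testing work.
steps_done = []
result = run(epochs=3,
             train_steps_per_epoch=4,
             train_step=lambda: steps_done.append(1),
             test_episode=lambda: sum(steps_done))
print(result)  # [4, 8, 12]
```

The per-epoch test scores are what let you monitor whether the policy is actually improving as training progresses.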
We can also test the model that was trained using the same environment:
result = algorithm.test()
By implementing as shown above, we can have the network learn using the DQN algorithm. For more information, refer to the API pages on environments and the other algorithms.
In How to Use - Detail -, further background knowledge is documented so users can get a better idea of how to use ReNom RL.