How to Use

This section provides a quick tutorial on how to use ReNomRL, using DQN as an example.

ReNom RL has multiple built-in algorithms, such as DQN, A3C, etc. When implementing reinforcement learning with ReNom RL, the following 3 steps are required:

1- Environment Preparation

In order to apply an environment quickly, it must fit the environment structure defined by the BaseEnv module. In this section, we introduce 2 ways of preparing an environment: using a pre-prepared environment and implementing an environment from scratch.

1.1 Using Pre-prepared Environment

We have prepared environment models that use OpenAI Gym. For example, if the user wants to use the CartPole model for a test, the environment can be called as shown below.

from renom_rl.environ.openai import CartPole00
env = CartPole00()
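
The returned object follows the BaseEnv interface described in the next section, so its shape attributes can be inspected directly. A minimal sketch, assuming the standard CartPole setup of 4 state variables and 2 actions:

print(env.state_shape)   # (4,): cart position, cart velocity, pole angle, pole velocity
print(env.action_shape)  # (2,): push the cart left or right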

1.2 Implementing Environment from Scratch

When creating an original environment, the object must inherit from BaseEnv, with attributes and methods overridden as follows:

  • action_shape: The shape of the action space
  • state_shape: The shape of the state space
  • reset(): Resets the environment and returns the initial state
  • step(): Takes an action, advances one step, and returns the resulting state, reward, and terminal flag
  • sample(): Returns a random action (required for DQN and DDQN)

For example, when creating an original environment called CustomEnv(), the implementation can be done as shown below:

import gym

from renom_rl.environ import BaseEnv


class CustomEnv(BaseEnv):

    def __init__(self, env):
        # Shapes of the action and state spaces
        self.action_shape = (2,)
        self.state_shape = (4,)

        self.env = env
        self.step_continue = 0
        self.reward = 0

    def reset(self):
        # Reset the per-episode step counter along with the wrapped environment
        self.step_continue = 0
        return self.env.reset()


    def sample(self):
        rand = self.env.action_space.sample()
        return rand

    def step(self, action):
        # Gym's step returns (state, reward, terminal, info); the reward is recomputed below
        state, _, terminal, _ = self.env.step(int(action))

        self.step_continue += 1
        reward = 0

        # Reward shaping: +1 for surviving 200 steps, -1 for terminating earlier
        if terminal:
            if self.step_continue >= 200:
                reward = 1
            else:
                reward = -1

        self.reward = reward

        return state, reward, terminal

new_env = CustomEnv(gym.make("CartPole-v0"))
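
Here CustomEnv wraps an OpenAI Gym CartPole instance, an assumption that matches the shapes and the 200-step limit used above. As a quick sanity check, the overridden methods can be exercised directly:

state = new_env.reset()                         # initial state, shape (4,)
action = new_env.sample()                       # random action from the wrapped env
state, reward, terminal = new_env.step(action)  # one transition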

2- Network Preparation

In this section, we use ReNom DL to build a network. The network structure can vary depending on the problem. For DQN, define the network as shown below:

import renom as rm
q_network = rm.Sequential([rm.Dense(30, ignore_bias=True),
                           rm.Relu(),
                           rm.Dense(30, ignore_bias=True),
                           rm.Relu(),
                           rm.Dense(2, ignore_bias=True)])
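
The final Dense layer has 2 output units, one Q-value per discrete action, matching the environment's action_shape. As a rough check, a forward pass on a dummy state should produce one value per action; a minimal sketch, assuming the network accepts NumPy arrays:

import numpy as np

dummy_state = np.random.rand(1, 4)   # a batch containing one 4-dimensional state
print(q_network(dummy_state).shape)  # expected: (1, 2), one Q-value per action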

3- Implementation of Reinforcement Learning

After preparing the environment and the network, we now implement DQN.

from renom_rl.discrete.dqn import DQN

algorithm = DQN(new_env, q_network)

After creating an instance, we can run the algorithm as shown below:

result = algorithm.fit()

Note that testing runs automatically at the end of each epoch.
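
fit() also accepts training hyperparameters. The following is a sketch, not a definitive call: the parameter names (epoch, epoch_step, batch_size) are assumptions that may differ between versions, so verify them against the DQN.fit API documentation:

# Hypothetical hyperparameter names; check the DQN.fit signature before use
result = algorithm.fit(epoch=3, epoch_step=10000, batch_size=32)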

We can also test the trained model using the same environment:

result = algorithm.test()

By implementing the steps shown above, we can have the network learn using the DQN algorithm. For more information, refer to the API pages on the environment and the other algorithms.

In How to Use - Detail -, further background knowledge is documented so users can get a better idea of how to use ReNomRL.