How to Use ¶
In reinforcement learning, a neural network serves as the agent: it takes multiple signals as input and outputs an action. Because problems differ from user to user (for example, in what information the agent acquires from the environment, or what types of actions are required), we must define not only the agent structure but also the environment.
ReNom RL has multiple built-in algorithms, such as DQN and A3C. Implementing reinforcement learning with ReNom RL requires the following 3 steps:
- Environment Preparation
- Model Preparation
- Implementation of Reinforcement Learning
1-Environment Preparation ¶
To apply an environment quickly, its structure must follow the BaseEnv module. This section introduces 2 ways of preparing the environment: using a pre-prepared environment and implementing an environment from scratch.
Using Pre-prepared Environment ¶
ReNom RL provides pre-prepared environments built on OpenAI Gym. For example, to test with the Breakout environment, create it as shown below.
from renom_rl.environ.openai import Breakout

env = Breakout()
Implementing Environment from Scratch ¶
When creating an original environment, the class must inherit from BaseEnv and override the following variables and functions:
- action_shape: the shape of action
- state_shape: the shape of state
- reset(): the function that resets the environment
- sample(): the function that chooses a random action
- step(): the function that returns the state, reward, and terminal flag when taking a step
For example, an original environment called CustomEnv can be implemented as shown below:
import gym
import numpy as np
from PIL import Image
from renom_rl.environ import BaseEnv

env = gym.make('BreakoutNoFrameskip-v4')


class CustomEnv(BaseEnv):

    def __init__(self, env):
        self.env = env
        self.action_shape = 4
        self.state_shape = (4, 84, 84)
        self.previous_frames = []  # most recent preprocessed frames
        self._reset_flag = True
        self._last_live = 5
        super(CustomEnv, self).__init__()

    def reset(self):
        # Reset the emulator only after all lives are lost.
        if self._reset_flag:
            self._reset_flag = False
            self.env.reset()
        # Take a random number of random steps to vary the start state.
        n_step = np.random.randint(4, 32+1)
        for _ in range(n_step):
            state, _, _ = self.step(self.env.action_space.sample())
        return state

    def sample(self):
        return self.env.action_space.sample()

    def render(self):
        self.env.render()

    def _preprocess(self, state):
        # Resize to 84x110 grayscale, scale to [0, 1], crop to 84x84.
        resized_image = Image.fromarray(state).resize((84, 110)).convert('L')
        image_array = np.asarray(resized_image)/255.
        final_image = image_array[26:110]
        # Confirm that the image is processed correctly.
        # Image.fromarray(np.clip(final_image.reshape(84, 84)*255, 0, 255).astype(np.uint8)).save("test.png")
        return final_image

    def step(self, action):
        reward_list = []
        terminal = False
        for _ in range(4):
            # Use last frame. Other frames will be skipped.
            # This assumes the Gym API in which step() returns 4 values.
            s, r, t, info = self.env.step(action)
            state = self._preprocess(s)
            reward_list.append(r)
            # Treat losing a life as the end of an episode.
            if self._last_live > info["ale.lives"]:
                t = True
                self._last_live = info["ale.lives"]
                if self._last_live > 0:
                    self._reset_flag = False
                else:
                    self._last_live = 5
                    self._reset_flag = True
            if t:
                terminal = True
        # Keep only the 4 most recent frames and stack them as the state.
        if len(self.previous_frames) > 3:
            self.previous_frames = self.previous_frames[1:] + [state]
        else:
            self.previous_frames += [state]
        state = np.stack(self.previous_frames)
        return state, np.array(np.sum(reward_list) > 0), terminal


custom_env = CustomEnv(env)
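The required members can also be seen in isolation with a much smaller example. The following is a minimal sketch, not part of ReNom RL: `CoinFlipEnv` and `run_random_episode` are hypothetical names, and the class deliberately does not inherit from BaseEnv so that it runs without ReNom RL or Gym installed. In real use, the class would inherit from `renom_rl.environ.BaseEnv` as in the example above.

```python
import random


class CoinFlipEnv:
    """Toy environment exposing the five members listed above.

    Assumption: a real environment would inherit from
    renom_rl.environ.BaseEnv; the base class is omitted here so the
    sketch has no external dependencies.
    """

    def __init__(self, episode_length=10, seed=0):
        self.action_shape = 2      # two discrete actions: guess 0 or 1
        self.state_shape = (1,)    # the state is the current coin value
        self.episode_length = episode_length
        self._rng = random.Random(seed)
        self._t = 0
        self._coin = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self._t = 0
        self._coin = self._rng.randint(0, 1)
        return [self._coin]

    def sample(self):
        """Return a uniformly random action."""
        return self._rng.randrange(self.action_shape)

    def step(self, action):
        """Apply an action and return (state, reward, terminal)."""
        reward = 1.0 if action == self._coin else 0.0
        self._coin = self._rng.randint(0, 1)
        self._t += 1
        terminal = self._t >= self.episode_length
        return [self._coin], reward, terminal


def run_random_episode(env):
    """Play one episode with random actions; return (total reward, steps)."""
    env.reset()
    total, steps, terminal = 0.0, 0, False
    while not terminal:
        _, reward, terminal = env.step(env.sample())
        total += reward
        steps += 1
    return total, steps
```

Calling `run_random_episode(CoinFlipEnv())` exercises reset(), sample(), and step() in the same order a learning algorithm would, which is a quick way to check a new environment before training on it.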
2-Model Preparation ¶
In this section, we use ReNom DL to build a model. For a standard neural network, define the model as shown below.
import renom as rm

q_network = rm.Sequential([
    rm.Conv2d(32, filter=8, stride=4),
    rm.Relu(),
    rm.Conv2d(64, filter=4, stride=2),
    rm.Relu(),
    rm.Conv2d(64, filter=3, stride=1),
    rm.Relu(),
    rm.Flatten(),
    rm.Dense(512),
    rm.Relu(),
    # One output per action: the Q-value of each action.
    rm.Dense(custom_env.action_shape),
])
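To see how the (4, 84, 84) state shape flows through the three convolution layers, the feature-map sizes can be worked out by hand. The following sketch uses only plain Python and assumes zero padding in every convolution (an assumption about the layer defaults, not something stated on this page); `conv_output_size` and `trace_shapes` are hypothetical helper names.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis of a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape, conv_layers):
    """Return the (channels, height, width) shape after each conv layer.

    conv_layers is a list of (out_channels, kernel, stride) tuples.
    Assumption: zero padding and square kernels.
    """
    _, height, width = input_shape
    shapes = []
    for out_channels, kernel, stride in conv_layers:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        shapes.append((out_channels, height, width))
    return shapes


# The three Conv2d layers of q_network: (channels, filter, stride).
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
shapes = trace_shapes((4, 84, 84), layers)
channels, height, width = shapes[-1]
flat_features = channels * height * width

print(shapes)          # [(32, 20, 20), (64, 9, 9), (64, 7, 7)]
print(flat_features)   # 3136
```

Under that assumption, Flatten() passes 3136 features to Dense(512), and the final Dense layer emits one value per action.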
3-Implementation of Reinforcement Learning ¶
After preparing the environment and the model, we apply a reinforcement learning algorithm to them. The script below sets up DQN.
import renom as rm
from renom_rl.discrete.dqn import DQN

model = DQN(custom_env, q_network)
After building the model, we run training as shown below:
result = model.fit(render=False, greedy_step=1000000, random_step=5000, update_period=10000)
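This page does not define the arguments passed to fit(), so the following is only a sketch of how such settings are commonly used in DQN, not a description of ReNom RL's internals. It assumes that `random_step` is a number of initial purely random steps, that `greedy_step` is the number of steps over which the probability of taking the greedy action rises linearly, and that `update_period` is the interval at which the target network is synchronized. The function names and the `min_greedy` and `max_greedy` parameters are hypothetical.

```python
def greedy_probability(step, random_step, greedy_step,
                       min_greedy=0.0, max_greedy=0.9):
    """Probability of choosing the greedy (highest Q-value) action.

    Assumption: actions are fully random for the first random_step steps,
    then the greedy probability grows linearly over greedy_step steps.
    """
    if step < random_step:
        return 0.0
    progress = min(1.0, (step - random_step) / greedy_step)
    return min_greedy + (max_greedy - min_greedy) * progress


def should_sync_target(step, update_period):
    """True on the steps where the target network would be refreshed."""
    return step > 0 and step % update_period == 0


# Values taken from the fit() call above.
for step in (0, 5000, 505000, 2000000):
    p = greedy_probability(step, random_step=5000, greedy_step=1000000)
    print(step, round(p, 3), should_sync_target(step, update_period=10000))
```

A longer `greedy_step` keeps the agent exploring for more of training, while a shorter `update_period` makes the target values track the online network more closely.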
Implementing the steps as shown above runs DQN. For more information, please refer to the API pages on environments and other algorithms.