renom.algorithm

renom.algorithm.image

renom.algorithm.image.detection.yolo.build_truth(y, total_w, total_h, cells, classes)

Transforms a list of objects per image into an image*cells*cells*(5+classes) matrix. Each cell in an image can be labeled with only one object.

“5” represents: objectness (0 or 1) and X Y W H

ex: Input: 2 objects in first image, 5 classes

y[0] = X Y W H 0 1 0 0 0 X Y W H 0 0 0 1 0
       |---1st object---||---2nd object--|

Output: 7 * 7 cells * 10 per image

truth[0,0,0] = 1 X Y W H 0 1 0 0 0
(cell 0,0 has first object)
truth[0,0,1] = 0 0 0 0 0 0 0 0 0 0
(cell 0,1 has no object)
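
The transformation described above can be sketched in plain Python. This is an illustrative re-implementation, not ReNom's actual code. It assumes that X and Y are box centers in pixels, that the second index of the result is the grid row and the third is the grid column, that box values are copied unchanged, and that when two objects fall in the same cell the first one is kept. The name build_truth_sketch is hypothetical.

```python
def build_truth_sketch(y, total_w, total_h, cells, classes):
    """Illustrative sketch only; not the library implementation."""
    stride = 4 + classes  # X Y W H followed by a one-hot class vector
    # truth[image][row][col] holds 5 + classes values, all zero by default.
    truth = [[[[0.0] * (5 + classes) for _ in range(cells)]
              for _ in range(cells)]
             for _ in y]
    for img, row_data in enumerate(y):
        for start in range(0, len(row_data) - stride + 1, stride):
            obj = list(row_data[start:start + stride])
            cx, cy, w, h = obj[:4]
            # Assumption: the object belongs to the cell containing its center.
            col = min(int(cx / total_w * cells), cells - 1)
            row = min(int(cy / total_h * cells), cells - 1)
            if truth[img][row][col][0] == 1.0:
                continue  # each cell can be labeled with only one object
            truth[img][row][col] = [1.0, cx, cy, w, h] + obj[4:]
    return truth


# Two objects in the first image, 5 classes, 7 * 7 cells.
y = [[10, 10, 20, 20, 0, 1, 0, 0, 0,
      200, 100, 30, 30, 0, 0, 0, 1, 0]]
truth = build_truth_sketch(y, total_w=224, total_h=224, cells=7, classes=5)
print(truth[0][0][0])  # cell holding the first object
print(truth[0][0][1])  # empty cell
```
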
renom.algorithm.image.detection.yolo.apply_nms(x, cells, bbox, classes, image_size, thresh=0.2, iou_thresh=0.3)

Apply to the output X predicted by the yolo_detector layer to get a list of detected objects. The default probability threshold for detection is 0.2 (thresh). The default IOU threshold for suppression is 0.3 (iou_thresh), as given in the signature.

Parameters:
  • x – Output predicted by the yolo_detector layer.
  • cells ( int ) – Number of grid cells per side.
  • bbox ( int ) – Number of bounding boxes per cell.
  • classes ( int ) – Number of classes.
  • image_size ( tuple ) – Image size.
  • thresh ( float ) – Threshold for an effective bounding box.
  • iou_thresh ( float ) – Threshold for bounding box suppression.
Returns:

A list of dicts. Each dict includes the keys class, box, and score.
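
As a rough illustration of the two thresholds, the following sketch applies score filtering and greedy non-maximum suppression to a list of dicts with the keys class, box, and score. It is not ReNom's implementation. It assumes boxes are (x1, y1, x2, y2) corner tuples and that suppression is done per class; the names iou and nms_sketch are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_sketch(detections, thresh=0.2, iou_thresh=0.3):
    """Illustrative greedy NMS; not the library implementation."""
    # Drop low-scoring boxes, then visit the rest from best to worst.
    candidates = sorted((d for d in detections if d["score"] >= thresh),
                        key=lambda d: d["score"], reverse=True)
    kept = []
    for d in candidates:
        # Assumption: a box is suppressed only by a kept box of the same class.
        overlaps = any(k["class"] == d["class"]
                       and iou(k["box"], d["box"]) > iou_thresh
                       for k in kept)
        if not overlaps:
            kept.append(d)
    return kept


detections = [
    {"class": 1, "box": (0, 0, 10, 10), "score": 0.9},
    {"class": 1, "box": (1, 1, 11, 11), "score": 0.8},    # overlaps the first
    {"class": 1, "box": (50, 50, 60, 60), "score": 0.1},  # below thresh
    {"class": 2, "box": (0, 0, 10, 10), "score": 0.5},    # different class
]
for d in nms_sketch(detections):
    print(d)
```
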

class renom.algorithm.image.detection.yolo.Yolo(cells=7, bbox=2, classes=10)

Loss function for Yolo detection. The last layer of the network must have the following size: cells*cells*(bbox*5+classes). The 5 is because every bounding box gets 1 score and 4 location values (x, y, w, h).

Ex: Prediction: 2 bbox per cell, 7*7 cells per image, 5 classes

X[0,0,0] = S X Y W H S X Y W H 0 0 0 1 0
           |1st bbox||2nd bbox||classes|
Parameters:
  • cells ( int ) – Number of grid cells per side.
  • bbox ( int ) – Number of bounding boxes per cell.
  • classes ( int ) – Number of classes.
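
The size rule for the last layer, and the per-cell layout shown in the example, can be checked with a small sketch. The helper names yolo_output_size and split_cell are hypothetical and are not part of ReNom.

```python
def yolo_output_size(cells=7, bbox=2, classes=10):
    """Number of output units: cells*cells*(bbox*5+classes)."""
    return cells * cells * (bbox * 5 + classes)


def split_cell(cell_vector, bbox, classes):
    """Split one cell's prediction into bbox groups (S X Y W H) and class scores."""
    boxes = [cell_vector[i * 5:(i + 1) * 5] for i in range(bbox)]
    class_scores = cell_vector[bbox * 5:bbox * 5 + classes]
    return boxes, class_scores


print(yolo_output_size())                            # default arguments
print(yolo_output_size(cells=7, bbox=2, classes=5))  # the example above

# One cell with 2 bboxes and 5 classes; numbers stand in for S X Y W H.
cell = [0.9, 1, 2, 3, 4, 0.2, 5, 6, 7, 8, 0, 0, 0, 1, 0]
boxes, class_scores = split_cell(cell, bbox=2, classes=5)
print(boxes)
print(class_scores)
```
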

renom.algorithm.reinforcement

class renom.algorithm.reinforcement.dqn.DQN(q_network, target_q, state_size, action_pattern, gamma=0.99, buffer_size=100000.0)

DQN class. This class provides a reinforcement learning agent, including a training method.

Parameters:
  • q_network ( Model ) – Q-Network.
  • state_size ( tuple, list ) – The size of a state.
  • action_pattern ( int ) – The number of action patterns.
  • gamma ( float ) – Discount rate.
  • buffer_size ( float, int ) – The size of replay buffer.
action(state)

This method returns an action according to the given state.

Parameters:
  • state – A state of an environment.

Returns: Action.
Return type: (int, ndarray)
update()

This method updates the target network.
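
For context, in the standard DQN formulation the target network supplies the bootstrap values for the learning targets, and gamma discounts them. The sketch below shows that textbook computation; it is an assumption that ReNom follows it exactly, and td_targets is a hypothetical helper, not a ReNom function.

```python
def td_targets(rewards, next_q_target, terminals, gamma=0.99):
    """Textbook DQN target: r + gamma * max_a Q_target(s', a), or r if terminal.

    next_q_target holds, for each transition, the target network's Q-values
    for the next state (one value per action).
    """
    return [r if done else r + gamma * max(q)
            for r, q, done in zip(rewards, next_q_target, terminals)]


targets = td_targets(rewards=[1.0, 0.0],
                     next_q_target=[[0.5, 2.0], [1.0, 3.0]],
                     terminals=[False, True])
print(targets)
```

Keeping the target network fixed between updates means these targets do not shift on every training step of the Q-network.
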

train(env, loss_func=<renom.layers.loss.clipped_mean_squared_error.ClippedMeanSquaredError object>, optimizer=<renom.optimizer.Rmsprop object>, epoch=100, batch_size=32, random_step=1000, one_epoch_step=20000, test_step=1000, test_env=None, update_period=10000, greedy_step=1000000, min_greedy=0.0, max_greedy=0.9, test_greedy=0.95, train_frequency=4)

This method trains a Q-network using the epsilon-greedy method.

Parameters:
  • env ( function ) – A function that accepts an action as an argument and returns prestate, state, reward and terminal.
  • loss_func ( Model ) – Loss function for training the Q-network.
  • optimizer ( Optimizer ) – Optimizer object for training the Q-network.
  • epoch ( int ) – Number of epochs for training.
  • batch_size ( int ) – Batch size.
  • random_step ( int ) – Number of random steps executed before training.
  • one_epoch_step ( int ) – Number of steps in one epoch.
  • test_step ( int ) – Number of test steps.
  • test_env ( function ) – An environment function for testing.
  • update_period ( int ) – Period for updating the target network.
  • greedy_step ( int ) – Number of steps.
  • min_greedy ( float ) – Minimum greedy value.
  • max_greedy ( float ) – Maximum greedy value.
  • test_greedy ( float ) – Greedy threshold used for testing.
  • train_frequency ( int ) – Training is performed once every this many steps.
Returns:

A dictionary that includes the list of training rewards and the list of losses.

Return type:

(dict)
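
The epsilon-greedy parameters can be illustrated with a short sketch. It assumes that the greedy value is the probability of choosing the action with the highest Q-value, and that it rises linearly from min_greedy to max_greedy over greedy_step steps. These are assumptions, not confirmed by the text above, although the linear schedule matches the "Greedy: 0.0225" value logged after 25000 steps in the example that follows. The helper names are hypothetical.

```python
import random


def greedy_value(step, greedy_step=1000000, min_greedy=0.0, max_greedy=0.9):
    """Assumed linear schedule from min_greedy to max_greedy."""
    frac = min(step / greedy_step, 1.0)
    return min_greedy + (max_greedy - min_greedy) * frac


def select_action(q_values, greedy, rng=random):
    """Pick the best action with probability `greedy`, else a random one."""
    if rng.random() < greedy:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))


print(greedy_value(25000))                   # greedy value after 25000 steps
print(select_action([0.1, 0.9, 0.3], 1.0))   # always the best action
```
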

Example

>>> import renom as rm
>>> from renom.algorithm.reinforcement.dqn import DQN
>>>
>>> # Define the sizes first; action_pattern is used by the last Dense layer.
>>> state_size = (4, 84, 84)
>>> action_pattern = 4
>>>
>>> q_network = rm.Sequential([
...    rm.Conv2d(32, filter=8, stride=4),
...    rm.Relu(),
...    rm.Conv2d(64, filter=4, stride=2),
...    rm.Relu(),
...    rm.Conv2d(64, filter=3, stride=1),
...    rm.Relu(),
...    rm.Flatten(),
...    rm.Dense(512),
...    rm.Relu(),
...    rm.Dense(action_pattern)
... ])
>>>
>>> def environment(action):
...     prestate = ...
...     state = ...
...     reward = ...
...     terminal = ...
...     return prestate, state, reward, terminal
>>>
>>> # Instantiation of DQN object
>>> buffer_size = 100000  # same as the default shown in the signature
>>> dqn = DQN(q_network,
...           state_size=state_size,
...           action_pattern=action_pattern,
...           gamma=0.99,
...           buffer_size=buffer_size)
>>>
>>> # Training
>>> train_history = dqn.train(environment,
...           loss_func=rm.ClippedMeanSquaredError(clip=(-1, 1)),
...           epoch=50,
...           random_step=5000,
...           one_epoch_step=25000,
...           test_step=2500,
...           test_env=environment,
...           optimizer=rm.Rmsprop(lr=0.00025, g=0.95))
>>>
Executing random action for 5000 step...
epoch 000 avg loss:0.0060 avg reward:0.023: 100%|██████████| 25000/25000 [19:12<00:00, 21.70it/s]
    /// Result
    Average train error: 0.006
    Avg train reward in one epoch: 1.488
    Avg test reward in one epoch: 1.216
    Test reward: 63.000
    Greedy: 0.0225
    Buffer: 29537
    ...
>>>
>>> print(train_history["train_reward"])