Defining a reward function
One of the most important steps in reinforcement learning is the definition of the reward function. This example shows how to define one in StableRLS.
[1]:
# this contains the environment class
import stablerls.gymFMU as gymFMU
# this will read our config file
import stablerls.configreader as cfg_reader
import numpy as np
import logging
[2]:
class my_env(gymFMU.StableRLS):
    def get_reward(self, action, observation):
        """This is my custom reward function"""
        info = {}
        reward = observation**2
        terminated = False
        truncated = False
        return reward, terminated, truncated, info
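Because the observation is a NumPy array, `observation**2` is also an array (the run further down prints `Reward: [9.]`). Many Gymnasium-based training libraries expect a scalar reward, so the following minimal sketch shows one way to reduce it to a float. The function name `squared_reward` is made up for this illustration and is not part of StableRLS.

```python
import numpy as np


def squared_reward(observation):
    """Hypothetical helper: sum of squared observations as a Python float."""
    # Convert to an array first so lists and scalars are accepted as well
    observation = np.asarray(observation, dtype=float)
    return float(np.sum(observation**2))


print(squared_reward(np.array([3.0])))
print(squared_reward(np.array([1.0, 2.0])))
```

Inside `get_reward` the same expression could replace `observation**2` if a scalar reward is needed.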
For simplicity, compiled FMU models for Linux and Windows are already included. If you own MATLAB, you can instead compile the *.slx models yourself; in that case keep the default FMU_path in the config file. Otherwise, change FMU_path to 00-Simulink_Windows.fmu or 00-Simulink_Linux.fmu, depending on your operating system.
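If you are unsure which precompiled file applies, the choice can be sketched with the standard library alone. The helper `precompiled_fmu_name` is a hypothetical name used only here; it merely returns the file name, which you would then enter as FMU_path in the config file by hand.

```python
import platform


def precompiled_fmu_name(system=None):
    """Hypothetical helper: pick the bundled FMU file for an operating system."""
    # platform.system() returns e.g. "Windows" or "Linux"
    system = system or platform.system()
    if system == "Windows":
        return "00-Simulink_Windows.fmu"
    if system == "Linux":
        return "00-Simulink_Linux.fmu"
    # Only Windows and Linux FMUs are shipped with this example
    raise RuntimeError(f"No precompiled FMU for {system}")


print(precompiled_fmu_name("Linux"))
```

Calling `precompiled_fmu_name()` without an argument uses the operating system the notebook is running on.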
[3]:
# First of all we have to read the config file
config = cfg_reader.configreader('00-config.cfg')
# Optionally, compile the Simulink model ourselves.
# MATLAB and the MATLAB Engine for Python are required!
if False:
    import stablerls.createFMU as createFMU
    createFMU.createFMU(config, 'SimulinkExample00.slx')
The FMU is available now and the default options of the StableRLS gymnasium environment are sufficient to run the first simulation.
[4]:
# create an instance of the model
env = my_env(config)
# default reset call before the simulation starts
obs = env.reset()
# initial action; it is doubled after every step
action = np.array([1, 2, 3, 4])
terminated = False
truncated = False
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = env.step(action)
    print(f'Action: {action}\nObservation: {observation}\nReward: {reward}\n')
    action = action * 2
env.close()
Action: [1 2 3 4]
Observation: [3.]
Reward: [9.]
Action: [2 4 6 8]
Observation: [6.]
Reward: [36.]
To include previous results in a more complex reward calculation, use self.inputs and self.outputs inside the class (env.inputs and env.outputs from outside).
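As a rough illustration of a history-based reward, the sketch below penalizes large changes between consecutive outputs. It works on a plain NumPy array with one row per time step, which stands in for the stored outputs; the actual layout of `self.outputs` is an assumption here and should be checked before adapting this. The name `smoothness_reward` is hypothetical.

```python
import numpy as np


def smoothness_reward(outputs, step):
    """Hypothetical reward: negative squared change since the previous step.

    outputs: array of shape (n_steps, n_outputs), assumed stand-in for self.outputs
    step:    index of the current time step
    """
    if step == 0:
        # No previous output exists yet, so there is nothing to penalize
        return 0.0
    change = outputs[step] - outputs[step - 1]
    return -float(np.sum(change**2))


# Observations taken from the example run above
history = np.array([[3.0], [6.0]])
print(smoothness_reward(history, 1))  # -9.0
```

A reward of this kind discourages abrupt output changes, which is one common reason to look at previous results.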