{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining a reward function\n",
    "One of the most important steps in reinforcement learning is the definition of the reward function. This example shows how to define a custom reward function in StableRLS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this contains the environment class\n",
    "import stablerls.gymFMU as gymFMU\n",
    "# this will read our config file\n",
    "import stablerls.configreader as cfg_reader\n",
    "\n",
    "import numpy as np\n",
    "import logging"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
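   "source": [
    "To define a custom reward, subclass `gymFMU.StableRLS` and override `get_reward`. The method receives the current action and observation and returns the reward, the `terminated` and `truncated` flags, and an `info` dictionary."
   ]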
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class my_env(gymFMU.StableRLS):\n",
    "    def get_reward(self, action, observation):\n",
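    "        \"\"\"Custom reward: the squared observation.\"\"\"\n",
    "        # no additional diagnostics are reported in this example\n",
    "        info = {}\n",
    "        reward = observation**2\n",
    "        # this reward function itself never signals the end of an episode\n",
    "        terminated = False\n",
    "        truncated = False\n",
    "        return reward, terminated, truncated, info"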
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simplicity, the compiled FMU models for Linux and Windows are already included. If you have MATLAB, you can also compile the `*.slx` models yourself; in that case, keep the default `FMU_path` in the config file. Otherwise, change `FMU_path` to `00-Simulink_Windows.fmu` or `00-Simulink_Linux.fmu`, depending on your operating system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First of all we have to read the config file\n",
    "config = cfg_reader.configreader('00-config.cfg')\n",
    "\n",
    "# Optionally compile the Simulink model into an FMU.\n",
    "# This requires MATLAB and the MATLAB Engine for Python.\n",
    "# Change False to True to run the compilation.\n",
    "if False:\n",
    "    import stablerls.createFMU as createFMU\n",
    "    createFMU.createFMU(config,'SimulinkExample00.slx')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The FMU is now available, and the default options of the StableRLS Gymnasium environment are sufficient to run a first simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Action: [1 2 3 4]\n",
      "Observation: [3.]\n",
      "Reward: [9.]\n",
      "\n",
      "Action: [2 4 6 8]\n",
      "Observation: [6.]\n",
      "Reward: [36.]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# create instance of the model\n",
    "env = my_env(config)\n",
    "\n",
    "# reset the environment before the simulation starts\n",
    "obs = env.reset()\n",
    "\n",
    "# initial action; it is doubled after every step\n",
    "action = np.array([1,2,3,4])\n",
    "\n",
    "terminated = False\n",
    "truncated = False\n",
    "while not (terminated or truncated):\n",
    "    # step() also returns the reward computed by get_reward\n",
    "    observation, reward, terminated, truncated, info = env.step(action)\n",
    "    print(f'Action: {action}\\nObservation: {observation}\\nReward: {reward}\\n')\n",
    "    action = action * 2\n",
    "\n",
    "env.close()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more complex reward calculations that depend on previous results, you can access earlier values through `self.inputs` and `self.outputs` inside the environment class, or through `env.inputs` and `env.outputs` from outside."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}