Abstract
Reinforcement learning and control theory synergistically combine to address complex challenges in dynamical systems. Over the past decade, integrating reinforcement learning algorithms into control theory has garnered significant attention, particularly in adaptive control, where controllers learn and adapt to unknown system dynamics. Within this field, adaptive control for linear quadratic systems is of special interest, owing to its elegant analytical solutions and extensive practical applications across engineering domains. In this thesis, we conduct an empirical comparison of two paradigms of reinforcement learning, model-based and model-free, in controlling linear quadratic systems. We investigate model-based algorithms, specifically the Augmented Reward-Biased Maximum Likelihood Estimator (ARBMLE) and Stabilizing Learning (STABL), which explicitly model system dynamics to compute control inputs. Conversely, we examine the model-free approach using the Proximal Policy Optimization algorithm, which learns control policies directly by approximating the mapping from system states to control inputs without constructing explicit models. Our empirical studies reveal that the model-based algorithms, ARBMLE and STABL, exhibit strong exploratory behaviour that can destabilise the control system during initial learning phases. The STABL algorithm additionally injects exciting noise into its control inputs, which has a detrimental effect on its performance. This excessive exploration results in suboptimal performance and raises concerns about their practicality in safety-critical applications. In contrast, the model-free Proximal Policy Optimization algorithm leverages the initial stabilising controller and employs more conservative exploration strategies. As a result, Proximal Policy Optimization achieves superior performance across key metrics, including stability, cumulative cost, and robustness to disturbances. This work contributes to the understanding of the trade-offs between model-based and model-free reinforcement learning approaches in linear quadratic control, highlighting the importance of balancing exploration and exploitation. Our findings provide valuable insights for researchers and practitioners aiming to deploy reinforcement learning algorithms in control systems where reliability and performance are critical.
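For readers unfamiliar with the setting, the following is a minimal sketch of the standard discrete-time stochastic linear quadratic formulation the abstract refers to; the notation (A, B, Q, R, Σ_w) is the conventional one and is not taken from the thesis itself.

```latex
% Sketch of the usual discrete-time stochastic LQ setting
% (the exact problem setup and cost criterion used in the thesis may differ):
\begin{align*}
  x_{t+1} &= A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, \Sigma_w),\\
  J(\pi)  &= \limsup_{T \to \infty} \frac{1}{T}\,
             \mathbb{E}\!\left[\sum_{t=0}^{T-1} x_t^\top Q x_t + u_t^\top R u_t\right].
\end{align*}
% With known (A, B), the optimal policy is linear state feedback
% u_t = -K x_t, where K = (R + B^\top P B)^{-1} B^\top P A and P solves the
% discrete algebraic Riccati equation; the adaptive control problem studied
% in the thesis arises when (A, B) is unknown and must be learned online.
```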
| Translated title of the contribution | Vergleich von modellbasierten und modellfreien Reinforcement Learning Algorithmen für Stochastic Linear Quadratic Control |
| --- | --- |
| Original language | English |
| Qualification | Dipl.-Ing. |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 11 Apr 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Bibliographical note
no embargo

Keywords
- Reinforcement Learning
- Linear Quadratic Control
- Stochastic Linear Quadratic Control
- Model-Based
- Model-Free