Comparison of Model-Based and Model-Free Reinforcement Learning Algorithms for Stochastic Linear Quadratic Control

Lucas Weitering

Research output: Thesis › Master's Thesis

Abstract

Reinforcement learning and control theory combine synergistically to address complex challenges in dynamical systems. Over the past decade, integrating reinforcement learning algorithms into control theory has garnered significant attention, particularly in adaptive control, where controllers learn and adapt to unknown system dynamics. Within this field, adaptive control for linear quadratic systems is of special interest due to its elegant analytical solutions and extensive practical applications across engineering domains. In this thesis, we conduct an empirical comparison of two reinforcement learning paradigms, model-based and model-free, for controlling linear quadratic systems. We investigate model-based algorithms, specifically the Augmented Reward-Biased Maximum Likelihood Estimator (ARBMLE) and Stabilizing Learning (STABL), which explicitly model the system dynamics to compute control inputs. Conversely, we examine the model-free approach using the Proximal Policy Optimization algorithm, which learns control policies directly by approximating the mapping from system states to control inputs without constructing explicit models. Our empirical studies reveal that the model-based algorithms, ARBMLE and STABL, exhibit strong exploratory behaviour that can destabilise the control system during the initial learning phases. The STABL algorithm additionally injects exciting noise into its control inputs, which has a detrimental effect on its performance. This excessive exploration results in suboptimal performance and raises concerns about the practicality of these algorithms in safety-critical applications. In contrast, the model-free Proximal Policy Optimization algorithm leverages the initial stabilising controller and employs more conservative exploration strategies. As a result, Proximal Policy Optimization achieves superior performance across key metrics, including stability, cumulative cost, and robustness to disturbances. This work contributes to the understanding of the trade-offs between model-based and model-free reinforcement learning approaches in linear quadratic control, highlighting the importance of balancing exploration and exploitation. Our findings provide valuable insights for researchers and practitioners aiming to deploy reinforcement learning algorithms in control systems where reliability and performance are critical.
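To make the problem setting concrete, the following minimal sketch (not part of the thesis; the system matrices are illustrative placeholders) shows a stochastic linear quadratic control problem with known dynamics, where the optimal linear feedback gain follows from the discrete algebraic Riccati equation via SciPy. The learning algorithms compared in the thesis, ARBMLE, STABL, and Proximal Policy Optimization, must approach such a gain without prior knowledge of the dynamics matrices.

# Minimal stochastic LQR sketch (illustrative only; matrices are assumptions, not from the thesis).
# With known dynamics x_{t+1} = A x_t + B u_t + w_t and stage cost x'Qx + u'Ru,
# the optimal policy is the linear feedback u_t = -K x_t, with K obtained from
# the discrete algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed example dynamics matrix
B = np.array([[0.0], [0.1]])             # assumed example input matrix
Q = np.eye(2)                            # state cost weight
R = np.array([[1.0]])                    # input cost weight

P = solve_discrete_are(A, B, Q, R)                 # Riccati solution
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal feedback gain

# Roll out the closed loop under process noise and accumulate the stage cost.
rng = np.random.default_rng(0)
x, cost = np.array([1.0, 0.0]), 0.0
for _ in range(100):
    u = -K @ x
    cost += x @ Q @ x + u @ R @ u
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)
print("average stage cost:", cost / 100)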
Translated title of the contribution: Vergleich von modellbasierten und modellfreien Reinforcement Learning Algorithmen für Stochastic Linear Quadratic Control
Original language: English
Qualification: Dipl.-Ing.
Awarding Institution
  • Montanuniversität
Supervisors/Advisors
  • Auer, Peter, Supervisor (internal)
Award date: 11 Apr 2025
DOIs
Publication status: Published - 2025

Bibliographical note

no embargo

Keywords

  • Reinforcement Learning
  • Linear Quadratic Control
  • Stochastic Linear Quadratic Control
  • Model-Based
  • Model-Free
