Policy Search and Network-Dependent Dynamics
At times, we wish to model a component of the dynamics with a neural network. A common example is policy search, where the closed-loop dynamics include a neural network controller. In such cases, we consider dynamics of the form $\frac{dx}{dt} = f(x, u, p, t)$, where $u$ is the control input or, more generally, the contribution of the neural network to the dynamics. We provide the add_policy_search function to transform a NeuralLyapunovStructure so that the neural network is trained to represent not only the Lyapunov function but also the relevant part of the dynamics.
Similar to get_numerical_lyapunov_function, we provide the get_policy convenience function to construct $u(x)$, which can be combined with the open-loop dynamics $f(x, u, p, t)$ to create the closed-loop dynamics $f_{cl}(x, p, t) = f(x, u(x), p, t)$.
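The composition $f_{cl}(x, p, t) = f(x, u(x), p, t)$ can be sketched in plain Julia. This is only an illustration under stated assumptions: the pendulum model, its parameter layout, and the hand-written `pd_policy` are hypothetical stand-ins (in practice $u(x)$ would come from the trained network), and nothing here calls the NeuralLyapunov API.

```julia
# Open-loop dynamics f(x, u, p, t) of a damped, driven pendulum.
# Assumed layout (illustrative): x = [angle, rate], u = [torque],
# p = [zeta, omega0].
function open_loop_pendulum(x, u, p, t)
    angle, rate = x
    torque = u[1]
    zeta, omega0 = p
    return [rate, -2 * zeta * omega0 * rate - omega0^2 * sin(angle) + torque]
end

# Hypothetical hand-written policy standing in for the neural network u(x).
pd_policy(x) = [-2.0 * x[1] - 1.0 * x[2]]

# Closed-loop dynamics: substitute u = u(x) into the open-loop dynamics.
closed_loop(x, p, t) = open_loop_pendulum(x, pd_policy(x), p, t)

p = [0.5, 1.0]
dx = closed_loop([0.0, 1.0], p, 0.0)
```

The closed-loop function has the signature `(x, p, t)`, so it can be used wherever dynamics without an explicit control input are expected.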
NeuralLyapunov.add_policy_search — Function
add_policy_search(lyapunov_structure, new_dims; control_structure)
Add dependence on the neural network to the dynamics in a NeuralLyapunovStructure.
Arguments
lyapunov_structure::NeuralLyapunovStructure: provides structure for $V, V̇$; should assume dynamics take a form of f(x, p, t).
new_dims::Integer: number of outputs of the neural network to pass into the dynamics through control_structure.
Keyword Arguments
control_structure::Function: transforms the final new_dims outputs of the neural net before passing them into the dynamics; defaults to identity, passing in the neural network outputs unchanged.
The returned NeuralLyapunovStructure expects dynamics of the form f(x, u, p, t), where u captures the dependence of dynamics on the neural network (e.g., through a control input). When evaluating the dynamics, it uses u = control_structure(phi_end(x)) where phi_end is a function that returns the final new_dims outputs of the neural network. The other lyapunov_structure.network_dim outputs are used for calculating $V$ and $V̇$, as specified originally by lyapunov_structure.
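The relationship u = control_structure(phi_end(x)) can be illustrated with a small self-contained sketch. It is not the package's internal implementation: `toy_network`, the weight matrix `W`, and the saturating `control_structure` are assumptions chosen only to show how the final new_dims outputs are split off and transformed.

```julia
# Stand-in "network" with network_dim + new_dims = 1 + 2 = 3 outputs.
# A fixed linear map is used here instead of a trained model.
W = [1.0 0.0; 0.0 1.0; 1.0 1.0]
toy_network(x) = W * x

network_dim = 1   # outputs reserved for computing V and V̇
new_dims = 2      # outputs routed into the dynamics

# phi_end: the final new_dims outputs of the network.
phi_end(x) = toy_network(x)[(end - new_dims + 1):end]

# Hypothetical control_structure: saturate each output to [-u_max, u_max].
u_max = 2.0
control_structure(raw) = u_max .* tanh.(raw)

# Control input passed to dynamics of the form f(x, u, p, t).
u_of(x) = control_structure(phi_end(x))

u = u_of([1.0, 2.0])
```

A bounded control_structure such as this one is one way to encode actuator limits; leaving the default identity passes the raw network outputs through.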
NeuralLyapunov.get_policy — Function
get_policy(phi, θ, network_dim, control_dim; control_structure)
Generate a Julia function representing the control policy/unmodeled portion of the dynamics as a function of the state.
The returned function can operate on a state vector or columnwise on a matrix of state vectors.
Arguments
phi: the neural network, represented as phi(state, θ) if the neural network has a single output, or a Vector of the same with one entry per neural network output.
θ: the parameters of the neural network; θ[:φ1] should be the parameters of the first neural network output (even if there is only one), θ[:φ2] the parameters of the second (if there are multiple), and so on.
network_dim: total number of neural network outputs.
control_dim: number of neural network outputs used in the control policy.
Keyword Arguments
control_structure: transforms the final control_dim outputs of the neural net before passing them into the dynamics; defaults to identity, passing in the neural network outputs unchanged.
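To make the "state vector or columnwise on a matrix" calling convention concrete, here is a minimal sketch of a function with that behavior. The `toy_policy` below is hypothetical and hand-written; it is not produced by get_policy and only mimics the shape conventions described above, assuming one control output.

```julia
# Hypothetical policy acting on a single state vector (control_dim = 1).
toy_policy(x::AbstractVector) = [-2.0 * x[1] - x[2]]

# Columnwise method: each column of X is one state vector, and the
# result has one column of controls per input column.
toy_policy(X::AbstractMatrix) = reduce(hcat, map(toy_policy, eachcol(X)))

# Three states stored as the columns of a 2×3 matrix.
X = [0.0 1.0 0.0;
     0.0 0.0 1.0]
U = toy_policy(X)   # 1×3 matrix of controls
```

Evaluating a batch of states in one call is convenient when plotting a policy over a grid or checking it on sampled trajectories.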