Policy Search and Network-Dependent Dynamics

Sometimes we wish to model a component of the dynamics with a neural network. A common example is policy search, where the closed-loop dynamics include a neural network controller. In such cases, we consider dynamics of the form $\frac{dx}{dt} = f(x, u, p, t)$, where $u$ is the control input or, more generally, the neural network's contribution to the dynamics. The add_policy_search function transforms a NeuralLyapunovStructure so that the neural network is trained to represent not just the Lyapunov function, but also the relevant part of the dynamics.

Similar to get_numerical_lyapunov_function, the get_policy convenience function constructs $u(x)$, which can be combined with the open-loop dynamics $f(x, u, p, t)$ to create the closed-loop dynamics $f_{cl}(x, p, t) = f(x, u(x), p, t)$.
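As a rough illustration of the composition $f_{cl}(x, p, t) = f(x, u(x), p, t)$, the sketch below closes the loop around a damped pendulum in plain Julia. The pendulum model and the linear example_policy are assumptions made for this example only; in practice $u(x)$ would be the function returned by get_policy.

```julia
# Open-loop dynamics f(x, u, p, t) of a damped pendulum with a torque input
# (an illustrative system, not one defined by the package).
# State x = [position, velocity]; parameters p = [damping ratio, natural frequency].
function open_loop_pendulum(x, u, p, t)
    pos, vel = x
    ζ, ω_0 = p
    return [vel, -2ζ * ω_0 * vel - ω_0^2 * sin(pos) + u[1]]
end

# Hypothetical stand-in for a trained policy u(x).
example_policy(x) = [-5.0 * x[1] - 2.0 * x[2]]

# Closed-loop dynamics f_cl(x, p, t) = f(x, u(x), p, t).
closed_loop_pendulum(x, p, t) = open_loop_pendulum(x, example_policy(x), p, t)

dx = closed_loop_pendulum([0.1, 0.0], [0.5, 1.0], 0.0)
```

The closed-loop function has the f(x, p, t) signature, so it can be used wherever dynamics without an explicit control input are expected.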

NeuralLyapunov.add_policy_search — Function

add_policy_search(lyapunov_structure, new_dims; control_structure)

Add dependence on the neural network to the dynamics in a NeuralLyapunovStructure.

Arguments

lyapunov_structure::NeuralLyapunovStructure: provides structure for $V$ and $\dot{V}$; should assume dynamics take the form f(x, p, t).
new_dims::Integer: number of outputs of the neural network to pass into the dynamics through control_structure.

Keyword Arguments

control_structure::Function: transforms the final new_dims outputs of the neural net before passing them into the dynamics; defaults to identity, passing in the neural network outputs unchanged.

The returned NeuralLyapunovStructure expects dynamics of the form f(x, u, p, t), where u captures the dependence of the dynamics on the neural network (e.g., through a control input). When evaluating the dynamics, it uses u = control_structure(phi_end(x)), where phi_end is a function that returns the final new_dims outputs of the neural network. The other lyapunov_structure.network_dim outputs are used for calculating $V$ and $\dot{V}$, as specified originally by lyapunov_structure.
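The split of network outputs described above can be sketched in plain Julia without the package. Everything here is an assumption for illustration: the fixed weight matrix W, the names lyapunov_dim and phi_lyapunov, and the torque limit u_max are hypothetical, and the code only mimics the documented relationship u = control_structure(phi_end(x)).

```julia
# Hypothetical network with 3 outputs; the weights are fixed purely for illustration.
W = [1.0 0.0; 0.0 1.0; -1.0 -2.0]
phi(x) = tanh.(W * x)

lyapunov_dim = 2   # outputs used for V and V̇ (the original network_dim)
new_dims = 1       # outputs passed into the dynamics

# phi_end returns the final new_dims outputs; phi_lyapunov returns the others.
phi_lyapunov(x) = phi(x)[1:lyapunov_dim]
phi_end(x) = phi(x)[(end - new_dims + 1):end]

# A control_structure that scales the bounded tanh output to a torque limit.
u_max = 2.0
control_structure(raw) = u_max .* raw

# Control input seen by the dynamics f(x, u, p, t).
u(x) = control_structure(phi_end(x))

u_example = u([1.0, 0.0])
```

A non-identity control_structure such as this scaling is one way to build an input constraint into the policy, since the network outputs here are bounded by the tanh.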


NeuralLyapunov.get_policy — Function

get_policy(phi, θ, network_dim, control_dim; control_structure)

Generate a Julia function representing the control policy/unmodeled portion of the dynamics as a function of the state.

The returned function can operate on a state vector or columnwise on a matrix of state vectors.

Arguments

phi: the neural network, represented as phi(state, θ) if the neural network has a single output, or a Vector of the same with one entry per neural network output.
θ: the parameters of the neural network; θ[:φ1] should be the parameters of the first neural network output (even if there is only one), θ[:φ2] the parameters of the second (if there are multiple), and so on.
network_dim: total number of neural network outputs.
control_dim: number of neural network outputs used in the control policy.

Keyword Arguments

control_structure: transforms the final control_dim outputs of the neural net before passing them into the dynamics; defaults to identity, passing in the neural network outputs unchanged.
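To make the argument conventions concrete, the following is a hand-written stand-in that follows the documented behavior; it is not the library's implementation. The name sketch_policy, the dot_output networks, and the parameter values are hypothetical. It assumes phi is a Vector with one entry per output, that θ is indexable by :φ1, :φ2, and so on, and that the returned function accepts either a state vector or a matrix with one state per column.

```julia
# Stand-in with the documented behavior (an assumption-laden sketch, not get_policy itself).
function sketch_policy(phi::Vector, θ, network_dim, control_dim; control_structure = identity)
    # Indices of the final control_dim outputs of the network.
    control_indices = (network_dim - control_dim + 1):network_dim

    # Evaluate only the control outputs on a single state vector.
    function policy(state::AbstractVector)
        raw = [phi[i](state, θ[Symbol(:φ, i)]) for i in control_indices]
        return control_structure(raw)
    end

    # Apply the vector method to each column of a matrix of states.
    policy(states::AbstractMatrix) = reduce(hcat, [policy(col) for col in eachcol(states)])

    return policy
end

# Hypothetical single-output "networks": a dot product with the parameters.
dot_output(state, w) = sum(w .* state)
phi = [dot_output, dot_output, dot_output]
θ = (φ1 = [1.0, 0.0], φ2 = [0.0, 1.0], φ3 = [-1.0, -2.0])

policy = sketch_policy(phi, θ, 3, 1; control_structure = raw -> 2.0 .* tanh.(raw))

u_single = policy([1.0, 0.0])          # one state vector
u_batch = policy([1.0 0.0; 0.0 1.0])   # two states, one per column
```

With network_dim = 3 and control_dim = 1, only the third output and θ[:φ3] contribute to the policy; the first two outputs are left for the Lyapunov function.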
