Andre Weiner, Tom Krogmann, Janis Geise
TU Braunschweig, Institute of Fluid Mechanics
motivation for closed-loop active flow control
How to find the control law?
Proximal policy optimization (PPO) workflow (GAE: generalized advantage estimation).
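A minimal sketch of the advantage computation, assuming standard GAE with discount gamma and smoothing factor lam (the function name and default values are illustrative, not taken from the slides):

    import numpy as np

    def compute_gae(rewards, values, gamma=0.99, lam=0.97):
        """Generalized advantage estimation for a single trajectory.
        rewards: length T; values: length T+1 (bootstrap value appended)."""
        advantages = np.zeros(len(rewards))
        gae = 0.0
        for t in reversed(range(len(rewards))):
            # one-step temporal-difference error
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # exponentially weighted sum of TD errors
            gae = delta + gamma * lam * gae
            advantages[t] = gae
        return advantages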
Training cost for the DrivAer model.
CFD environments are expensive!
Tom Krogmann, GitHub, DOI 10.5281/zenodo.7636959
Challenge of combined optimal sensor placement and flow control:
the actuation changes the dynamical system
Idea: combined sensor placement and flow control optimization via attention layer
$$\mathbf{f} = \mathbf{W}_2\mathrm{tanh}(\mathbf{W}_1\mathbf{x}_{in})$$
$\mathbf{W}_1\in \mathbb{R}^{N_b\times N_{in}}$, $\mathbf{W}_2\in \mathbb{R}^{N_{in} \times N_b}$, $N_b < N_{in}$
$$ \kappa_i = \mathrm{exp}(f_i)/\sum_j\mathrm{exp}(f_j)$$
$\kappa_i$ - attention weight of sensor $i$
Time-averaged attention weights $\bar{\kappa}$.
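A minimal PyTorch sketch of this layer, assuming (my interpretation, not stated on the slide) that the attention weights re-scale the sensor inputs before they enter the policy network:

    import torch

    class SensorAttention(torch.nn.Module):
        """f = W2 tanh(W1 x_in) with a bottleneck N_b < N_in;
        kappa = softmax(f) assigns one weight per sensor."""

        def __init__(self, n_in: int, n_b: int):
            assert n_b < n_in, "bottleneck must be narrower than the input"
            super().__init__()
            self.W1 = torch.nn.Linear(n_in, n_b, bias=False)
            self.W2 = torch.nn.Linear(n_b, n_in, bias=False)

        def forward(self, x):
            f = self.W2(torch.tanh(self.W1(x)))
            kappa = torch.softmax(f, dim=-1)
            return kappa * x, kappa  # weighted inputs and attention weights

Averaging kappa over the time steps of a trajectory yields the time-averaged weights $\bar{\kappa}$ used to rank the sensors.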
Results obtained with the top 7 sensors (MDI: mean decrease in impurity; modes: QR column pivoting).
Janis Geise, GitHub, DOI 10.5281/zenodo.7642927
Idea: replace CFD with model(s) in some episodes
for e in range(n_episodes):
    if models_reliable():
        # cheap: roll out the current policy in the model ensemble
        sample_trajectories_from_models()
    else:
        # expensive: fall back to the CFD simulation
        sample_trajectories_from_simulation()
    update_models()  # re-fit the ensemble to the latest CFD data
    update_policy()  # PPO update on the freshly sampled trajectories
Based on model-ensemble trust-region policy optimization (ME-TRPO).
When are the models reliable?
How to sample from the ensemble?
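The slides leave both questions open; purely as an illustration (function names, the rollout callable, and the tolerance are assumptions, not the slides' method), one plausible scheme trusts the ensemble while the policy's return in the models stays close to the return of the most recent CFD episode, and draws one random ensemble member per trajectory so that no single model's bias dominates:

    import random

    def models_reliable(ensemble, policy, ref_return, rollout, tol=0.1):
        """Hypothetical check: the current policy's mean return across
        all ensemble members must stay within a tolerance of the return
        observed in the most recent CFD episode. rollout(policy, model)
        is assumed to return a list of (state, reward) pairs."""
        returns = [sum(r for _, r in rollout(policy, m)) for m in ensemble]
        mean_return = sum(returns) / len(returns)
        return abs(mean_return - ref_return) <= tol * abs(ref_return)

    def sample_trajectories_from_models(ensemble, policy, rollout, n_traj=10):
        """Hypothetical sampling: each trajectory is generated by one
        randomly drawn ensemble member."""
        return [rollout(policy, random.choice(ensemble)) for _ in range(n_traj)]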
Recipe for creating environment models:
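The individual steps are not reproduced here; purely as an illustration (an assumption in the spirit of ME-TRPO, not the slides' exact recipe), one ensemble member could be a small fully connected network that predicts the next state and the reward from the current state and action:

    import torch

    class EnvModel(torch.nn.Module):
        """Illustrative environment model: maps (state, action) to
        (next state, reward); trained on trajectories from CFD episodes."""

        def __init__(self, n_state: int, n_action: int, n_hidden: int = 64):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(n_state + n_action, n_hidden),
                torch.nn.ReLU(),
                torch.nn.Linear(n_hidden, n_state + 1),  # next state + reward
            )

        def forward(self, state, action):
            out = self.net(torch.cat([state, action], dim=-1))
            return out[..., :-1], out[..., -1]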
Cylinder benchmark case; $Re=100$.
control objective
$$r = c_{d,ref} - (c_d + 0.1|c_l|)$$
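In code, the reward is a one-liner; assuming $c_{d,ref}$ is the drag of the uncontrolled flow, the reward turns positive once the controller reduces drag, while the $0.1|c_l|$ term penalizes lift fluctuations:

    def reward(cd, cl, cd_ref):
        """Reward from the slide: drag reduction relative to the
        reference, with a penalty on the lift magnitude."""
        return cd_ref - (cd + 0.1 * abs(cl))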
Rewards over episodes; mean/std. over 10 trajectories and 5 seeds; markers indicate CFD episodes.
Number of discarded trajectories $N_r$ for various ensembles.
Pinball benchmark case; $Re=100$.
Mean drag/lift over episodes.
Execution time $t_{exec}$ for various ensembles normalized by model-free training time $t_{MF}$.
Evaluation of final policy.