Andre Weiner, Janis Geise
TU Braunschweig, Institute of Fluid Mechanics
Goals of flow control:
Categories of flow control:
energy input vs. efficiency gain
Categories of active flow control:
How to find the control law?
Closed-loop flow control with variable Reynolds number; source: F. Gabriel 2021.
Why CFD-based DRL?
Time-averaged attention weights $\bar{\kappa}$; source: Tom Krogmann, DOI: 10.5281/zenodo.7636959.
Training cost of the DrivAer model
CFD environments are expensive!
Create an intelligent agent that learns to map states to actions such that expected returns are maximized.
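A minimal sketch of this state-action loop, assuming a Gym-style interface; DummyEnv and the tanh policy below are hypothetical stand-ins for the CFD environment and the trained agent:

import numpy as np

class DummyEnv:
    """Hypothetical stand-in for the CFD environment."""
    def reset(self):
        return np.zeros(4)                 # e.g., pressure probe values
    def step(self, action):
        next_state = np.random.randn(4)    # placeholder dynamics
        reward = -float(action ** 2)       # placeholder reward
        return next_state, reward, False   # state, reward, done

env = DummyEnv()
state = env.reset()
for t in range(10):
    action = float(np.tanh(state.mean()))  # placeholder policy
    state, reward, done = env.step(action)
    if done:
        break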
Flow past a cylinder benchmark.
What is our goal?
$r = 3 - (c_d + 0.1 |c_l|)$
$c_d$, $c_l$ - drag and lift coefficients; see J. Rabault et al.
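As a worked example, the reward for one control step can be evaluated directly from the two coefficients; the numbers below are made up for illustration:

def reward(cd: float, cl: float) -> float:
    """r = 3 - (cd + 0.1 * |cl|); cd, cl: drag and lift coefficients."""
    return 3.0 - (cd + 0.1 * abs(cl))

print(reward(cd=3.1, cl=0.2))  # -0.12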
Long-term consequences:
$$ G_t = \sum\limits_{l=0}^{N_t-t} \gamma^l R_{t+l} $$
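The return $G_t$ can be computed for every step of a recorded trajectory in a single backward pass; a sketch, assuming the rewards are stored as a plain list:

def discounted_returns(rewards, gamma=0.99):
    """G_t = sum_l gamma^l * R_{t+l}, accumulated backwards in one pass."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))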
What to expect in a given state?
$$ L_V = \frac{1}{N_\tau N_t} \sum\limits_{\tau = 1}^{N_\tau}\sum\limits_{t = 1}^{N_t} \left( V(s_t^\tau) - G_t^\tau \right)^2 $$
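In code, $L_V$ is a mean squared error between predicted values and observed returns; a minimal PyTorch sketch, assuming both tensors are flattened over trajectories $\tau$ and steps $t$:

import torch

def value_loss(values: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """MSE between V(s_t) and G_t, averaged over all trajectories and steps."""
    return ((values - returns) ** 2).mean()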
Was the selected action a good one?
$$\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t) $$ $$ A_t^{GAE} = \sum\limits_{l=0}^{N_t-t} (\gamma \lambda)^l \delta_{t+l} $$
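The sum over discounted TD errors is usually evaluated with the equivalent recursion $A_t = \delta_t + \gamma\lambda A_{t+1}$; a sketch for a single trajectory, assuming values holds one bootstrap entry more than rewards:

def gae_advantages(rewards, values, gamma=0.99, lam=0.97):
    """Generalized advantage estimation for one trajectory;
    expects len(values) == len(rewards) + 1 (bootstrap value last)."""
    advantages, adv = [], 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        adv = delta + gamma * lam * adv                         # recursion
        advantages.append(adv)
    return list(reversed(advantages))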
Making good actions more likely:
$$ J_\pi = \frac{1}{N_\tau N_t} \sum\limits_{\tau = 1}^{N_\tau}\sum\limits_{t = 1}^{N_t} \left( \frac{\pi(a_t|s_t)}{\pi^{old}(a_t|s_t)} A^{GAE,\tau}_t\right) $$
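A sketch of this surrogate objective in PyTorch, working with log-probabilities for numerical stability; note that common PPO implementations additionally clip the probability ratio, which the formula above omits:

import torch

def policy_objective(log_probs, old_log_probs, advantages):
    """Probability ratio pi/pi_old times the GAE advantage,
    averaged over trajectories and steps (unclipped surrogate)."""
    ratio = torch.exp(log_probs - old_log_probs)
    return (ratio * advantages).mean()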
Janis Geise, GitHub, DOI: 10.5281/zenodo.7642927
Idea: replace CFD with model(s) in some episodes
for e in episodes:
    if models_reliable():
        # sample cheap trajectories from the environment models
        sample_trajectories_from_models()
    else:
        # fall back to the expensive CFD simulation
        sample_trajectories_from_simulation()
        # re-fit the environment models on the new CFD data
        update_models()
    update_policy()
Based on Model-Ensemble TRPO (ME-TRPO, Kurutach et al.).
When are the models reliable?
How to sample from the ensemble?
Recipe to create environment models:
Influence of the number of models; average over trajectories and 3 seeds.
Comparison of the best policy over 3 seeds.
Average performance over 3 seeds.
What are the savings?
$50-70\%$ reduction in training time