Andre Weiner, Fabian Gabriel, Darshan Thummar
TU Braunschweig, Institute of Fluid Mechanics
Darshan Thummar: Active flow control in simulations of fluid flows based on DRL, GitHub

Fabian Gabriel: Active control of the flow past a cylinder under Reynolds number variation using DRL, GitHub
Based on M. Schäfer, S. Turek (1996); $Re=100$; 2D; pimpleFoam.
Flow past a circular cylinder at $Re=100$ without control.
Can we reduce drag and lift forces?
Proximal policy optimization (PPO) workflow (GAE: generalized advantage estimate).
The policy network outputs the parameters of a probability density function.
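As a minimal sketch of this idea (layer sizes and state dimension are assumed placeholders, not the exact network from the theses), a PyTorch policy could map the state vector to the mean and standard deviation of a Gaussian over the angular velocity:

```python
import torch

# Sketch of a policy that outputs the parameters of a probability
# density function: mean and standard deviation of a Gaussian over
# the angular velocity; layer sizes are assumed placeholders.
class GaussianPolicy(torch.nn.Module):
    def __init__(self, n_states: int, n_hidden: int = 64):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(n_states, n_hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(n_hidden, 2),  # [mean, log std]
        )

    def forward(self, state):
        out = self.body(state)
        return out[..., 0], out[..., 1].exp()  # mean, std > 0

policy = GaussianPolicy(n_states=12)
mean, std = policy(torch.zeros(12))
dist = torch.distributions.Normal(mean, std)
action = dist.sample()  # angular velocity sampled during training
```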
reward at time $t$
$$ R_t = r_0 - \left( r_1 c_D + r_2 c_L + r_3 \dot{\theta} + r_4 \ddot{\theta} \right) $$
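A literal transcription of the reward into Python might look as follows; the coefficients $r_0$ to $r_4$ are hypothetical placeholders, not the values used in the actual setup:

```python
# Literal transcription of the reward R_t; the coefficients r0..r4
# are hypothetical placeholders, chosen only for illustration.
def compute_reward(cd, cl, d_theta, dd_theta,
                   r0=3.0, r1=1.0, r2=0.1, r3=0.05, r4=0.05):
    """R_t = r0 - (r1*cd + r2*cl + r3*d_theta + r4*dd_theta)."""
    return r0 - (r1 * cd + r2 * cl + r3 * d_theta + r4 * dd_theta)
```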
long-term consequences
$$ G_t = \sum\limits_{l=0}^{N_t-t} \gamma^l R_{t+l} $$
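The return can be accumulated backwards over a finite trajectory; a pure-Python sketch:

```python
# Backward accumulation of the discounted return G_t for a finite
# trajectory: G_t = R_t + gamma * G_{t+1}, with G after the last
# step equal to zero.
def discounted_return(rewards, gamma=0.99):
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]
```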
learning what to expect in a given state: value function loss
$$ L_V = \frac{1}{N_\tau N_t} \sum\limits_{\tau = 1}^{N_\tau}\sum\limits_{t = 1}^{N_t} \left( V(s_t^\tau) - G_t^\tau \right)^2 $$
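In PyTorch, the value-function loss reduces to a mean squared error over all trajectories and time steps; a small sketch with made-up numbers:

```python
import torch

# Value-function loss L_V: mean squared error between the predicted
# values V(s_t^tau) and the sampled returns G_t^tau. The numbers
# below are made up for illustration; shape is (N_tau, N_t).
values = torch.tensor([[0.9, 0.4], [1.1, 0.6]])   # V(s_t^tau)
returns = torch.tensor([[1.0, 0.5], [1.0, 0.5]])  # G_t^tau
value_loss = ((values - returns) ** 2).mean()
```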
Was the selected action a good one?
$$\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t) $$ $$ A_t^{GAE} = \sum\limits_{l=0}^{N_t-t} (\gamma \lambda)^l \delta_{t+l} $$
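The two formulas combine into a single backward pass over one trajectory; a sketch assuming `values` holds one bootstrap entry $V(s_{N_t})$ more than `rewards`:

```python
# Generalized advantage estimate for one trajectory.
# values must contain V(s_t) for t = 0..N_t (one bootstrap value
# more than rewards); delta_t = R_t + gamma*V(s_{t+1}) - V(s_t).
def gae(rewards, values, gamma=0.99, lam=0.97):
    deltas = [rewards[t] + gamma * values[t + 1] - values[t]
              for t in range(len(rewards))]
    advantages, adv = [], 0.0
    for delta in reversed(deltas):
        adv = delta + gamma * lam * adv  # A_t = delta_t + gamma*lam*A_{t+1}
        advantages.append(adv)
    return advantages[::-1]
```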
make good actions more likely: policy objective function
$$ J_\pi = \frac{1}{N_\tau N_t} \sum\limits_{\tau = 1}^{N_\tau}\sum\limits_{t = 1}^{N_t} \left( \frac{\pi(a_t|s_t)}{\pi^{old}(a_t|s_t)} A^{GAE,\tau}_t\right) $$
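A sketch of the importance-weighted objective with made-up log-probabilities; note that PPO additionally clips the probability ratio, which is omitted here to mirror the formula above:

```python
import torch

# Importance-weighted policy objective with made-up numbers.
# PPO additionally clips the probability ratio; the clipping is
# omitted here to mirror the unclipped formula above.
log_p = torch.tensor([-1.0, -0.5])      # log pi(a_t|s_t), current policy
log_p_old = torch.tensor([-1.2, -0.5])  # log pi_old(a_t|s_t)
advantages = torch.tensor([1.0, -0.5])  # A_t^GAE
ratio = (log_p - log_p_old).exp()
objective = (ratio * advantages).mean()  # maximized via gradient ascent
```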
Refer to R. Paris et al. (2021) and the references therein for similar work employing PPO.
$$ t_{train} \approx t_{sim,single} \times N_{episodes} $$
current cylinder example:
$$ t_{train} \approx 0.5h \times 50 = 25h $$
$$t_{cpu} = t_{train} \times N_{cpu, single} \times N_\tau$$
current cylinder example:
$$ t_{cpu} = 25h \times 4 \times 10 = 1000h $$
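The two cost estimates above as a quick sanity check (variable names are assumed helpers, the numbers come directly from the slide):

```python
# Sanity check of the wall-clock and core-hour estimates above.
t_sim_single = 0.5                      # hours per episode, single run
n_episodes = 50
t_train = t_sim_single * n_episodes     # wall-clock training time: 25 h

n_cpu_single = 4                        # cores per simulation
n_tau = 10                              # parallel trajectories per episode
t_cpu = t_train * n_cpu_single * n_tau  # total cost in core-hours
```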
Python/PyTorch
The implementation closely follows chapter 12 of Miguel Morales's Grokking Deep Reinforcement Learning.
C++/OpenFOAM/PyTorch
Boundary condition defined in 0/U:
cylinder
{
    type            agentRotatingWallVelocity;
    // center of the cylinder
    origin          (0.2 0.2 0.0);
    // axis of rotation; normal to the 2D domain
    axis            (0 0 1);
    // name of the policy network; must be a TorchScript file
    policy          "policy.pt";
    // when to start controlling
    startTime       0.01;
    // how often to evaluate the policy
    interval        20;
    // if true, the angular velocity is sampled from a Gaussian distribution;
    // if false, the mean value predicted by the policy is used
    train           true;
    // maximum allowed angular velocity
    absOmegaMax     0.05;
}
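The policy keyword expects a TorchScript file; a trained PyTorch network can be exported for the C++ runtime roughly as sketched below (the architecture is an assumed placeholder, not the exact network used in the theses):

```python
import os
import tempfile

import torch

# Serialize a policy network as TorchScript so the OpenFOAM boundary
# condition can load it. The architecture (12 inputs, mean and log-std
# output) is an assumed placeholder.
policy = torch.nn.Sequential(
    torch.nn.Linear(12, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)
# the boundary condition expects the file name given by its policy entry,
# e.g. "policy.pt"; a temporary directory is used here for illustration
path = os.path.join(tempfile.mkdtemp(), "policy.pt")
torch.jit.script(policy).save(path)
loaded = torch.jit.load(path)
```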
Comparison of uncontrolled, open-loop controlled, and closed-loop controlled drag.
Angular velocity for open- and closed-loop control.
Variable inlet velocity/Reynolds number: $Re(t) = 250 + 150\sin(\pi t)$
Drag coefficient for transient inlet velocity: uncontrolled and controlled.
Get involved: a.weiner@tu-braunschweig.de
Andre Weiner | Tomislav Marić
a.weiner@tu-braunschweig.de | maric@mma.tu-darmstadt.de
Short- and long-term objectives available at
https://github.com/AndreWeiner/mlfoam