Decision under uncertainty

Why should I forecast?

During the course we’ve focused on the questions:
what can be forecasted?
how can it be forecasted?
In the next two lessons we will focus on the question why should I forecast?

Stochastic search

Calling $u$ a decision variable, $w \sim W$ a stochastic variable, and $\mathcal{L}: u, w \rightarrow \mathbb{R}$ an objective or loss function, most stochastic problems can be written as:
$$\min_u \mathbb{E}_W\, \mathcal{L}(u, w) \tag{1}$$
where the expectation is taken over the distribution of $w$ (a minimal numerical sketch of (1) follows the list below). Here $w$ can be:
a single value or a vector → single-stage decisions
a sequence of random variables that evolve over time → multi-stage decisions
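As a concrete illustration, here is a minimal sketch of problem (1) using a sample-average approximation: the expectation is replaced by an empirical mean over draws of $w$, and the decision is found by a simple grid search. The quadratic loss and the distribution of $w$ are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative loss: quadratic mismatch between decision u and disturbance w
def loss(u, w):
    return (u - w) ** 2

w_samples = rng.normal(loc=2.0, scale=0.5, size=10_000)  # draws of w ~ W
u_grid = np.linspace(0.0, 4.0, 401)                      # candidate decisions

# sample-average approximation of E_W[L(u, w)] for each candidate u
expected_loss = np.array([loss(u, w_samples).mean() for u in u_grid])
u_star = u_grid[expected_loss.argmin()]
print(u_star)  # close to 2.0: the mean of W minimizes the expected quadratic loss
```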
Example of a single-stage decision
Demand side management is the practice of controlling or influencing power demand to minimize economic costs, such as the cost of energy and the operational costs of the grid, or to increase the local consumption of distributed generation. A common way to do this is through ripple control: a signal is sent to a group of buildings, forcing off thermo-electric devices such as heat pumps and deferring their consumption.
We must decide for how many hours the HP of a given building can be forced off, depending on tomorrow's forecasted temperature $\hat{T}_d$
We should treat the forecast as a random variable $\hat{T}_d(w)$
$$\min_{u}\ u \tag{2}$$
$$\text{s.t.: } \mathbb{E}_W\big(\mathbb{I}_{h(\hat{T}_d(w)) < u}\big) \geq 0.9 \tag{3}$$
This formulation is equivalent to the general stochastic search (1). To see this, we can use a Lagrangian relaxation: we relax the hard constraint (3) and add it to the objective as a penalty term weighted by a multiplier $\lambda \geq 0$:
$$\max_{\lambda \geq 0}\min_u\ u + \lambda \left( 0.9 - \mathbb{E}_W\big(\mathbb{I}_{h(\hat{T}_d(w)) < u}\big) \right) \tag{4}$$
If we have access to future scenarios of $T_d$, we can now replace the expectation with its empirical estimator:
$$\sim\ \max_{\lambda \geq 0}\min_u\ u + \lambda \left( 0.9 - \frac{1}{N} \sum_{i=1}^N \mathbb{I}_{h(\hat{T}_{d,i}) < u} \right) \tag{5}$$
It can easily be shown that the solution of this expression is the empirical 0.9-quantile of $h(\hat{T}_d)$.
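A minimal numerical check of this result (the mapping $h$ and the temperature scenarios below are purely illustrative assumptions): with $N$ forecast scenarios of $\hat{T}_d$, the smallest $u$ satisfying the empirical chance constraint is the empirical 0.9-quantile of $h(\hat{T}_d)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# N forecast scenarios of tomorrow's daily temperature (illustrative values, in °C)
T_scenarios = rng.normal(loc=5.0, scale=3.0, size=1000)

# hypothetical energy-signature model: hours of HP operation required at temperature t_d
def h(t_d):
    return np.clip(1.2 * (18.0 - t_d), 0.0, 24.0)

# smallest u such that (1/N) * sum( I[h(T_i) < u] ) >= 0.9
u_star = np.quantile(h(T_scenarios), 0.9)
print(u_star)
```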
Schematics of ripple control: a force-off signal is sent to a building, temporarily shutting down a deferrable device (e.g. a heat pump). The consumption is shifted, e.g. to periods with PV overproduction.
Energy signature of a building with an HP: the energy need of the building is scattered against the external daily temperature. If we know the nominal power of the HP, we can derive an expression for $h(T_d)$.

Value at risk and conditional value at risk

For some problems we are not interested in minimizing the expected value; rather, we want to minimize the loss in case of extreme events.
Minimize the expectation $\mathbb{E}$
cost minimization when the cost is proportional to the disturbance
inventory optimization
demand satisfaction
Minimize loss in case of extreme events
cost minimization when the cost is superlinear w.r.t. the disturbance
safety-critical applications: e.g. nuclear power plants, grid management
portfolio optimization
Conditional Value at risk of a Gaussian variable for alpha=0.15. The corresponding VaR is the first vertical blue line. Picture from Distributionally robust chance constrained data-enabled predictive control
Three metrics are of particular interest in this case:
Max
$\min_W \max(l(w))$
This results in a robust optimization in which we optimize for the worst-case scenario.
❓ What happens when the error distribution is not bounded?
Value at risk
$$\begin{aligned}\min_{W}\ &\operatorname{VaR}_\alpha(l(w))\\ \operatorname{VaR}_\alpha(w) &= \min \left\{z \mid F_W(z) \geq \alpha\right\}\end{aligned}$$
This is the definition of a quantile: we are minimizing the loss of the $\alpha$-level worst case.
Its estimation is usually more stable than that of the CVaR, especially for fat-tailed distributions.
Conditional value at risk
$$\begin{aligned}&\min_{W}\operatorname{CVaR}_{1-\alpha}(l(w))\\ &\operatorname{CVaR}_{1-\alpha}(w)=\frac{1}{\alpha} \int_{F_W(z) \geq 1-\alpha} z\,\mathrm{d}F_W(z)\end{aligned}$$
This can be seen as a convex relaxation of the VaR. Depending on the shape of the uncertainty distribution, it can be more or less conservative than the VaR: e.g. for fat-tailed distributions it results in a more conservative control.
Different equivalent expressions exist, see e.g. Pflug, G.C., “Some Remarks on the Value-at-Risk and on the Conditional Value-at-Risk”, 2000
Acerbi, C., “Spectral Measures of Risk: a coherent representation of subjective risk aversion”, 2002
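Both risk measures are straightforward to estimate from samples of the loss. A short sketch using the notation above (VaR as a quantile of the loss, CVaR$_{1-\alpha}$ as the average of the losses in the upper $\alpha$ tail); the log-normal loss is just an illustrative fat-tailed example.

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # illustrative fat-tailed loss samples

alpha = 0.1
var_level = np.quantile(losses, 1 - alpha)        # VaR_{1-alpha}: the (1-alpha)-quantile of the loss
cvar_level = losses[losses >= var_level].mean()   # CVaR_{1-alpha}: mean loss above the VaR

print(f"VaR: {var_level:.2f}  CVaR: {cvar_level:.2f}")  # CVaR >= VaR, i.e. more conservative
```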
Example of a single-stage decision using CVaR
A distribution system operator (DSO) wants to control multiple groups of flexible devices, both to reduce its energy cost and to minimize its daily peak. It wants to send the control signals day-ahead.
the problem is defined on multiple time steps, but we just have to take a decision at step 0: this can be seen as a single stage decision
we can model the response of the flexible devices through a non-parametric model $f(u, x, \theta)$
we have access to day-ahead forecast scenarios of the uncontrolled power profile $\hat{p}_u(w)$
$$\begin{aligned}&\min_u\ \operatorname{CVaR}_{1-\alpha}\, l\big(\hat{p}_u(w) + \hat{p}_{ctrl}(u)\big)\\ \text{s.t.: }&\ \hat{p}_{ctrl} = f(u, x, \theta)\\ &\ w \in W,\ u \in \mathcal{U} \end{aligned}$$
where the loss is defined as $l(x) = p_e x + p_p\max_{t}(x_t)$, with $p_e$ and $p_p$ the price of energy and of the realized daily peak.
Different methods can be applied; the simplest is the following (a sketch in code follows the list):
enumerate all plausible control scenarios $[u_i]_{i=1}^N$
for each control scenario, evaluate the controlled device responses $\hat{p}_{ctrl, i} = f(u_i, x, \theta)$
for this control scenario, obtain the total power profile under all the forecasting scenarios, $\hat{p}_{ctrl, i}+\hat{p}_{u, j}$, and compute the associated losses $l_{i,j} = l(\hat{p}_{ctrl, i}+\hat{p}_{u, j})$
for this control scenario, compute the CVaR: find the $1-\alpha$ quantile of $l_{i,j}$ over $j$ and compute the average loss above this value
among the set of retrieved CVaRs, find the minimum and its associated control
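A compact sketch of this enumeration procedure. The response model `f`, the candidate controls, the scenario generation and the prices are all hypothetical placeholders; the energy term of the loss is assumed to be summed over the day so that the loss is a scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_scen = 24, 200
p_e, p_p = 0.2, 50.0                     # illustrative energy and peak prices

# hypothetical day-ahead scenarios of the uncontrolled power profile (kW)
p_u_scenarios = 10.0 + rng.standard_normal((n_scen, T)).cumsum(axis=1)

# hypothetical response model of the flexible devices: the control directly curtails power
def f(u):
    return -u

# enumerated candidate control signals: do nothing, or curtail 2 kW in a single hour
u_candidates = [np.zeros(T)] + [2.0 * np.eye(T)[t] for t in range(T)]

def loss(x):
    # energy cost (sum over the day assumed) plus cost of the realized daily peak
    return p_e * x.sum() + p_p * x.max()

def cvar(samples, alpha=0.1):
    # average of the losses above their (1 - alpha) empirical quantile
    q = np.quantile(samples, 1 - alpha)
    return samples[samples >= q].mean()

cvars = []
for u in u_candidates:
    scenario_losses = np.array([loss(p_u + f(u)) for p_u in p_u_scenarios])
    cvars.append(cvar(scenario_losses))

best = int(np.argmin(cvars))
print("best control index:", best, "with CVaR:", cvars[best])
```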
Visualization of day ahead scenarios produced by a forecaster and control decision for different groups of flexible devices.

Two-stage decisions

Two-stage decisions are those in which decisions are made in two stages: the first-stage decision is typically made under uncertainty, and the second-stage decision is based on the realization of the uncertain variable. Examples of two-stage stochastic problems include:
Inventory control: a company must decide how much inventory to stock today to meet future demand, which is uncertain.
Building infrastructure (siting and sizing): the first-stage decision is where to build the infrastructure and how big it should be, given the forecasted demand for its use. The second-stage decision is how best to operate it.
Unit commitment problem: a Transmission System Operator must decide/commit which thermal power generators to turn on at the beginning of the day. Once this decision is taken, it must decide how to operate the grid at each hour.

Multi-stage decisions


Stochastic optimization and reinforcement learning

Assuming that we have a variable $x$ describing the state of the system and a stage cost $l$, a multi-stage stochastic problem can be described as:
$$V_0\left(x_0\right)=\max_{\pi_k \in \Pi} \mathbb{E}_W\left\{\sum_{t=0}^T l\left(x_{t}, u_{t}^{\pi_k}\left(x_{t}\right), w_t\right) \,\middle|\, x_0\right\} \tag{6}$$
That is, we want to find an optimal policy $\pi_k$ (an algorithm) mapping the current state $x_t$ to an optimal action $u^{\pi}$. In the above expression the system dynamics are implicit: we are not describing the evolution of the system. There are two main approaches to taking the system evolution into account:
Reinforcement learning
Bellman's principle of optimality can be used to write equation (6) recursively:
$$V_t\left(x_t\right)=\max_{u \in \mathcal{U}_s}\left(l\left(x_t, u\right)+\gamma \sum_{x \in \mathcal{X}} P\left(x \mid x_t, u\right) V_{t+1}\left(x_{t+1}=x\right)\right)$$
In this expression, the stochasticity affecting the problem is encoded in the so-called transition matrix $P(x\vert x_t, u)$.
If the state is low-dimensional, we can hope to learn the value of a given state
we can then find an optimal policy by taking $u^{*}_t = \arg\max_{u}\left(l\left(x_t, u\right)+\gamma \sum_{x \in \mathcal{X}} P\left(x \mid x_t, u\right) V_{t+1}\left(x_{t+1}=x\right)\right)$ (a tabular sketch is given below)
However, the state space grows exponentially with the number of dimensions. In this case, even defining the transition matrix is hard → we must rely on a model-free approach, in which we use a simulator to generate state transitions.
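For the low-dimensional, tabular case, the recursion above can be implemented directly as finite-horizon value iteration. A minimal sketch, assuming a small discrete state/action space with a known (here randomly generated, purely illustrative) transition matrix and stage reward:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, T, gamma = 5, 3, 10, 0.95

# illustrative transition probabilities P[u, x, x'] and stage rewards l[x, u]
P = rng.random((n_u, n_x, n_x))
P /= P.sum(axis=2, keepdims=True)
l = rng.random((n_x, n_u))

V = np.zeros((T + 1, n_x))            # terminal value V_T is taken as zero
policy = np.zeros((T, n_x), dtype=int)

for t in range(T - 1, -1, -1):
    # Q[x, u] = l(x, u) + gamma * sum_x' P(x' | x, u) * V_{t+1}(x')
    Q = l + gamma * np.einsum("uxy,y->xu", P, V[t + 1])
    V[t] = Q.max(axis=1)
    policy[t] = Q.argmax(axis=1)

print(policy[0])  # greedy (optimal) first-stage action for each state
```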
Control theory - Model predictive control (MPC)
In multi-stage stochastic control we have slightly different assumptions:
we have a (more or less accurate, parametric, time-variant or time-invariant) model of the system dynamics $f(x_t, u_t, w_t, \theta_t)$
$w$ can affect both the system's dynamics and the objective function
if linear models are used for the state dynamics, and the noise is modeled with Gaussian distributions, state and input constraints are usually easy to enforce.
$$\begin{aligned}
J_{\mathrm{t}}^* = \min_{\pi_t}\ & \mathbb{E}_W\left(\sum_{t=0}^{T} l_{t}(x_t, u_t, w_t)\right) \\
\text{s.t.: } & x_{t+1}=f\left(x_t, u_t, w_t, \theta_{t}\right), \\
& u_t=\pi_t(x_0, \ldots, x_t), \\
& W=[w_0, \ldots, w_{N-1}] \sim \mathcal{Q}^{W},\quad \theta_{t} \sim \mathcal{Q}^{\theta_{t}}, \\
& \operatorname{Pr}\left(\bar{X}=[x_0, \ldots, x_N] \in \bar{\mathcal{X}}_j\right) \geq p_j \quad \text{for all } j=1, \ldots, n_{c_x}, \\
& \operatorname{Pr}\left(\bar{U}=[u_0, \ldots, u_{N-1}] \in \bar{\mathcal{U}}_j\right) \geq p_j \quad \text{for all } j=1, \ldots, n_{c_u}
\end{aligned}$$

Introduction to MPC


Receding horizon control, a.k.a. MPC

at each timestep, solve a finite-horizon optimization problem over $[t, t+N_p]$
apply the optimal input over the interval $[t, t+1]$
at time $t+1$, solve the same problem over a shifted horizon (a schematic loop is sketched below)
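A schematic sketch of this loop. Both `solve_finite_horizon` and `plant_step` are hypothetical placeholders standing in for the finite-horizon optimizer and the real (or simulated) system.

```python
import numpy as np

Np = 24       # prediction horizon length
T_sim = 96    # number of closed-loop steps to simulate

def solve_finite_horizon(x, horizon):
    # placeholder: would solve the optimization problem over [t, t + horizon]
    return np.zeros(horizon)

def plant_step(x, u):
    # placeholder for the real system dynamics
    return x + u

x = 0.0
for t in range(T_sim):
    u_seq = solve_finite_horizon(x, Np)  # 1. solve over [t, t + Np]
    x = plant_step(x, u_seq[0])          # 2. apply only the first input in [t, t+1]
    # 3. at t+1 the loop repeats over the shifted horizon [t+1, t+1+Np]
```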
Classical controller
disturbance rejection (d → y)
noise insensitivity (n → y)
model uncertainty
usually in frequency domain
MPC
easy to include control constraints
easy to include state constraints
usually formulated in the time domain

Scenario-based MPC

Scenario-based MPC can be seen as a two-stage stochastic program:
at each time step, several scenarios of the disturbance are considered, so that we can take a sample approximation of $\mathbb{E}_W$
a unique control problem is solved over all scenarios
$$\begin{aligned} J_{\mathrm{t}}^*&=\min_{u}\ \mathbb{E}_W\left(\sum_{t=0}^{T} l_{t}(x_{t,s}, u_{t,s}, w_{t,s})\right)\\ &= \min_{u}\ \frac{1}{n_s}\sum_{s=1}^{n_s}\left(\sum_{t=0}^{T} l_{t}(x_{t,s}, u_{t,s}, w_{t,s})\right)\end{aligned}$$
a so-called non-anticipativity constraint is added: this constraint makes the step-0 actions equal across all scenarios ($u_{0,s} = u_{0,1}$ for all $s$)
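A small cvxpy sketch of this idea for an illustrative scalar linear system (all numbers and the quadratic cost are assumptions): one state/input trajectory per scenario, a sample-average cost, and a non-anticipativity constraint forcing the step-0 input to be identical across scenarios.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
T, n_s = 12, 20
a, b, x0 = 0.9, 0.5, 1.0
w = 0.1 * rng.standard_normal((n_s, T))      # disturbance scenarios

x = cp.Variable((n_s, T + 1))                # one state trajectory per scenario
u = cp.Variable((n_s, T))                    # one input trajectory per scenario

constraints = [x[:, 0] == x0]
for s in range(n_s):
    for t in range(T):
        constraints.append(x[s, t + 1] == a * x[s, t] + b * u[s, t] + w[s, t])

# non-anticipativity: the input actually applied at step 0 cannot depend on the scenario
constraints += [u[s, 0] == u[0, 0] for s in range(1, n_s)]

# sample-average (empirical expectation) of a quadratic stage cost
cost = (cp.sum_squares(x) + 0.1 * cp.sum_squares(u)) / n_s
cp.Problem(cp.Minimize(cost), constraints).solve()
print("shared first-stage input:", u.value[0, 0])
```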

Tree-based MPC

Forecasting scenarios are condensed into a scenario tree
Actions are chosen in a non-anticipative causal fashion.
for example, $u_1^2$ is decided as a function of $\epsilon_1^2$, but not of any of the $\epsilon_2^i,\ i \in \mathbb{N}_{[1, \mu(3)]}$
non-anticipativity constraints can be encoded in the same way: the picture on the right shows the schematics of the equality constraints for a problem with 8 scenarios; at step 1 we have only 2 nodes, each of which binds 4 scenarios (a sketch of this encoding follows below).
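A small sketch of how such equality constraints can be generated. The tree is described here by a hypothetical per-stage mapping from each scenario to the tree node it passes through; scenarios sharing a node at stage $t$ must share the same input at stage $t$.

```python
from collections import defaultdict

# hypothetical tree encoding for 8 scenarios over 3 stages:
# node[t][s] = index of the tree node scenario s passes through at stage t
node = [
    [0, 0, 0, 0, 0, 0, 0, 0],   # stage 0: a single root node -> one shared u_0
    [0, 0, 0, 0, 1, 1, 1, 1],   # stage 1: 2 nodes, each binding 4 scenarios
    [0, 0, 1, 1, 2, 2, 3, 3],   # stage 2: 4 nodes, each binding 2 scenarios
]

def non_anticipativity_pairs(node):
    """Return (t, s_ref, s) tuples meaning u[t, s_ref] == u[t, s]."""
    pairs = []
    for t, stage in enumerate(node):
        groups = defaultdict(list)
        for s, n in enumerate(stage):
            groups[n].append(s)
        for members in groups.values():
            # tie every scenario in the group to the first one
            pairs += [(t, members[0], s) for s in members[1:]]
    return pairs

print(non_anticipativity_pairs(node))
```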
Left: a stochastic scenario tree obtained from the scenarios. Right: graph representation of the tree

Approach hybridization

There are some similarities between RL and MPC:
MPC can be seen as a model-based approach, since we have an approximate function modeling the state transition.
In both cases, the stage cost $l$ (reward, in the RL literature) may not coincide with the actual cost, but can be an easier approximation of it (possibly including temporal discounts).
RL can directly optimize the policy π\pi, a.k.a. policy function approximation, as opposed to value function approximation
Hybridization typically consists of learning one of the following components from data:
learning of the system dynamics: a parametric or a non-parametric model $f(x_t, u_t, w_t, \theta_f)$ is learned from data and used to obtain the next state values $x_{t+1}$
Gaussian Process: the system state innovations are modeled as a learnable Gaussian process: $\left[\begin{array}{c}X^{+} \\ f_{\mathrm{t}}(x, u)\end{array}\right] \sim \mathcal{N}\left(0,\left[\begin{array}{cc}K+I \sigma_w^2 & K_{x, u} \\ K_{x, u}^T & k([x, u],[x, u])\end{array}\right]\right)$
Input-convex and invertible NNs: at each timestep, gradient descent is used to optimize the control, differentiating through a learned, differentiable model of the system
learning of the control policy:
Learning from a teacher: an NN is used to learn from previously solved instances of a control problem
Analytic policy gradient: an NN is trained to directly minimize the open-loop loss function, and provides a non-parametric policy $\pi(x_t, W)$
learning of the value function: the stage cost $l$ is rewritten in terms of value functions; as these are usually dependent on high-dimensional states and noise, the value function is replaced by a machine-learning-based model.