Decision under uncertainty

Why should I forecast?

During the course we’ve focused on the questions:
what can be forecasted?
how can it be forecasted?
In the next two lessons we will focus on the question why should I forecast?

Stochastic search

Calling $u$ a decision variable, $w \sim W$ a stochastic variable, and $\mathcal{L}: u, w \rightarrow \mathbb{R}$ an objective or loss function, most stochastic problems can be written as:
$$\min_u \mathbb{E}_W\, \mathcal{L}(u, w) \tag{1}$$
where the expectation is taken over the distribution of $w$ (a minimal numerical sketch of (1) follows the list below). Here $w$ can be:
a single value or a vector → single-stage decisions
a sequence of random variables that evolve over time → multi-stage decisions
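As a concrete illustration, here is a minimal sketch of problem (1) using a sample-average approximation: the expectation is replaced by an empirical mean over draws of $w$, and the decision is found by a simple grid search. The quadratic loss and the distribution of $w$ are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative loss: quadratic mismatch between decision u and disturbance w
def loss(u, w):
    return (u - w) ** 2

w_samples = rng.normal(loc=2.0, scale=0.5, size=10_000)  # draws of w ~ W
u_grid = np.linspace(0.0, 4.0, 401)                      # candidate decisions

# sample-average approximation of E_W[L(u, w)] for each candidate u
expected_loss = np.array([loss(u, w_samples).mean() for u in u_grid])
u_star = u_grid[expected_loss.argmin()]
print(u_star)  # close to 2.0: the mean of W minimizes the expected quadratic loss
```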
Example of a single-stage decision
Demand side management is the practice of controlling or influencing power demand to minimize economic costs, such as the cost of energy and the operational costs of the grid, or to increase the local consumption of distributed generation. A common way to do this is through ripple control: a signal is sent to a group of buildings, forcing off thermo-electric devices such as heat pumps and deferring their consumption.
We must decide for how many hours the HP of a given building can be forced off, depending on tomorrow's forecasted temperature $\hat{T}_d$
We should treat the forecast as a random variable $\hat{T}_d(w)$
$$\min_{u}\ u \tag{2}$$
$$\text{s.t.: } \mathbb{E}_W\big(\mathbb{I}_{h(\hat{T}_d(w)) < u}\big) \geq 0.9 \tag{3}$$
This formulation is equivalent to the general stochastic search (1). To see this, we can use a Lagrangian relaxation: we relax the hard constraint (3) and add it to the objective as a penalty term weighted by a multiplier $\lambda \geq 0$:
$$\max_{\lambda \geq 0}\min_u\ u + \lambda \left( 0.9 - \mathbb{E}_W\big(\mathbb{I}_{h(\hat{T}_d(w)) < u}\big) \right) \tag{4}$$
If we have access to future scenarios of $T_d$, we can now replace the expectation with its empirical estimator:
$$\sim\ \max_{\lambda \geq 0}\min_u\ u + \lambda \left( 0.9 - \frac{1}{N} \sum_{i=1}^N \mathbb{I}_{h(\hat{T}_{d,i}) < u} \right) \tag{5}$$
It can easily be shown that the solution of this expression is the empirical 0.9-quantile of $h(\hat{T}_d)$.
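A minimal numerical check of this result (the mapping $h$ and the temperature scenarios below are purely illustrative assumptions): with $N$ forecast scenarios of $\hat{T}_d$, the smallest $u$ satisfying the empirical chance constraint is the empirical 0.9-quantile of $h(\hat{T}_d)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# N forecast scenarios of tomorrow's daily temperature (illustrative values, in °C)
T_scenarios = rng.normal(loc=5.0, scale=3.0, size=1000)

# hypothetical energy-signature model: hours of HP operation required at temperature t_d
def h(t_d):
    return np.clip(1.2 * (18.0 - t_d), 0.0, 24.0)

# smallest u such that (1/N) * sum( I[h(T_i) < u] ) >= 0.9
u_star = np.quantile(h(T_scenarios), 0.9)
print(u_star)
```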
Schematics of ripple control: a force-off signal is sent to a building, temporarily shutting down a deferrable device (e.g. a heat pump). The consumption is shifted, e.g. to periods with PV overproduction.
Energy signature of a building with an HP: the energy need of the building is scattered against the external daily temperature. If we know the nominal power of the HP, we can derive an expression for $h(T_d)$.

Value at risk and conditional value at risk

For some problems we are not interested in minimizing the expected value; rather, we want to minimize the loss in case of extreme events.
Minimize the expectation $\mathbb{E}$
cost minimization when the cost is proportional to the disturbance
inventory optimization
demand satisfaction
Minimize loss in case of extreme events
cost minimization when the cost is superlinear w.r.t. the disturbance
safety-critical applications: e.g. nuclear power plants, grid management
portfolio optimization
Conditional Value at risk of a Gaussian variable for alpha=0.15. The corresponding VaR is the first vertical blue line. Picture from Distributionally robust chance constrained data-enabled predictive control
Three metrics are of particular interest in this case:
Max
$\min_W \max(l(w))$
This results in a robust optimization in which we optimize for the worst-case scenario.
❓ What happens when the error distribution is not bounded?
Value at risk
$$\begin{aligned}\min_{W}\ &\operatorname{VaR}_\alpha(l(w))\\ \operatorname{VaR}_\alpha(w) &= \min \left\{z \mid F_W(z) \geq \alpha\right\}\end{aligned}$$
This is the definition of a quantile: we are minimizing the loss of the $\alpha$-level worst case.
Its estimation is usually more stable than that of the CVaR, especially for fat-tailed distributions.
Conditional value at risk
$$\begin{aligned}&\min_{W}\operatorname{CVaR}_{1-\alpha}(l(w))\\ &\operatorname{CVaR}_{1-\alpha}(w)=\frac{1}{\alpha} \int_{F_W(z) \geq 1-\alpha} z\,\mathrm{d}F_W(z)\end{aligned}$$
This can be seen as a convex relaxation of the VaR. Depending on the shape of the uncertainty distribution, it can be more or less conservative than the VaR: e.g. for fat-tailed distributions it results in a more conservative control.
Different equivalent expressions exist, see e.g. Pflug, G.C., “Some Remarks on the Value-at-Risk and on the Conditional Value-at-Risk”, 2000
Acerbi, C., “Spectral Measures of Risk: a coherent representation of subjective risk aversion”, 2002
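Both risk measures are straightforward to estimate from samples of the loss. A short sketch using the notation above (VaR as a quantile of the loss, CVaR$_{1-\alpha}$ as the average of the losses in the upper $\alpha$ tail); the log-normal loss is just an illustrative fat-tailed example.

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # illustrative fat-tailed loss samples

alpha = 0.1
var_level = np.quantile(losses, 1 - alpha)        # VaR_{1-alpha}: the (1-alpha)-quantile of the loss
cvar_level = losses[losses >= var_level].mean()   # CVaR_{1-alpha}: mean loss above the VaR

print(f"VaR: {var_level:.2f}  CVaR: {cvar_level:.2f}")  # CVaR >= VaR, i.e. more conservative
```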
Example of a single-stage decision using CVaR
A distribution system operator (DSO) wants to control multiple groups of flexible devices, both to reduce its energy cost and to minimize its daily peak. It wants to send the control signals day-ahead.
the problem is defined on multiple time steps, but we just have to take a decision at step 0: this can be seen as a single stage decision
we can model the response of the flexible devices through a non-parametric model $f(u, x, \theta)$
we have access to day-ahead forecast scenarios of the uncontrolled power profile $\hat{p}_u(w)$
$$\begin{aligned}&\min_u\ \operatorname{CVaR}_{1-\alpha}\, l\big(\hat{p}_u(w) + \hat{p}_{ctrl}(u)\big)\\ \text{s.t.: }&\ \hat{p}_{ctrl} = f(u, x, \theta)\\ &\ w \in W,\ u \in \mathcal{U} \end{aligned}$$
where the loss is defined as $l(x) = p_e x + p_p\max_{t}(x_t)$, with $p_e$ and $p_p$ the price of energy and of the realized daily peak.
Different methods can be applied; the simplest is the following (a sketch in code follows the list):
enumerate all plausible control scenarios $[u_i]_{i=1}^N$
for each control scenario, evaluate the controlled device responses $\hat{p}_{ctrl, i} = f(u_i, x, \theta)$
for this control scenario, obtain the total power profile under all the forecasting scenarios, $\hat{p}_{ctrl, i}+\hat{p}_{u, j}$, and compute the associated losses $l_{i,j} = l(\hat{p}_{ctrl, i}+\hat{p}_{u, j})$
for this control scenario, compute the CVaR: find the $1-\alpha$ quantile of $l_{i,j}$ over $j$ and compute the average loss above this value
among the set of retrieved CVaRs, find the minimum and its associated control
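A compact sketch of this enumeration procedure. The response model `f`, the candidate controls, the scenario generation and the prices are all hypothetical placeholders; the energy term of the loss is assumed to be summed over the day so that the loss is a scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_scen = 24, 200
p_e, p_p = 0.2, 50.0                     # illustrative energy and peak prices

# hypothetical day-ahead scenarios of the uncontrolled power profile (kW)
p_u_scenarios = 10.0 + rng.standard_normal((n_scen, T)).cumsum(axis=1)

# hypothetical response model of the flexible devices: the control directly curtails power
def f(u):
    return -u

# enumerated candidate control signals: do nothing, or curtail 2 kW in a single hour
u_candidates = [np.zeros(T)] + [2.0 * np.eye(T)[t] for t in range(T)]

def loss(x):
    # energy cost (sum over the day assumed) plus cost of the realized daily peak
    return p_e * x.sum() + p_p * x.max()

def cvar(samples, alpha=0.1):
    # average of the losses above their (1 - alpha) empirical quantile
    q = np.quantile(samples, 1 - alpha)
    return samples[samples >= q].mean()

cvars = []
for u in u_candidates:
    scenario_losses = np.array([loss(p_u + f(u)) for p_u in p_u_scenarios])
    cvars.append(cvar(scenario_losses))

best = int(np.argmin(cvars))
print("best control index:", best, "with CVaR:", cvars[best])
```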
Visualization of day ahead scenarios produced by a forecaster and control decision for different groups of flexible devices.

Two-stage decisions

Two-stage decisions are those in which decisions are made in two stages: the first-stage decision is typically made under uncertainty, and the second-stage decision is based on the realization of the uncertain variable. Examples of two-stage stochastic problems include:
Inventory control: a company must decide how much inventory to stock today to meet future demand, which is uncertain.
Building infrastructure (siting and sizing): the first-stage decision is where to build the infrastructure and how big it should be, given the forecasted demand for its use. The second-stage decision is how best to operate it.
Unit commitment problem: a Transmission System Operator must decide/commit which thermal power generators to turn on at the beginning of the day. Once this decision is taken, it must decide how to operate the grid at each hour.

Multi-stage decisions


Stochastic optimization and reinforcement learning

Assuming that we have a variable $x$ describing the state of the system and a stage cost $l$, a multi-stage stochastic problem can be described as:
$$V_0\left(x_0\right)=\max_{\pi_k \in \Pi} \mathbb{E}_W\left\{\sum_{t=0}^T l\left(x_{t}, u_{t}^{\pi_k}\left(x_{t}\right), w_t\right) \,\middle|\, x_0\right\} \tag{6}$$
That is, we want to find an optimal policy $\pi_k$ (an algorithm) mapping the current state $x_t$ to an optimal action $u^{\pi}$. In the above expression the system dynamics are implicit: we are not describing the evolution of the system. There are two main approaches to taking the system evolution into account:
Reinforcement learning
Bellman's principle of optimality can be used to write equation (6) recursively:
$$V_t\left(x_t\right)=\max_{u \in \mathcal{U}_s}\left(l\left(x_t, u\right)+\gamma \sum_{x \in \mathcal{X}} P\left(x \mid x_t, u\right) V_{t+1}\left(x_{t+1}=x\right)\right)$$
In this expression, the stochasticity affecting the problem is encoded in the so-called transition matrix $P(x\vert x_t, u)$.
If the state is low-dimensional, we can hope to learn the value of a given state
we can then find an optimal policy by taking $u^{*}_t = \arg\max_{u}\left(l\left(x_t, u\right)+\gamma \sum_{x \in \mathcal{X}} P\left(x \mid x_t, u\right) V_{t+1}\left(x_{t+1}=x\right)\right)$ (a tabular sketch is given below)
However, the state space grows exponentially with the number of dimensions. In this case, even defining the transition matrix is hard → we must rely on a model-free approach, in which we use a simulator to generate state transitions.
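For the low-dimensional, tabular case, the recursion above can be implemented directly as finite-horizon value iteration. A minimal sketch, assuming a small discrete state/action space with a known (here randomly generated, purely illustrative) transition matrix and stage reward:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, T, gamma = 5, 3, 10, 0.95

# illustrative transition probabilities P[u, x, x'] and stage rewards l[x, u]
P = rng.random((n_u, n_x, n_x))
P /= P.sum(axis=2, keepdims=True)
l = rng.random((n_x, n_u))

V = np.zeros((T + 1, n_x))            # terminal value V_T is taken as zero
policy = np.zeros((T, n_x), dtype=int)

for t in range(T - 1, -1, -1):
    # Q[x, u] = l(x, u) + gamma * sum_x' P(x' | x, u) * V_{t+1}(x')
    Q = l + gamma * np.einsum("uxy,y->xu", P, V[t + 1])
    V[t] = Q.max(axis=1)
    policy[t] = Q.argmax(axis=1)

print(policy[0])  # greedy (optimal) first-stage action for each state
```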
Control theory - Model predictive control (MPC)
In multi-stage stochastic control we have slightly different assumptions:
we have a (more or less accurate, parametric, time-variant or time-invariant) model of the system dynamics $f(x_t, u_t, w_t, \theta_t)$
$w$ can affect both the system's dynamics and the objective function
if linear models are used for the state dynamics, and the noise is modeled with Gaussian distributions, state and input constraints are usually easy to enforce.
$$\begin{aligned}
J_{\mathrm{t}}^* = \min_{\pi_t}\ & \mathbb{E}_W\left(\sum_{t=0}^{T} l_{t}(x_t, u_t, w_t)\right) \\
\text{s.t.: } & x_{t+1}=f\left(x_t, u_t, w_t, \theta_{t}\right), \\
& u_t=\pi_t(x_0, \ldots, x_t), \\
& W=[w_0, \ldots, w_{N-1}] \sim \mathcal{Q}^{W},\quad \theta_{t} \sim \mathcal{Q}^{\theta_{t}}, \\
& \operatorname{Pr}\left(\bar{X}=[x_0, \ldots, x_N] \in \bar{\mathcal{X}}_j\right) \geq p_j \quad \text{for all } j=1, \ldots, n_{c_x}, \\
& \operatorname{Pr}\left(\bar{U}=[u_0, \ldots, u_{N-1}] \in \bar{\mathcal{U}}_j\right) \geq p_j \quad \text{for all } j=1, \ldots, n_{c_u}
\end{aligned}$$

Introduction to MPC


Receding horizon control, a.k.a. MPC

at each timestep, solve a finite-horizon optimization problem over $[t, t+N_p]$
apply the optimal input over the interval $[t, t+1]$
at time $t+1$, solve the same problem over a shifted horizon (a schematic loop is sketched below)
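A schematic sketch of this loop. Both `solve_finite_horizon` and `plant_step` are hypothetical placeholders standing in for the finite-horizon optimizer and the real (or simulated) system.

```python
import numpy as np

Np = 24       # prediction horizon length
T_sim = 96    # number of closed-loop steps to simulate

def solve_finite_horizon(x, horizon):
    # placeholder: would solve the optimization problem over [t, t + horizon]
    return np.zeros(horizon)

def plant_step(x, u):
    # placeholder for the real system dynamics
    return x + u

x = 0.0
for t in range(T_sim):
    u_seq = solve_finite_horizon(x, Np)  # 1. solve over [t, t + Np]
    x = plant_step(x, u_seq[0])          # 2. apply only the first input in [t, t+1]
    # 3. at t+1 the loop repeats over the shifted horizon [t+1, t+1+Np]
```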
Classical controller
disturbance rejection (d → y)
noise insensitivity (n → y)
model uncertainty
usually in frequency domain
MPC
easy to include control constraints
easy to include state constraints
usually formulated in the time domain

Scenario-based MPC

Scenario-based MPC can be seen as a two-stage stochastic program:
at each time step, several scenarios of the disturbance are considered, so that we can take a sample approximation of $\mathbb{E}_W$
a unique control problem is solved over all scenarios
$$\begin{aligned} J_{\mathrm{t}}^*&=\min_{u}\ \mathbb{E}_W\left(\sum_{t=0}^{T} l_{t}(x_{t,s}, u_{t,s}, w_{t,s})\right)\\ &= \min_{u}\ \frac{1}{n_s}\sum_{s=1}^{n_s}\left(\sum_{t=0}^{T} l_{t}(x_{t,s}, u_{t,s}, w_{t,s})\right)\end{aligned}$$
a so-called non-anticipativity constraint is added: this constraint makes the step-0 actions equal across all scenarios ($u_{0,s} = u_{0,1}$ for all $s$)
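A small cvxpy sketch of this idea for an illustrative scalar linear system (all numbers and the quadratic cost are assumptions): one state/input trajectory per scenario, a sample-average cost, and a non-anticipativity constraint forcing the step-0 input to be identical across scenarios.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
T, n_s = 12, 20
a, b, x0 = 0.9, 0.5, 1.0
w = 0.1 * rng.standard_normal((n_s, T))      # disturbance scenarios

x = cp.Variable((n_s, T + 1))                # one state trajectory per scenario
u = cp.Variable((n_s, T))                    # one input trajectory per scenario

constraints = [x[:, 0] == x0]
for s in range(n_s):
    for t in range(T):
        constraints.append(x[s, t + 1] == a * x[s, t] + b * u[s, t] + w[s, t])

# non-anticipativity: the input actually applied at step 0 cannot depend on the scenario
constraints += [u[s, 0] == u[0, 0] for s in range(1, n_s)]

# sample-average (empirical expectation) of a quadratic stage cost
cost = (cp.sum_squares(x) + 0.1 * cp.sum_squares(u)) / n_s
cp.Problem(cp.Minimize(cost), constraints).solve()
print("shared first-stage input:", u.value[0, 0])
```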

Tree-based MPC

Forecasting scenarios are condensed into a scenario tree
Actions are chosen in a non-anticipative causal fashion.
for example, $u_1^2$ is decided as a function of $\epsilon_1^2$, but not of any of the $\epsilon_2^i,\ i \in \mathbb{N}_{[1, \mu(3)]}$
non-anticipativity constraints can be encoded in the same way: the picture on the right shows the schematics of the equality constraints for a problem with 8 scenarios; at step 1 we have only 2 nodes, each of which binds 4 scenarios (a sketch of this encoding follows below).
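A small sketch of how such equality constraints can be generated. The tree is described here by a hypothetical per-stage mapping from each scenario to the tree node it passes through; scenarios sharing a node at stage $t$ must share the same input at stage $t$.

```python
from collections import defaultdict

# hypothetical tree encoding for 8 scenarios over 3 stages:
# node[t][s] = index of the tree node scenario s passes through at stage t
node = [
    [0, 0, 0, 0, 0, 0, 0, 0],   # stage 0: a single root node -> one shared u_0
    [0, 0, 0, 0, 1, 1, 1, 1],   # stage 1: 2 nodes, each binding 4 scenarios
    [0, 0, 1, 1, 2, 2, 3, 3],   # stage 2: 4 nodes, each binding 2 scenarios
]

def non_anticipativity_pairs(node):
    """Return (t, s_ref, s) tuples meaning u[t, s_ref] == u[t, s]."""
    pairs = []
    for t, stage in enumerate(node):
        groups = defaultdict(list)
        for s, n in enumerate(stage):
            groups[n].append(s)
        for members in groups.values():
            # tie every scenario in the group to the first one
            pairs += [(t, members[0], s) for s in members[1:]]
    return pairs

print(non_anticipativity_pairs(node))
```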
Left: a stochastic scenario tree obtained from the scenarios. Right: graph representation of the tree

Approach hybridization

There are some similarities between RL and MPC:
MPC can be seen as a model-based approach, since we have an approximate function modeling the state transition.
In both cases, the stage cost $l$ (reward, in the RL literature) may not coincide with the actual cost, but can be an easier approximation of it (possibly including temporal discounts).
RL can directly optimize the policy π\pi, a.k.a. policy function approximation, as opposed to value function approximation
Hybridization typically consists of learning one of the following components from data:
learning of the system dynamics: a parametric or a non-parametric model $f(x_t, u_t, w_t, \theta_f)$ is learned from data and used to obtain the next state values $x_{t+1}$
Gaussian Process: the system state innovations are modeled as a learnable Gaussian process: $\left[\begin{array}{c}X^{+} \\ f_{\mathrm{t}}(x, u)\end{array}\right] \sim \mathcal{N}\left(0,\left[\begin{array}{cc}K+I \sigma_w^2 & K_{x, u} \\ K_{x, u}^T & k([x, u],[x, u])\end{array}\right]\right)$
Input-convex and invertible NNs: at each timestep, gradient descent is used to optimize the control, differentiating through a learned, differentiable model of the system
learning of the control policy:
Learning from a teacher: an NN is used to learn from previously solved instances of a control problem
Analytic policy gradient: an NN is trained to directly minimize the open-loop loss function, and provides a non-parametric policy $\pi(x_t, W)$
learning of the value function: the stage cost $l$ is rewritten in terms of value functions; as these are usually dependent on high-dimensional states and noise, the value function is replaced by a machine-learning-based model.