A Gentle Introduction to Stochastic Programming

by root May 1, 2026


In my first TDS post, I wrote about how to translate a real-world problem into an integer linear program. In my second, I wrote about how to make that program robust against uncertainty. Both were variations on the same idea: take a fuzzy real-world question, squeeze it into an LP, and let a solver do the rest.

There’s a moment in every optimizer’s life, though, when the LP starts to feel a bit too neat. Demand is a number. Travel time is a number. Wind speed is a number. The model accepts the input, returns an optimal solution, and goes on its way. The reality these numbers were supposed to describe (messy, jittery, and often surprising) doesn’t really show up anywhere.

Stochastic programming is the field that takes that discomfort seriously. Instead of pretending the data is exact, it builds the uncertainty directly into the model. The price you pay is a bit more notation; the payoff is decisions that hold up when the world doesn’t cooperate.

This post is a gentle tour of the basics. We’ll see why the obvious approach doesn’t work, walk through the four standard ways to handle uncertainty in a linear program, and finish with a quick sanity check on whether any of this is worth the effort. There’s some math, but it’s the same math you already know from LP, with one extra symbol attached.

Starting point: a fashion company with a bad crystal ball

To make this concrete, we’ll use the running example from Dr. Ruben van Beesten’s lectures (more on that in the credits below). It goes like this.

You run a fashion company that sells winter clothing in Germany. Production happens in Bangladesh, which is cheap but slow: the goods take a few weeks to arrive. So in the fall, you have to decide how much to produce for the upcoming winter season.

Two ways this can go wrong: produce too little, and you lose sales; produce too much, and you’re stuck with stock you can’t sell. The whole question is how much to produce now, and the answer depends on something you don’t actually know yet: winter demand.

If you ignored the uncertainty for a moment and pretended demand was a fixed number, you could write down a vanilla LP:

$$
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T x \ge h
$$

Here x is how much you produce, c is the unit production cost, h is demand, and T is just the identity matrix (one unit produced satisfies one unit of demand). The constraint says: produce at least as much as is demanded.

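This is fine if h is actually known. The trouble is that demand isn’t a number, it’s a random variable. Let’s call it ξ. The honest version of the model would look like this, with the data now depending on ξ (in the fashion example, h(ξ) = ξ and T(ξ) is still the identity):

$$
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T(\xi)\, x \ge h(\xi)
$$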

And here we hit a wall. What does it mean for x to satisfy a constraint that depends on a random variable? Is x = 100 feasible if demand might be 80, might be 120, and might be anywhere in between? The problem isn’t hard to solve: it’s ill-defined. The solver doesn’t even know which problem you’re asking it to solve.

Stochastic programming is, in essence, a set of principled answers to that question. We’ll look at the four most common ones.

Four ways to handle the uncertainty

Each of the four approaches takes the ill-defined LP above and turns it into a well-defined optimization problem. They differ in what they assume you know about the uncertainty, and in how cautious they are about bad outcomes.

1. Robust optimization: prepare for the worst

The most cautious approach. You don’t need to know the full probability distribution of ξ, but only its support, i.e., the set of values it can possibly take. We call this set the uncertainty set, written U. Then you ask: what’s the best decision that stays feasible no matter which ξ ∈ U actually shows up?

$$
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad T(\xi)\, x \ge h(\xi) \quad \text{for all } \xi \in U
$$

The constraint now has to hold for every ξ in the uncertainty set. In our fashion example with U = [0, 10], you’d be planning for demand of 10, the worst case, every time.

That’s the strength and the weakness of robust optimization in one sentence. The solution is bulletproof, but it’s also conservative: you’ll often be sitting on inventory you didn’t need, because you planned as if the unlikely worst case were guaranteed. If you’ve read my earlier post on robustifying linear programs, this is exactly the framework that sits behind those four steps.

2. Chance constraints: relax the worst case

Robust optimization plans for any possible outcome. Chance constraints relax that to: plan for most of them. You pick a probability level α, say 95%, and require the constraint to hold with at least that probability:

$$
\min_{x \ge 0} \; c^\top x \quad \text{s.t.} \quad \mathbb{P}\big(T(\xi)\, x \ge h(\xi)\big) \ge \alpha
$$

This is called a joint chance constraint: all the entries of the constraint vector must be satisfied simultaneously, with joint probability ≥ α. A weaker variant treats each row separately:

$$
\mathbb{P}\big(T_i(\xi)\, x \ge h_i(\xi)\big) \ge \alpha_i, \quad i = 1, \dots, m
$$

These are individual chance constraints: each constraint i must hold with probability at least αᵢ, but you don’t care about the joint event. Quick exercise: if you set every αᵢ equal to the joint α, which formulation is more conservative?

Answer: the joint version. Satisfying all constraints simultaneously is a stricter requirement than satisfying each one in isolation, so the joint formulation has a smaller feasible region and a worse (higher) optimal cost. Either way, chance constraints give you a knob, α, to dial how cautious you want to be. Crank it to 1, and you’re back to (almost) robust. Drop it to 0.5, and you’re basically flipping a coin on feasibility. Most real applications live somewhere in the 0.9–0.99 range.

There’s a catch worth flagging: chance constraints are hard in general. The probability term inside the constraint is a non-linear, often non-convex function of x, so you usually can’t hand the formulation directly to a standard LP solver. There are tractable special cases (Gaussian noise, certain mixtures of distributions, sample-based approximations), but the general problem is harder than it looks at first glance.
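One special case that stays easy is the fashion example itself: with a single constraint of the form "production ≥ demand", the chance constraint reduces to a quantile of the demand distribution. The sketch below compares that with the robust choice. The three demand scenarios and their probabilities are made-up numbers for illustration, not values from the lectures.

```python
def robust_production(demands):
    """Robust choice: cover every demand value in the uncertainty set."""
    return max(demands)


def chance_production(demands, probs, alpha):
    """Smallest x with P(demand <= x) >= alpha for a discrete distribution."""
    cumulative = 0.0
    for demand, prob in sorted(zip(demands, probs)):
        cumulative += prob
        if cumulative >= alpha - 1e-12:  # small tolerance for float round-off
            return demand
    raise ValueError("probabilities never reach alpha; do they sum to 1?")


# Hypothetical demand scenarios and probabilities (illustrative only).
demands = [80, 100, 120]
probs = [0.3, 0.4, 0.3]

x_robust = robust_production(demands)
x_chance_95 = chance_production(demands, probs, alpha=0.95)
x_chance_60 = chance_production(demands, probs, alpha=0.60)
print(x_robust, x_chance_95, x_chance_60)
```

With these numbers the robust plan and the 95% plan both produce 120, while relaxing α to 0.60 drops production to 100. This shortcut only works because there is one scalar constraint; it is not a general method for chance-constrained programs.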

3. Two-stage recourse models: decide, observe, correct

The first two approaches treat constraint violation as something to avoid, either always (robust) or with high probability (chance). Sometimes that’s the wrong frame. In our fashion example, falling short of demand isn’t catastrophic. It’s annoying. You can usually fix it: produce a small emergency batch in Germany at a higher cost, or ship by air, or just accept the lost sales and move on.

This idea, that violating a constraint isn’t the end of the world because you can take a corrective action later, is the heart of recourse models. In the two-stage version, the timeline looks like this:

Stage 1 (now): you make a first-stage decision x while ξ is still uncertain.
Then: ξ is realized, i.e., the random variable becomes a known number.
Stage 2 (later): you make a second-stage decision y, knowing ξ.

Mathematically, the first stage looks almost like a vanilla LP, except the objective now contains an expected future cost:

$$
\min_{x \ge 0} \; c^\top x + \mathbb{E}_\xi\big[v(\xi, x)\big]
$$

The function v(ξ, x) is the optimal value of the second-stage problem, given that you chose x in the first stage and that ξ turned out to be the realized value:

$$
v(\xi, x) = \min_{y \ge 0} \; q(\xi)^\top y \quad \text{s.t.} \quad W y \ge h(\xi) - T(\xi)\, x
$$

Read this carefully. The right-hand side, h(ξ) − T(ξ) x, is the shortfall: how much your first-stage decision didn’t cover after ξ was revealed. The recourse decision y then closes that gap, at a cost q(ξ)ᵀ y. So the structure is: pay the up-front cost cᵀ x, and on top of it pay the expected cost of cleaning up after the random variable does its thing.

That’s the whole idea. Two-stage recourse models are by far the most common formulation in practice, partly because they capture the actual chronology of decisions in many real problems (production planning, inventory, energy dispatch, scheduling), and partly because they’re relatively well-behaved mathematically.

A couple of pieces of vocabulary you’ll trip over if you read further:

A model has fixed recourse if the recourse matrix W doesn’t depend on ξ. Many algorithms only work in this case.
A model has (relatively) complete recourse if there’s always a feasible recourse decision y, no matter what ξ turns out to be and no matter what x you chose. If complete recourse fails, the second-stage problem can be infeasible, which becomes an implicit constraint on the first stage. (This is exactly where Benders’ feasibility cuts come from, but that’s a story for another post.)
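To see the decide-observe-correct structure in numbers, here is a minimal sketch of the fashion example as a two-stage model. It assumes the recourse action is emergency production at a unit cost q higher than the regular cost c, so the second-stage problem has a closed-form solution (cover exactly the shortfall). The scenarios, probabilities, and costs are hypothetical.

```python
def second_stage_cost(demand, x, q):
    """v(xi, x) for the fashion example: cover the shortfall at unit cost q."""
    shortfall = max(demand - x, 0.0)  # cheapest y with y >= demand - x, y >= 0
    return q * shortfall


def expected_total_cost(x, demands, probs, c, q):
    """First-stage cost plus expected recourse cost over discrete scenarios."""
    first_stage = c * x
    expected_recourse = sum(
        p * second_stage_cost(d, x, q) for d, p in zip(demands, probs)
    )
    return first_stage + expected_recourse


# Hypothetical data: three demand scenarios, regular cost 1, emergency cost 5.
demands = [80.0, 100.0, 120.0]
probs = [0.3, 0.4, 0.3]
c, q = 1.0, 5.0

costs = {x: expected_total_cost(x, demands, probs, c, q) for x in (80.0, 100.0, 120.0)}
for x, cost in costs.items():
    print(f"produce {x:5.0f} -> expected total cost {cost:6.1f}")
```

Under these assumptions, producing 80, 100, or 120 units gives expected costs of about 180, 130, and 120. The example has both fixed recourse (the emergency option does not depend on ξ) and complete recourse (an emergency batch can always cover any shortfall).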

4. Multi-stage recourse models: keep going

Sometimes life isn’t two stages. You don’t just decide-observe-correct once and go home; you decide, observe, decide, observe, decide, … over and over. Multi-stage recourse models are the natural extension.

In our fashion example, suppose we’re not choosing once in the fall, but three times: in the fall (cheap, in Bangladesh), in early winter (more expensive, in Romania), and in late winter (most expensive, in Germany). Demand is gradually revealed over the season, and at each stage we decide based on what we’ve observed so far.

The notation gets heavier: you end up writing recursive value functions Qₜ, with histories ξ_[t] = (ξ₁, …, ξₜ) hanging off them, but conceptually nothing new is happening. Each stage is a recourse problem nested inside the previous one. The natural way to picture this is as a scenario tree: each node is a state of the world, each branch is a possible realization of the next random variable, and a scenario is a complete root-to-leaf path.

Example of a three-stage scenario tree. Source: course slides by Dr. Ruben van Beesten.

One subtlety. A scenario is the entire trajectory of ξ, not just one realization. Knowing that ξ₂ = 10 doesn’t tell you which scenario you’re in, because ξ₃ hasn’t happened yet. This matters when you start writing the deterministic equivalent (next section), because you have to be careful that your decisions only depend on information that has actually been observed by the time the decision is made. That property is called non-anticipativity: you can’t anticipate the future. The model would happily cheat if you didn’t enforce it explicitly.
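One way to make non-anticipativity concrete is to group scenarios by the history they share. The sketch below assumes scenarios are stored as tuples (ξ₁, ξ₂, ξ₃); the helper name and the tree values are hypothetical. Any decision taken after observing the first t realizations would have to be identical for all scenarios in the same group.

```python
from collections import defaultdict


def nonanticipativity_groups(scenarios, stage):
    """Map each observed history of length `stage` to the scenarios sharing it."""
    groups = defaultdict(list)
    for index, path in enumerate(scenarios):
        history = path[:stage]  # what has been observed so far
        groups[history].append(index)
    return dict(groups)


# Hypothetical three-stage tree: each scenario is a full path (xi_1, xi_2, xi_3).
scenarios = [
    (5, 10, 8),
    (5, 10, 12),
    (5, 20, 18),
    (5, 20, 25),
]

for stage in (1, 2, 3):
    print(stage, nonanticipativity_groups(scenarios, stage))
```

After observing ξ₂ = 10, scenarios 0 and 1 are still indistinguishable, so a deterministic equivalent would tie their decisions at that point together with equality constraints (or use a single shared variable for that tree node).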

How do we actually solve a recourse model?

So far we’ve been writing models. To solve them, we typically transform them into something a standard LP solver can chew on. The trick is the deterministic equivalent formulation.

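Suppose the random variable ξ has a discrete distribution: it takes finitely many values ξ¹, ξ², …, ξ^S (called scenarios), each with probability pₛ. Then the expected second-stage cost is just a finite sum, and we can write the entire two-stage problem as one big LP by introducing one copy of y per scenario:

$$
\begin{aligned}
\min_{x,\, y_1, \dots, y_S} \quad & c^\top x + \sum_{s=1}^{S} p_s \, q(\xi^s)^\top y_s \\
\text{s.t.} \quad & W y_s \ge h(\xi^s) - T(\xi^s)\, x, \quad s = 1, \dots, S, \\
& x \ge 0, \quad y_s \ge 0, \quad s = 1, \dots, S.
\end{aligned}
$$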

That’s a regular LP. Big, possibly very big (if you have S scenarios, you’ve essentially copied the second stage S times), but it’s an LP. You can hand it directly to HiGHS, Gurobi, CPLEX, or whatever solver you like, and it will solve it.
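As a small illustration, the deterministic equivalent of the fashion example can be built and handed to HiGHS through SciPy’s `linprog`. This is a sketch under assumptions: three made-up demand scenarios, regular unit cost c = 1, emergency unit cost q = 5, and one recourse variable per scenario that covers the shortfall.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for the fashion example.
demands = np.array([80.0, 100.0, 120.0])
probs = np.array([0.3, 0.4, 0.3])
c, q = 1.0, 5.0
S = len(demands)

# Decision vector: [x, y_1, ..., y_S], one recourse copy per scenario.
objective = np.concatenate(([c], probs * q))

# x + y_s >= d_s for every scenario, rewritten as -x - y_s <= -d_s.
A_ub = np.hstack((-np.ones((S, 1)), -np.eye(S)))
b_ub = -demands

result = linprog(
    objective,
    A_ub=A_ub,
    b_ub=b_ub,
    bounds=[(0, None)] * (S + 1),  # x >= 0 and y_s >= 0
    method="highs",
)

x_opt = result.x[0]
y_opt = result.x[1:]
print("status:", result.status)
print("first-stage production:", x_opt)
print("recourse per scenario:", y_opt)
print("expected total cost:", result.fun)
```

With these numbers the solver should produce 120 units up front and use no emergency production, for an expected cost of 120. Lowering q makes the emergency option more attractive and shifts the solution toward producing less in the first stage.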

Two natural questions follow.

First: what if the distribution of ξ is not discrete? In that case the deterministic equivalent has infinitely many scenarios and isn’t finite-dimensional. The standard fix is sample average approximation (SAA): draw a sample of size S from the true distribution, solve the sampled deterministic equivalent, and let S grow until your solution stabilizes statistically. There’s a whole literature on how big S needs to be and what guarantees you get.
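The following sketch shows the SAA idea on the fashion example, assuming demand is normally distributed with mean 100 and standard deviation 20 (truncated at zero), regular cost c = 1, and emergency cost q = 5; all of these numbers are hypothetical. Instead of calling an LP solver, it uses the fact that for this particular one-dimensional problem the sampled deterministic equivalent is minimized at a sample quantile. That shortcut is specific to this example.

```python
import math
import random


def solve_sampled_problem(sample, c, q):
    """Return a minimizer of c*x + q*mean(max(d - x, 0)) over the sample.

    Assumes q > c > 0. The optimum is the smallest sample value x for which
    at least a fraction 1 - c/q of the sampled demands is <= x.
    """
    ordered = sorted(sample)
    target = 1.0 - c / q
    k = max(0, math.ceil(target * len(ordered)) - 1)
    return ordered[min(k, len(ordered) - 1)]


rng = random.Random(0)  # fixed seed so the run is reproducible
c, q = 1.0, 5.0

estimates = {}
for S in (100, 1_000, 10_000, 100_000):
    # Hypothetical continuous demand distribution, truncated at zero.
    sample = [max(0.0, rng.gauss(100.0, 20.0)) for _ in range(S)]
    estimates[S] = solve_sampled_problem(sample, c, q)
    print(f"S = {S:>7}: produce about {estimates[S]:.2f}")
```

As S grows, the estimates should settle near the 80th percentile of the assumed demand distribution, which is roughly 116.8. In a realistic model you would solve each sampled problem with an LP solver and repeat the experiment with independent samples to judge how stable the solution is.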

Second: what if the deterministic equivalent is too big to solve directly? This is where decomposition methods come in. Benders’ decomposition splits the problem into a master problem in the first-stage variables and a subproblem per scenario, then iteratively passes information between them. For multi-stage models with many stages, the analogous trick is stochastic dual dynamic programming (SDDP), which uses sampling and approximate value functions to avoid building the full scenario tree. Both are advanced enough to deserve their own posts, so I’ll come back to them later.

Is any of this actually worth the trouble?

Honest question. Stochastic programs are messier to formulate, harder to solve, and slower to run than their deterministic cousins. If your real-world problem isn’t very sensitive to uncertainty, you might be better off just plugging the expected demand into a regular LP and calling it a day.

The good news is, you can quantify exactly how much the stochastic formulation buys you. There are two classical metrics, and both are worth knowing.

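Define four numbers:

$$
\begin{aligned}
\text{SP} &= \min_{x \ge 0} \; \mathbb{E}_\xi\big[c^\top x + v(\xi, x)\big], \\
\text{EV} &= \min_{x \ge 0} \; c^\top x + v(\bar{\xi}, x), \quad \bar{\xi} = \mathbb{E}[\xi], \text{ with minimizer } \bar{x}, \\
\text{EEV} &= \mathbb{E}_\xi\big[c^\top \bar{x} + v(\xi, \bar{x})\big], \\
\text{WS} &= \mathbb{E}_\xi\Big[\min_{x \ge 0} \; c^\top x + v(\xi, x)\Big].
\end{aligned}
$$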

In words: SP is the optimal value of the actual stochastic program. EV is what you get if you replace ξ with its expected value and solve the resulting deterministic problem; call its solution x̄. EEV is the expected cost of implementing that deterministic solution x̄ in the actual stochastic world. And WS (“wait-and-see”) is the expected cost if you got to peek at the realized ξ before deciding x, the cheating-but-best case.

From these four numbers you can build two highly informative quantities:

$$
\text{VSS} = \text{EEV} - \text{SP}, \qquad \text{EVPI} = \text{SP} - \text{WS}
$$

VSS is the Value of the Stochastic Solution: how much worse off you’d be if you just solved the deterministic problem with average values and implemented its solution. If VSS is small, the stochastic program isn’t buying you much; the deterministic shortcut is fine.

EVPI is the Expected Value of Perfect Information: how much you’d gain if a benevolent oracle handed you the realized ξ before you had to decide. If EVPI is small, your forecasts already contain most of the information you need; investing in better predictions probably won’t move the needle. If EVPI is large, better data has real value.

Explanation of useful metrics for a stochastic program.

The two metrics ride along on a tidy chain of inequalities (assuming uncertainty only on the right-hand side):

$$
\text{EV} \le \text{WS} \le \text{SP} \le \text{EEV}
$$

Read it left to right: cheating-with-the-mean (EV) is at most as bad as cheating-with-the-realization (WS), which is at most as bad as the honest stochastic answer (SP), which is at most as bad as plugging in the deterministic solution and living with it (EEV). The chain implies a free upper bound on VSS that you can compute before you ever solve the SP: VSS ≤ EEV − EV. If that gap is tiny, the deterministic shortcut is good enough and you can save yourself the headache.
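To make the four numbers tangible, here is a sketch that computes them for the fashion example with made-up data (three demand scenarios, regular cost c = 1, emergency cost q = 5). It relies on an assumption specific to this toy problem: the expected cost is piecewise linear and convex in x with kinks at the scenario demands, so checking x = 0 and each scenario value is enough to find the minimum.

```python
def total_cost(x, demand, c, q):
    """Cost of producing x up front and covering any shortfall at unit cost q."""
    return c * x + q * max(demand - x, 0.0)


def expected_cost(x, demands, probs, c, q):
    return sum(p * total_cost(x, d, c, q) for d, p in zip(demands, probs))


# Hypothetical data (illustrative only).
demands = [80.0, 100.0, 120.0]
probs = [0.3, 0.4, 0.3]
c, q = 1.0, 5.0

# SP: minimize expected cost; the minimum sits at a kink of the cost curve.
candidates = [0.0] + demands
x_sp = min(candidates, key=lambda x: expected_cost(x, demands, probs, c, q))
SP = expected_cost(x_sp, demands, probs, c, q)

# EV: replace demand by its mean and solve the deterministic problem.
mean_demand = sum(p * d for d, p in zip(demands, probs))
x_bar = min([0.0, mean_demand], key=lambda x: total_cost(x, mean_demand, c, q))
EV = total_cost(x_bar, mean_demand, c, q)

# EEV: implement the mean-value solution in the stochastic world.
EEV = expected_cost(x_bar, demands, probs, c, q)

# WS: decide after seeing demand, then average over scenarios.
WS = sum(
    p * min(total_cost(x, d, c, q) for x in (0.0, d))
    for d, p in zip(demands, probs)
)

VSS = EEV - SP
EVPI = SP - WS
print(f"EV={EV:.1f} WS={WS:.1f} SP={SP:.1f} EEV={EEV:.1f}")
print(f"VSS={VSS:.1f} EVPI={EVPI:.1f} bound EEV-EV={EEV - EV:.1f}")
```

With these numbers, EV = 100, WS = 100, SP = 120, and EEV = 130 (up to floating-point rounding), so VSS = 10 and EVPI = 20. The chain of inequalities holds, and the cheap bound EEV − EV = 30 sits above the actual VSS.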

Where to go from here

This post stuck to the basics: how to write a stochastic program down. The next natural step is how to solve large ones efficiently. The two big workhorses are:

Benders’ decomposition: for two-stage models, decomposes the deterministic equivalent into a master problem (in x) plus one subproblem per scenario, and reconciles them with cuts. Particularly elegant when you have lots of scenarios but a relatively small first stage.
Stochastic Dual Dynamic Programming (SDDP): for multi-stage models, uses sampling and piecewise-linear approximations of the future value functions. Famously used in hydropower scheduling, where the scenario tree is so big that explicit enumeration is hopeless.

Both deserve their own posts. If there’s interest, I’ll write them up.

Takeaway

If you’re using LPs in any context where the input data is genuinely uncertain because of forecasted demand, weather, prices, travel times, or anything else, then your model is making an implicit choice about how to handle that uncertainty. “Just use the mean” is a choice. So is “plan for the worst.” Stochastic programming gives you the vocabulary to make that choice explicit, and the tools to evaluate whether your choice was a good one (hello, VSS).

To summarize the four main ways to model uncertainty in an LP:

Robust optimization: plan for the worst case in a given uncertainty set.
Chance constraints: require feasibility with at least probability α.
Two-stage recourse: decide, observe, correct; pay an expected recourse cost.
Multi-stage recourse: the same idea, repeated over time on a scenario tree.

And two metrics worth keeping in your back pocket: VSS (does the stochastic model help?) and EVPI (would better forecasts help?).

Most real problems aren’t deterministic. The good news is your modeling toolkit doesn’t have to be either.

Credits and references

This post is based on lectures by Dr. Ruben van Beesten (Norwegian University of Science and Technology) from his course on Stochastic Programming given in October 2023, which I had the pleasure of attending in Trondheim, Norway. The fashion-company example, the four-way taxonomy of formulations, and the VSS/EVPI framing all come straight from his slides; any clumsiness in the retelling is mine.

The original modeling exercise that motivates much of the recourse-model intuition is from

Higle, J. L. (2005). Stochastic Programming: Optimization When Uncertainty Matters. In INFORMS TutORials in Operations Research, pp. 30–53.

A couple of further pointers worth knowing about:

Kleywegt, A. J., Shapiro, A., and Homem-de-Mello, T. (2002). The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12(2), 479–502. The standard reference for SAA.
Higle, J. L., and Sen, S. (1991). Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3), 650–669. One of the few methods that handles non-discrete distributions directly.

And of course, the two previous posts in this series: 5 questions that will help you model integer linear programs better and 4 steps to robustify your linear program.
