Prescriptive modeling is the pinnacle of analytics value. It doesn’t focus on what happened, or even on what will happen. It takes analytics further by telling us what we should do to change what will happen. To harness this extra prescriptive power, however, we must take on an additional assumption: a causal assumption. The naive practitioner may not be aware that moving from predictive to prescriptive comes with the baggage of this lurking assumption. I Googled ‘prescriptive analytics’ and searched the first ten articles for the word ‘causal.’ Not to my surprise (but to my disappointment), I didn’t get a single hit. I loosened the specificity of my word search by trying ‘assumption’, and this one did surprise me: not a single hit either! It’s clear to me that this is an under-taught component of prescriptive modeling. Let’s fix that!
When you use prescriptive modeling, you are making causal bets, whether you know it or not. And from what I’ve seen, this is a terribly under-emphasized point on the topic given its importance.
By the end of this article, you’ll have a clear understanding of why prescriptive modeling has causal assumptions and how to identify whether your model/approach meets them. We’ll get there by covering the topics below:
- Brief overview of prescriptive modeling
- Why does prescriptive modeling have a causal assumption?
- How do we know if we have met the causal assumption?
What’s Prescriptive Modeling?
Before we get too far, I want to say that this is not an article on prescriptive analytics; there’s plenty of information about that elsewhere. This portion will be a quick overview to serve as a refresher for readers who are already at least somewhat familiar with the topic.
There’s a widely known hierarchy of three analytics types: (1) descriptive analytics, (2) predictive analytics, and (3) prescriptive analytics.
Descriptive analytics looks at attributes and qualities in the data. It calculates trends, averages, medians, standard deviations, etc. Descriptive analytics doesn’t attempt to say anything more about the data than is empirically observable. Often, descriptive analytics are found in dashboards and reports. The value it provides is in informing the user of the key statistics in the data.
Predictive analytics goes a step beyond descriptive analytics. Instead of summarizing data, predictive analytics finds relationships within the data. It attempts to separate the noise from the signal in these relationships to find underlying, generalizable patterns. From these patterns, it can make predictions on unseen data. It goes further than descriptive analytics because it provides insights on unseen data, rather than just the data that are directly observed.
Prescriptive analytics goes an additional step beyond predictive analytics. Prescriptive analytics uses models created through predictive analytics to recommend smart or optimal actions. Often, prescriptive analytics will run simulations through predictive models and recommend the strategy with the most desirable outcome.
Let’s consider an example to better illustrate the difference between predictive and prescriptive analytics. Imagine you are a data scientist at a company that sells subscriptions to online publications. You have developed a model that predicts the probability that a customer will cancel their subscription in a given month. The model has multiple inputs, including promotions sent to the customer. So far, you’ve only engaged in predictive modeling. One day, you get the bright idea that you should input different discounts into your predictive model, observe the impact of the discounts on customer churn, and recommend the discounts that best balance the cost of the discount with the benefit of increased customer retention. With your shift in focus from prediction to intervention, you have graduated to prescriptive analytics!
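To make that shift concrete, here is a minimal Python sketch of the prescriptive step. Everything in it is hypothetical: the logistic churn model, its coefficients, the $10 monthly price, the candidate discounts and the function names are made up for illustration and are not from any library. The predictive part is churn_probability; the prescriptive part is the what-if loop in recommend_discount, which is only trustworthy if DISCOUNT_COEF describes what a discount actually does to churn rather than what it merely correlates with.

```python
import math

# Hypothetical churn model coefficients (made up for illustration, not fitted).
INTERCEPT = -0.5
DISCOUNT_COEF = -0.35  # change in churn log-odds per $1 of discount
TENURE_COEF = -0.04    # change in churn log-odds per month of tenure
PRICE = 10.0           # hypothetical monthly subscription price in dollars


def churn_probability(discount: float, tenure_months: float) -> float:
    """Predictive step: probability that a customer cancels this month."""
    log_odds = INTERCEPT + DISCOUNT_COEF * discount + TENURE_COEF * tenure_months
    return 1.0 / (1.0 + math.exp(-log_odds))


def expected_revenue(discount: float, tenure_months: float) -> float:
    """Expected revenue next month: P(customer stays) * discounted price."""
    return (1.0 - churn_probability(discount, tenure_months)) * (PRICE - discount)


def recommend_discount(tenure_months: float, candidates=(0, 1, 2, 3, 4, 5)) -> int:
    """Prescriptive step: simulate every candidate discount and return the best one.

    This silently treats DISCOUNT_COEF as a causal effect. If discounts merely
    correlate with churn (e.g., they were mostly sent to already-loyal
    customers), the recommendation can be wrong.
    """
    return max(candidates, key=lambda d: expected_revenue(d, tenure_months))


for tenure in (1, 12, 36):
    print(f"tenure {tenure:>2} months -> recommended discount ${recommend_discount(tenure)}")
```

With these made-up numbers, the sketch recommends a $1 discount for a one-month-old subscriber and no discount for subscribers with 12 or 36 months of tenure. The arithmetic is trivial; the risky part is the causal reading of the coefficient.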
Below are examples of possible analyses for the customer churn model at each level of analytics:
Now that we’ve been refreshed on the three types of analytics, let’s get into the causal assumption that is unique to prescriptive analytics.
The Causal Assumption in Prescriptive Analytics
Moving from predictive to prescriptive analytics feels intuitive and natural. You have a model that predicts an important outcome using features, some of which are in your control. It makes sense to then simulate manipulating those features to drive towards a desired outcome. What doesn’t feel intuitive (at least to a junior modeler) is that doing so moves you into a dangerous space if your model hasn’t captured the causal relationships between the target variable and the features you intend to change.
We’ll first show the dangers with a simple example involving a rubber duck, leaves and a pool. We’ll then move on to real-world failures that have come from making causal bets when they weren’t warranted.
Leaves, a pool and a rubber duck
You enjoy spending time outside near your pool. As an astute observer of your surroundings, you notice that your favorite pool toy, a rubber duck, is typically in the same part of the pool as the leaves that fall from a nearby tree.

Eventually, you decide that it’s time to clean the leaves out of the pool. There’s a specific corner of the pool that is easiest to access, and you want all of the leaves to be in that area so you can more easily collect and discard them. Given the model you have created (the rubber duck is in the same area as the leaves), you decide that it would be very clever to move the toy to the corner and watch in delight as the leaves follow the duck. Then you will just scoop them up and continue with the rest of your day, enjoying your newly cleaned pool.
You make the change and feel like a fool as you stand in the corner of the pool, right over the rubber duck, net in hand, while the leaves stubbornly stay in place. You have made the terrible mistake of using prescriptive analytics when your model doesn’t pass the causal assumption!

Perplexed, you look into the pool again. You notice a slight disturbance in the water coming from the pool jets. You then decide to rethink your predictive modeling approach, using the angle of the jets to predict the location of the leaves instead of the rubber duck. With this new model, you estimate how you need to configure the jets to get the leaves to your favorite corner. You move the jets, and this time you are successful! The leaves drift to the corner, you remove them and go on with your day a wiser data scientist!
This is a quirky example, but it does illustrate a few points well. Let me call them out.
- The pool jets are a classic ‘confounding’ variable: they affect both where the rubber duck floats and where the leaves collect. The rubber duck itself has no impact on the location of the leaves.
- Both the rubber duck model and the pool jet model made accurate predictions; if we simply wanted to know where the leaves were, they could be equally good.
- What breaks the rubber duck model has nothing to do with the model itself and everything to do with how you used the model. The causal assumption wasn’t warranted, but you moved forward anyway!
I hope you enjoyed the whimsical example; let’s transition to talking about real-world examples.
Shark Tank Pitch
In case you haven’t seen it, Shark Tank is a show where entrepreneurs pitch their business idea to wealthy investors (called ‘sharks’) with the hope of securing investment money.
I was recently watching a Shark Tank re-run (as one does), and one of the pitches in the episode (Season 10, Episode 15) was for a company called GoalSetter. GoalSetter is a company that allows parents to open ‘mini’ bank accounts in their child’s name that family and friends can make deposits into. The idea is that instead of giving toys or gift cards to children as presents, people can give deposit certificates, and children can save up for things (‘goals’) they want to purchase.
I have no qualms with the business idea, but in the presentation, the entrepreneur made this claim:
…kids who have savings accounts in their name are six times more likely to go to college and four times more likely to own stocks by the time they are young adults…
Assuming this statistic is true, this statement, by itself, is all fine and well. We can look at the data and see that there is a relationship between a child having a bank account in their name and going to college and/or investing (descriptive). We could even develop a model that predicts whether a child will go to college or own stocks, using a bank account in their name as a predictor (predictive). But this doesn’t tell us anything about causation! The investment pitch carries a subtle prescriptive message: “give your kid a GoalSetter account and they will be more likely to go to college and own stocks.” While semantically similar to the quote above, these two statements are worlds apart! One is a statement of statistical fact that relies on no assumptions, and the other is a prescriptive statement that carries a huge causal assumption! I hope that confounding variable alarms are ringing in your head right now. It seems much more likely that things like household income, the financial literacy of parents and cultural influences would have a relationship with both the probability of opening a bank account in a child’s name and that child going to college. It doesn’t seem likely that giving a random kid a bank account in their name will increase their chances of going to college. This is like moving the duck in the pool and expecting the leaves to follow!
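A small simulation shows how a confounder can manufacture this kind of statistic. The numbers are invented for illustration and are not taken from the pitch or from any study. In this hypothetical world, household income drives both whether a child gets an account and whether they go to college, while the account itself has zero effect. Comparing college rates within each income group (stratifying on the confounder) makes the apparent account effect disappear.

```python
import random


def simulate_kids(n=100_000, seed=1):
    """Hypothetical data: income drives both accounts and college; accounts do nothing."""
    rng = random.Random(seed)
    kids = []
    for _ in range(n):
        high_income = rng.random() < 0.5
        has_account = rng.random() < (0.6 if high_income else 0.1)
        # College depends on income only; has_account is deliberately ignored here.
        college = rng.random() < (0.7 if high_income else 0.2)
        kids.append((high_income, has_account, college))
    return kids


def college_rate(kids, has_account, high_income=None):
    """Share of kids who went to college, optionally within one income group."""
    group = [college for income, account, college in kids
             if account == has_account and (high_income is None or income == high_income)]
    return sum(group) / len(group)


kids = simulate_kids()
naive_ratio = college_rate(kids, True) / college_rate(kids, False)
print(f"all kids: {naive_ratio:.2f}x more likely to go to college with an account")
for income in (True, False):
    ratio = college_rate(kids, True, income) / college_rate(kids, False, income)
    print(f"high_income={income}: {ratio:.2f}x")
```

With these invented rates, the pooled ratio is about 1.8 in expectation, while both within-income ratios are about 1.0: the "account effect" is entirely income.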
Reading Is Fundamental Program
In the 1960s, there was a government-funded program called ‘Reading Is Fundamental (RIF).’ Part of this program focused on putting books in the homes of low-income children. The goal was to increase literacy in those households. The strategy was partially based on the idea that homes with more books in them had more literate children. You might know where I’m going with this one based on the Shark Tank example we just discussed. Observing that homes with lots of books have more literate children is descriptive. There’s nothing wrong with that. But when you start making recommendations, you step out of descriptive space and leap into the prescriptive world, and as we’ve established, that comes with the causal assumption. Putting books in homes assumes that the books cause the literacy! Research by Susan Neuman found that putting books in homes was not sufficient to increase literacy without additional resources¹.
Of course, giving books to children who can’t afford them is a good thing; you don’t need a causal assumption to do good things 😊. But if you have the specific goal of increasing literacy, you would be well-advised to assess the validity of the causal assumption behind your actions in order to realize your desired outcomes!
How do we know if we satisfy the causal assumption?
We’ve established that prescriptive modeling requires a causal assumption (so thoroughly that you are probably exhausted!). But how can we know whether the assumption is met by our model? When thinking about causality and data, I find it helpful to separate my thoughts between experimental and observational data. Let’s go through how we can feel good (or maybe at least ‘okay’) about causal assumptions with these two types of data.
Experimental Data
If you have access to good experimental data for your prescriptive modeling, you are very lucky! Experimental data is the gold standard for establishing causal relationships. The details of why this is the case are out of scope for this article, but I will say that the randomized assignment of treatments in a well-designed experiment deals with confounders, so you don’t have to worry about them ruining your causal assumptions.
We can train predictive models on the output of a good experiment, i.e., good experimental data. In this case, the data-generating process meets causal identification conditions between the target variable and the variables whose values were randomly assigned as treatments. I want to emphasize that only variables that are randomly assigned in the experiment qualify for a causal claim on the basis of the experiment alone. The causal effect of other variables (called covariates) may or may not be correctly captured. For example, imagine that we ran an experiment that randomly provided multiple plants with various levels of nitrogen, phosphorus and potassium, and we measured the plant growth. From this experimental data, we created the model below:
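$$\text{plant growth} = \beta_0 + \beta_1 \cdot \text{nitrogen} + \beta_2 \cdot \text{phosphorus} + \beta_3 \cdot \text{potassium} + \beta_4 \cdot \text{sun exposure} + \varepsilon$$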
Because nitrogen, phosphorus and potassium were treatments that were randomly assigned in the experiment, we can conclude that betas 1 through 3 estimate the causal effects of those nutrients on plant growth. Sun exposure was not randomly assigned, which prevents us from claiming a causal relationship for it through the power of experimental data. This is not to say that a causal claim can’t be justified for covariates, but the claim would require additional assumptions that we’ll cover in the upcoming observational data section.
I’ve used the qualifier good when talking about experimental data multiple times now. What is a good experiment? I’ll go over two common issues I’ve seen that prevent an experiment from creating good data, but there is a lot more that can go wrong. You should read up on experimental design if you would like to go deeper.
Execution errors: This is one of the most common issues with experiments. A few years ago, I was assigned to a project where an experiment had been run, but some of the data had been mixed up regarding which subjects received which treatments, and the data was not usable! If there were significant execution errors, you may not be able to draw valid causal conclusions from the experimental data.
Underpowered experiments: This can happen for multiple reasons; for example, there may not be enough signal coming from the treatment, or there may have been too few experimental units. Even with perfect execution, an underpowered study may fail to uncover real effects, which could prevent you from reaching the causal conclusions required for prescriptive modeling.
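A quick power calculation before running an experiment can flag this problem. The sketch below uses the normal approximation for a two-sided comparison of two group means, with effect size expressed as Cohen's d (difference in means divided by the common standard deviation). It is a back-of-the-envelope check under those assumptions; calculations based on the t-distribution give slightly larger sample sizes.

```python
import math
from statistics import NormalDist


def power_two_sample(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test for a difference in means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(noncentrality - z_crit)


def n_per_group_for_power(effect_size: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Units needed in each group to reach the target power (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


print(n_per_group_for_power(0.5))
print(round(power_two_sample(0.5, 20), 2))
```

For a medium effect (d = 0.5), this approximation asks for 63 units per group to reach 80% power, while 20 units per group give only about a 35% chance of detecting the effect, even if every treatment is applied perfectly.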
Observational Data
Satisfying the causal assumption with observational data is much more difficult, risky and controversial than with experimental data. The randomization that is a key part of creating experimental data is powerful because it removes the problems caused by all confounding variables: known and unknown, observed and unobserved. With observational data, we don’t have access to this extremely useful power.
Theoretically, if we can correctly control for all confounding variables, we can still make causal claims with observational data. While some may disagree with this statement, it is widely accepted in principle. The real challenge lies in the application.
To correctly control for a confounding variable, we need to (1) have high-quality data for the variable and (2) correctly model the relationship between the confounder and our target variable. Doing this for each known confounder is difficult, but it isn’t the worst part. The worst part is that you can never know with certainty that you have accounted for all confounders. Even with strong domain knowledge, the possibility that there is an unknown confounder “out there” remains. The best we can do is include every confounder we can think of and then rely on what is called the ‘no unmeasured confounders’ assumption to estimate causal relationships.
Modeling with observational data can still add a lot of value in prescriptive analytics, even though we can never know with certainty that we accounted for all confounding variables. With observational data, I think of the causal assumption as being met in degrees instead of in a binary fashion. As we correctly account for more confounders, we capture the causal effect better and better. Even if we miss a few confounders, the model may still add value. As long as the missed confounders don’t have too large an impact on the estimated causal relationships, we may be able to add more value making decisions with a slightly biased causal model than with the process we used before prescriptive modeling (e.g., rules or intuition-based decisions).
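The "met in degrees" idea can be quantified with the textbook omitted-variable-bias formula for a simple linear regression: if y = βx + γu + noise and u is left out of the model, the estimated slope converges to β + γ·Cov(x, u)/Var(x). The sketch below assumes a hypothetical setup where x = a·u + noise with Var(u) and the noise variance both equal to 1 (so Cov(x, u) = a and Var(x) = a² + 1), and it uses a single "strength" knob for both a and γ.

```python
def estimated_effect(true_effect: float, confounder_strength: float) -> float:
    """Large-sample slope of y on x when one confounder u is omitted.

    Hypothetical setup: x = a*u + noise, y = true_effect*x + g*u + noise,
    with Var(u) = Var(noise in x) = 1 and a = g = confounder_strength.
    Omitted-variable bias = g * Cov(x, u) / Var(x) = g * a / (a**2 + 1).
    """
    a = g = confounder_strength
    bias = g * a / (a ** 2 + 1)
    return true_effect + bias


for strength in (0.0, 0.25, 0.5, 1.0):
    print(f"confounder strength {strength:.2f}: "
          f"estimate {estimated_effect(0.5, strength):.3f} (truth 0.500)")
```

Under these assumptions, a weak missed confounder (strength 0.25) nudges the estimate from 0.50 to about 0.56, which may still support a good decision, whereas a strong one (strength 1.0) doubles it to 1.00.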
Having a pragmatic mindset with observational data can be important since (1) observational data is cheaper and much more common than experimental data and (2) if we insist on airtight causal conclusions (which we can’t get with observational data), we may be leaving value on the table by ruling out causal models that are ‘good enough’, though not perfect. You and your business partners have to decide how much leniency to allow in meeting the causal assumption, but a model built on observational data could still add major value!
Wrapping it up
While prescriptive analytics is powerful and has the potential to add a lot of value, it relies on causal assumptions, whereas descriptive and predictive analytics do not. It is important to understand the causal assumption and to meet it as well as possible.
Experimental data is the gold standard for estimating causal relationships. A model built on good experimental data is in a strong position to meet the causal assumptions required by prescriptive modeling.
Establishing causal relationships with observational data can be more difficult because of the possibility of unknown or unobserved confounding variables. We should balance rigor and pragmatism when using observational data for prescriptive modeling: rigor to think of and attempt to control for every possible confounder, and pragmatism to understand that while the causal effects may not be perfectly captured, the model may add more value than the current decision-making process.
I hope that this article has helped you gain a better understanding of why prescriptive modeling relies on causal assumptions and how to go about meeting those assumptions. Happy modeling!
- Neuman, S. B. (2017). Principled Adversaries: Literacy Research for Political Action. Teachers College Record, 119(6), 1–32.

