The Mathematical Foundations of Goal-Based Investing
Discover the mathematical foundations behind GBI's investment engine: from Markov control models to Bellman's Principle of Optimality. Learn how rigorous dynamic programming theory translates directly into optimal portfolio strategies.
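The goal of Goal-Based Wealth Management (GBWM) is simple to state but powerful in practice: given a fixed investment horizon (say, ten years) and a target wealth level (the amount you need to reach your goal), find the best possible portfolio strategy, one that adapts over time as markets evolve, to maximize the probability of getting there. Formally, we look for:
$$\max_{\pi}\ \mathbb{P}^{\pi}\left(W_N \ge G\right),$$
where $W_N$ denotes wealth at the end of the horizon $N$, $G$ is the target wealth level, and $\pi$ ranges over the admissible portfolio strategies.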
Dynamic Programming Theory
To solve this problem rigorously, we draw on the mathematical theory of dynamic programming. This section lays out the core theory in an abstract, general form — independent of investing for now. Think of it as building the formal scaffolding before furnishing the room. We will apply it directly to our GBWM problem in the next section. We begin with a few key definitions.
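Definition 1. A Markov control model is a five-tuple
$$\left(X,\ A,\ \{A(x) : x \in X\},\ Q,\ r\right)$$
consisting of:
a) a Borel space $X$, called the state space, whose elements are referred to as states;
b) a Borel space $A$, called the control or action set;
c) a family $\{A(x) : x \in X\}$ of nonempty measurable subsets $A(x)$ of $A$, with $A(x)$ representing the set of feasible controls or actions when the system is in state $x \in X$. The set of feasible state-action pairs
$$\mathbb{K} := \{(x, a) : x \in X,\ a \in A(x)\}$$
is assumed to be a measurable subset of $X \times A$;
d) a transition law $Q$, that is, a stochastic kernel $Q(\cdot \mid x, a)$ on $X$ given $\mathbb{K}$;
e) a measurable function $r : \mathbb{K} \to \mathbb{R}$, called the one-stage reward function.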
In plain terms, a Markov control model is a formal way of describing any sequential decision-making problem. It specifies: where the system can be (the state space $X$), what decisions are available (the action set $A$), which decisions are feasible from each state ($A(x)$), how the system evolves in response to a decision (the transition law $Q$), and what you gain along the way (the reward function $r$).
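The five ingredients can be mirrored in a small data structure. The Python sketch below assumes a finite state space and a finite action set, which is far more restrictive than the Borel spaces allowed by the definition. The class name `FiniteMarkovControlModel`, its field names, and the three-state toy example are illustrative choices, not part of the theory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = str


@dataclass(frozen=True)
class FiniteMarkovControlModel:
    """Finite stand-in for the five-tuple (state space, action set, feasible sets, transition law, reward)."""
    states: List[State]                                         # state space
    actions: List[Action]                                       # action set
    feasible: Callable[[State], List[Action]]                   # feasible actions in each state
    transition: Callable[[State, Action], Dict[State, float]]   # next-state distribution
    reward: Callable[[State, Action], float]                    # one-stage reward

    def feasible_pairs(self) -> List[Tuple[State, Action]]:
        """All feasible state-action pairs."""
        return [(x, a) for x in self.states for a in self.feasible(x)]

    def check(self) -> None:
        """Verify nonempty feasible sets and that each transition is a probability distribution."""
        for x in self.states:
            if not self.feasible(x):
                raise ValueError(f"no feasible action in state {x}")
        for x, a in self.feasible_pairs():
            probs = self.transition(x, a)
            if not set(probs) <= set(self.states):
                raise ValueError(f"transition from {(x, a)} leaves the state space")
            if abs(sum(probs.values()) - 1.0) > 1e-12:
                raise ValueError(f"transition from {(x, a)} does not sum to 1")


# Hypothetical toy example: three states, two actions, "risky" not allowed in state 0.
def toy_feasible(x: State) -> List[Action]:
    return ["safe"] if x == 0 else ["safe", "risky"]


def toy_transition(x: State, a: Action) -> Dict[State, float]:
    if a == "safe":
        return {x: 1.0}                      # stay put with certainty
    probs: Dict[State, float] = {}
    for y in (min(x + 1, 2), max(x - 1, 0)):  # move up or down with equal odds
        probs[y] = probs.get(y, 0.0) + 0.5
    return probs


model = FiniteMarkovControlModel(
    states=[0, 1, 2],
    actions=["safe", "risky"],
    feasible=toy_feasible,
    transition=toy_transition,
    reward=lambda x, a: 0.0,
)
model.check()
```

Representing the transition law as a dictionary of next-state probabilities keeps the sketch dependency-free; a continuous state space would require a sampling routine or a density instead.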
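Definition 2. A control policy over $N$ periods is a sequence $\pi = \{a_n\}_{n=0}^{N-1}$ of $\mathcal{F}_n$-measurable random variables $a_n$ with $a_n \in A(x_n)$, where $x_n$ is the state at time $n$ and $\mathcal{F}_n$ represents the information available at time $n$. If there exists a sequence of measurable functions $f_n : X \to A$ such that $a_n = f_n(x_n)$ for every $n = 0, \dots, N-1$, then the policy is said to be a deterministic Markov policy.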
Intuitively, a control policy is simply a decision rule: it tells you which action to take at each point in time. The most tractable kind is a deterministic Markov policy: at each step $n$, it maps the current state directly to an action using a fixed function $f_n$, with no randomness in the decision itself.
Given a Markov control model, our objective is to maximize the total expected reward accumulated over the entire horizon $N$. This is captured by the criterion function:
$$J(\pi, x) := \mathbb{E}_x^{\pi}\left[\sum_{n=0}^{N-1} r(x_n, a_n) + r_N(x_N)\right],$$
where $r_N : X \to \mathbb{R}$ is the terminal reward function. Here, $J(\pi, x)$ is the total expected reward when starting in state $x$ and following policy $\pi$: the sum collects rewards at each intermediate step, and $r_N(x_N)$ is the reward received at the final period.
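In a finite model, the total expected reward of a fixed deterministic Markov policy can be computed exactly by pushing the distribution of the state forward in time and accumulating the expected reward at each step. The following Python sketch does this; the function name `criterion`, the three-state chain, and the reward values are hypothetical and chosen only so that the result can be checked by hand.

```python
from typing import Callable, Dict, List

State = int
Action = str


def criterion(
    x0: State,
    policy: List[Callable[[State], Action]],
    transition: Callable[[State, Action], Dict[State, float]],
    reward: Callable[[State, Action], float],
    terminal_reward: Callable[[State], float],
) -> float:
    """Total expected reward of a deterministic Markov policy started in x0."""
    dist: Dict[State, float] = {x0: 1.0}   # distribution of the state at step 0
    total = 0.0
    for f_n in policy:                     # one decision rule per period
        next_dist: Dict[State, float] = {}
        for x, px in dist.items():
            a = f_n(x)
            total += px * reward(x, a)     # expected one-stage reward at this step
            for y, p in transition(x, a).items():
                next_dist[y] = next_dist.get(y, 0.0) + px * p
        dist = next_dist
    # Add the expected terminal reward at the final date.
    total += sum(px * terminal_reward(x) for x, px in dist.items())
    return total


# Hypothetical toy chain on states {0, 1, 2}.
def transition(x: State, a: Action) -> Dict[State, float]:
    if a == "safe":
        return {x: 1.0}
    probs: Dict[State, float] = {}
    for y in (min(x + 1, 2), max(x - 1, 0)):
        probs[y] = probs.get(y, 0.0) + 0.5
    return probs


def reward(x: State, a: Action) -> float:
    return 1.0 if x == 2 else 0.0          # collect 1 per period spent in state 2


def terminal_reward(x: State) -> float:
    return float(x)


def always_risky(x: State) -> Action:
    return "risky"


def always_safe(x: State) -> Action:
    return "safe"


j_risky = criterion(1, [always_risky, always_risky], transition, reward, terminal_reward)
j_safe = criterion(1, [always_safe, always_safe], transition, reward, terminal_reward)
print(j_risky, j_safe)
```

Comparing the two numbers shows how the criterion ranks policies from a given starting state; finding the best one among all policies is the optimization problem described next.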
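We call $J^*$ the value function. It records the best expected reward achievable from state $x$, optimized over all feasible policies in $\Pi$:
$$J^*(x) := \sup_{\pi \in \Pi} J(\pi, x), \qquad x \in X,$$
where $J(\pi, x)$ is the criterion function and $\Pi$ the set of all policies.
Our goal is then to find an optimal policy $\pi^* \in \Pi$, that is, one satisfying
$$J(\pi^*, x) = J^*(x) \qquad \text{for all } x \in X.$$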
Under specific technical conditions, the following theorem guarantees that an optimal policy exists — and, crucially, tells us exactly how to find it by working backward through time.
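Theorem 1. Let $J_0, J_1, \dots, J_N$ be the functions on $X$ defined backward in time by
$$J_N(x) := r_N(x)$$
and, for $n = N-1, N-2, \dots, 0$,
$$J_n(x) := \max_{a \in A(x)} \left[ r(x, a) + \int_X J_{n+1}(y)\, Q(dy \mid x, a) \right].$$
Suppose that these functions are measurable and that, for each $n = 0, \dots, N-1$, there is a measurable function $f_n : X \to A$ with $f_n(x) \in A(x)$ that attains the maximum above for every $x \in X$, that is,
$$J_n(x) = r(x, f_n(x)) + \int_X J_{n+1}(y)\, Q(dy \mid x, f_n(x)).$$
Then the (deterministic Markov) policy $\pi^* = \{f_0, \dots, f_{N-1}\}$ is optimal, and the value function $J^*$ equals $J_0$:
$$J^*(x) = J_0(x) = J(\pi^*, x) \qquad \text{for all } x \in X.$$
This is the mathematical statement of backward induction. Rather than searching over all possible strategies simultaneously, an overwhelming task, we solve the problem one period at a time, starting from the end. At each step $n$, we choose the action that maximizes the immediate reward plus the expected value $J_{n+1}$ of acting optimally from the next period onward.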
For a proof of Theorem 1, refer to Hernandez-Lasserre (1996), Section 3.2.
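The backward recursion of Theorem 1 is short to write down when the state and action sets are finite, because the maximum is then always attained and measurability is automatic. The following Python sketch is a minimal illustration under that assumption; the function name `backward_induction` and the three-state toy model are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str


def backward_induction(
    states: List[State],
    feasible: Callable[[State], List[Action]],
    transition: Callable[[State, Action], Dict[State, float]],
    reward: Callable[[State, Action], float],
    terminal_reward: Callable[[State], float],
    horizon: int,
) -> Tuple[Dict[State, float], List[Dict[State, Action]]]:
    """Return the time-0 value function and the maximizing decision rules, one per period."""
    J = {x: terminal_reward(x) for x in states}     # start from the terminal reward
    policy: List[Dict[State, Action]] = []
    for _ in range(horizon):                        # steps N-1, N-2, ..., 0
        J_n: Dict[State, float] = {}
        f_n: Dict[State, Action] = {}
        for x in states:
            best_a, best_v = None, float("-inf")
            for a in feasible(x):
                # Immediate reward plus expected value of continuing optimally.
                v = reward(x, a) + sum(p * J[y] for y, p in transition(x, a).items())
                if v > best_v:
                    best_a, best_v = a, v
            J_n[x], f_n[x] = best_v, best_a
        policy.insert(0, f_n)                       # earlier rules go in front
        J = J_n
    return J, policy


# Hypothetical toy model on states {0, 1, 2}.
def feasible(x: State) -> List[Action]:
    return ["safe", "risky"]


def transition(x: State, a: Action) -> Dict[State, float]:
    if a == "safe":
        return {x: 1.0}
    probs: Dict[State, float] = {}
    for y in (min(x + 1, 2), max(x - 1, 0)):
        probs[y] = probs.get(y, 0.0) + 0.5
    return probs


def reward(x: State, a: Action) -> float:
    return 1.0 if x == 2 else 0.0


def terminal_reward(x: State) -> float:
    return float(x)


J0, policy = backward_induction([0, 1, 2], feasible, transition, reward, terminal_reward, horizon=2)
print(J0, policy[0])
```

Each pass of the outer loop costs one sweep over the feasible state-action pairs, so the total work grows linearly with the horizon instead of exponentially, as it would if every sequence of decisions were enumerated.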
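To build intuition for this result, consider the reward-to-go, the total expected reward you can still collect from time $n$ onward when the system is in state $x$ at time $n$ and you follow policy $\pi$:
$$C_n(\pi, x) := \mathbb{E}^{\pi}\left[\sum_{t=n}^{N-1} r(x_t, a_t) + r_N(x_N) \,\Big|\, x_n = x\right].$$
It is possible to prove that
$$J_n(x) = \sup_{\pi \in \Pi} C_n(\pi, x) = C_n(\pi^*, x), \qquad n = 0, \dots, N,$$
where $J_n$ are the functions defined in Theorem 1 and $\pi^*$ is the policy constructed there.
That is, $J_n(x)$ is the optimal reward-to-go: the best total expected reward obtainable from time $n$ up to the horizon $N$, given that the state at time $n$ is $x$.
There is also a complementary result that fully characterizes when a policy is optimal. A policy $\pi^*$ is optimal if and only if, for every $n = 0, \dots, N-1$,
$$C_n(\pi^*, x_n) = J_n(x_n)$$
almost surely along the trajectories generated by $\pi^*$.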
This result is known as Bellman's Principle of Optimality (see Bellman (1957) for a full description and proof). Intuitively, it means that a globally optimal strategy must also be locally optimal at every step — there are no beneficial 'short-term sacrifices.' If a policy is ever suboptimal from some state onward, it cannot be globally optimal.
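Before applying this theory to our problem, we should address one important technical requirement embedded in Theorem 1: at each backward step, a maximizing action must actually exist at every state, and it must be possible to select it as a measurable function of the state.
Assumption 1. The Markov control model and a given measurable function $u : X \to \mathbb{R}$ are such that the function
$$u^*(x) := \sup_{a \in A(x)} \left[ r(x, a) + \int_X u(y)\, Q(dy \mid x, a) \right], \qquad x \in X,$$
is measurable and there exists a measurable function $f : X \to A$, with $f(x) \in A(x)$, that attains the supremum for every $x \in X$:
$$u^*(x) = r(x, f(x)) + \int_X u(y)\, Q(dy \mid x, f(x)).$$
In other words, Assumption 1 guarantees that the 'best action' at each state is not merely approached but is actually achieved by some well-defined function $f$ of the state. Applied with $u = J_{n+1}$ at each backward step, it supplies exactly the maximizers $f_n$ that Theorem 1 requires.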
Hernandez-Lasserre (1996), Section 3.3, provides three sufficient sets of conditions under which Assumption 1 holds. The one most relevant to our application is the following.
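Theorem 2. With the same notation as above, the following set of conditions:
a) $A(x)$ is compact for every $x \in X$, and the set-valued map $x \mapsto A(x)$ is continuous;
b) the one-stage reward $r$ is continuous and bounded on $\mathbb{K}$;
c) the transition law $Q$ is such that, for every bounded measurable function $v : X \to \mathbb{R}$, the function
$$(x, a) \mapsto \int_X v(y)\, Q(dy \mid x, a)$$
is bounded and continuous on $\mathbb{K}$;
implies Assumption 1 for any non-negative measurable function $u$.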
In plain terms, Theorem 2 says: if the set of available actions is compact and varies continuously with the state (a), the reward function is well-behaved — continuous and bounded (b), and the transition law responds smoothly to changes in the state and action, in the sense that small changes in inputs produce small changes in expected outcomes (the Feller condition in c) — then the backward induction algorithm is guaranteed to work.
GBWM as a Dynamic Programming Problem
With the theoretical scaffolding now in place, we can map our investing problem directly onto the Markov control model framework. Using the same terminology introduced above:
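a) The state space $X$ is the set of possible wealth levels: the state $x_n$ is the investor's wealth $W_n$ at time $n$.
b) The action set $A$ is the set of portfolios the investor can choose from.
c) The family $\{A(x) : x \in X\}$ specifies which portfolios are available at wealth level $x$.
d) The transition law $Q(\cdot \mid x, a)$ is the distribution of next-period wealth, given current wealth $x$ and chosen portfolio $a$.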
The state space, action set, and transition law all map naturally onto our framework. The reward function requires more care.
There are two differences between the standard dynamic programming setup and our problem. First, dynamic programming typically maximizes an expected value, whereas we want to maximize a probability. Second, our problem has a single reward only at the final date — reaching the goal — whereas the standard formulation allows for rewards at intermediate steps too (a variation we also use on our platform for goals with periodic payouts).
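To bridge both gaps, we set the reward function as follows:
$$r(x, a) := 0, \qquad r_N(x) := \mathbf{1}_{\{x \ge G\}},$$
where $G$ is the target wealth level. This means the reward is 1 if the final wealth reaches the goal $G$ and 0 otherwise, with nothing collected at intermediate dates, which takes care of the second difference. Since the expected value of an indicator is the probability of the underlying event,
$$\mathbb{E}_x^{\pi}\left[\mathbf{1}_{\{W_N \ge G\}}\right] = \mathbb{P}_x^{\pi}\left(W_N \ge G\right),$$
where $W_N$ denotes final wealth,
we see that maximizing the expected value of this indicator function is exactly the same as maximizing the probability of reaching the goal, which resolves the first difference.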
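To make the indicator-reward construction concrete, the following Python sketch runs backward induction on a deliberately crude toy version of the problem. Everything specific in it is an assumption made for illustration: wealth lives on the integer grid 0 to 10, the goal is 6, and there are only two hypothetical portfolios with made-up two-outcome returns. It is not the model used on any real platform.

```python
from typing import Dict, List, Tuple

GRID_MAX = 10   # hypothetical: wealth takes integer values 0..10
GOAL = 6        # hypothetical target wealth level

# Hypothetical portfolios: lists of (wealth change, probability) outcomes per period.
PORTFOLIOS: Dict[str, List[Tuple[int, float]]] = {
    "conservative": [(+1, 0.6), (0, 0.4)],
    "aggressive": [(+3, 0.4), (-1, 0.6)],
}


def transition(x: int, a: str) -> Dict[int, float]:
    """Distribution of next-period wealth, clipped to the grid."""
    probs: Dict[int, float] = {}
    for change, p in PORTFOLIOS[a]:
        y = min(max(x + change, 0), GRID_MAX)
        probs[y] = probs.get(y, 0.0) + p
    return probs


def max_goal_probability(horizon: int) -> Tuple[Dict[int, float], List[Dict[int, str]]]:
    """Maximal probability of ending at or above GOAL, with the maximizing portfolio rules."""
    states = range(GRID_MAX + 1)
    # Terminal reward: indicator of reaching the goal. Intermediate rewards are zero.
    J = {x: 1.0 if x >= GOAL else 0.0 for x in states}
    policy: List[Dict[int, str]] = []
    for _ in range(horizon):
        J_n: Dict[int, float] = {}
        f_n: Dict[int, str] = {}
        for x in states:
            values = {
                a: sum(p * J[y] for y, p in transition(x, a).items())
                for a in PORTFOLIOS
            }
            f_n[x] = max(values, key=values.get)   # portfolio with the highest success probability
            J_n[x] = values[f_n[x]]
        policy.insert(0, f_n)
        J = J_n
    return J, policy


J_one, policy_one = max_goal_probability(horizon=1)
J_three, policy_three = max_goal_probability(horizon=3)
print(J_one[3], policy_one[0][3])
print(J_three[3], policy_three[0][3])
```

Because the only reward is the terminal indicator, every value computed by the recursion is a probability. With one period left, an investor three units below the goal can only get there with the aggressive portfolio, while an investor one unit below does better with the conservative one, which illustrates how the chosen portfolio depends on the current wealth.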
With this reward function in place, conditions (a) and (c) of Theorem 2 are straightforward to verify in our setting. Condition (b) is more delicate: because the reward is an indicator function, jumping from 0 to 1 at the threshold $G$, it is bounded but not continuous.
We will not go into the technical details here, but it is possible to regularize the reward function in a way that recovers continuity, allowing condition (b) to be satisfied and the full framework to apply.
Bibliography
Hernandez-Lasserre (1996): Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30 of Applications of Mathematics (New York). Springer-Verlag, New York.
Bellman (1957): Bellman, R. (1957). Dynamic programming. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ. Reprint of the 1957 edition, with a new introduction by Stuart Dreyfus.