Simulation Horizons, Constraints, & the Emergence of Strategic Agency
Most systems do not play games. They respond to gradients.
The distinction matters because economics and biology both reach for optimization language to compress behavior into a single object. But optimization hides a structural split: some systems move toward better conditions, while others change the conditions themselves. A migrating herd follows food gradients. A central bank reshapes the constraint topology. Both can be described as “optimizing,” yet only one exhibits strategic agency in any meaningful sense.
This note sketches a boundary condition for that transition. The claim is not that game theory is wrong—it is that game theory is conditional. A system can only behave strategically when it can afford to model counterfactual futures and has enough authority to convert those models into world-editing actions. Below that threshold, what looks like optimization is often just motion along a constraint field.
Academically, the same boundary shows up in several nearby literatures. In control, it is the distinction between receding-horizon control (model predictive control) and myopic or saturated feedback: strategy requires a long enough lookahead to justify costly intervention.[1] In economics, it aligns with bounded rationality (limited feasible computation) and rational inattention (limited feasible information processing), both of which turn foresight into a resource allocation problem.[2][3] In networked systems, “authority” corresponds to whether interventions can actually reach the relevant degrees of freedom—captured in the language of controllability and structural reachability.[4]
The Gradient–Game Distinction
Gradient-following and strategic play differ in kind, not degree.
Gradient-following behavior minimizes immediate energetic cost without counterfactual reasoning: relocate, reallocate, avoid heat, chase food, reduce exposure, deleverage. The system responds to the local slope of a payoff or free-energy function. Strategic play incurs present cost to alter future payoffs: build a moat, change rules, coordinate a coalition, impose a standard, backstop a market, raise switching costs. The system models futures, compares trajectories, and intervenes.
Consider a system embedded in an environment with state $x$ and a local payoff function $U(x)$. Many biological and physical systems evolve according to $\dot{x} = \eta\,\nabla_x U(x)$, or its discrete analogue $x_{t+1} = x_t + \eta\,\nabla_x U(x_t)$. Behavior follows local gradients. These systems do not represent counterfactual futures, do not model other agents’ delayed responses, and do not accept present cost for distant gain. They are adaptive but non-strategic.
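This myopic update can be sketched in a few lines of Python; the quadratic payoff, step size, and target point are illustrative assumptions, not part of the argument:

```python
import numpy as np

def gradient_follower(x0, grad_U, eta=0.1, steps=100):
    """Myopic gradient ascent: climb the local payoff slope, no lookahead."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + eta * grad_U(x)  # x_{t+1} = x_t + eta * grad U(x_t)
    return x

# Quadratic payoff U(x) = -||x - target||^2, peaked at an assumed target
target = np.array([2.0, -1.0])
grad_U = lambda x: -2.0 * (x - target)

x_final = gradient_follower([0.0, 0.0], grad_U)
```

The system ends at the payoff peak without ever representing a future state; it simply moves along the slope it can feel.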
From a control perspective, such systems are open-loop or weakly closed-loop. They minimize immediate cost subject to feasibility constraints. Migration through state space dominates intervention on state space. Most biological life operates in this regime.
This is the same split that appears in reflexive markets: most participants migrate along liquidity and narrative gradients, while a smaller class of actors—central banks, regulators, dominant platforms—can intervene and reshape the constraints themselves. The core theme of Bounded Reflexivity & Constraint Theory is precisely this asymmetry between those who move within the field and those who reshape it.
The Simulation Threshold
The boundary can be stated cleanly: an agent plays a game only when it can evaluate counterfactuals far enough ahead that intervention dominates migration.
Two quantities do most of the work. The effective controllable horizon $T_{\text{sim}}$ measures how far into the future the agent can simulate and act on a plan. The environment timescale $\tau_{\text{env}}$ measures how quickly relevant payoffs and constraints evolve, including other agents’ responses.
Define a dimensionless horizon adequacy parameter,

$$\rho = \frac{T_{\text{sim}}}{\tau_{\text{env}}}.$$

When $\rho < 1$, the future is effectively opaque. Intervention cannot be justified reliably, and myopic descent dominates. When $\rho > 1$, counterfactual evaluation becomes feasible, and present sacrifice can be traded for future payoff-shape change.
This is not a moral distinction—smart versus dumb, rational versus irrational. It is a constraint distinction. When $T_{\text{sim}}$ collapses through stress, bandwidth loss, or depleted slack, systems revert to migration even if they were strategic the day before.
Empirical operationalization
Operationalizing the threshold means choosing observables for $\tau_{\text{env}}$, $T_{\text{sim}}$, and authority, then asking a timing question: when do the horizon and control conditions shift enough that intervention becomes the dominant mode (or collapses back into migration)? In practice, $\tau_{\text{env}}$ is set by the timescale of payoff/constraint change, $T_{\text{sim}}$ by the agent’s decision-and-actuation cycle, and authority by the geometry of the feasible action set and its reach through the system.
- Proxies for $\tau_{\text{env}}$: regime-switch timescales in volatility/correlation; liquidity regime half-lives; policy/constraint update cadence; characteristic time-to-forced-action in plumbing (margin/collateral cycles, settlement timelines).
- Proxies for $T_{\text{sim}}$: decision cycle time $\tau_d$; time-to-build and time-to-exit for positions/projects; latency to deploy capital or policy (funding access, approval chains); effective lookahead length implied by planning artifacts (budgets, hedging horizons, inventory cycles).
- Proxies for authority: balance-sheet slack and constraint capital; mandate scope; rule-setting power; actuator saturation indicators; network position for reach (critical nodes, chokepoints) as a practical proxy for structural reachability.[4]
- Testable signature: a rising gap between environmental change and the controllable horizon—$\rho$ compressing toward 1 from above—followed by a switch from smooth policy/position adjustment to constraint-driven discontinuities (forced deleveraging, default choices, hard rationing).
These measurements matter because they let you mark the regime boundary empirically: the same system can exhibit “strategic” behavior in a slack, slow environment and “gradient” behavior hours later when constraint timelines shorten and admissible actions saturate.
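One crude way to put numbers on this: estimate $\tau_{\text{env}}$ as the decorrelation time of a payoff-relevant series, take $T_{\text{sim}}$ from the agent's planning artifacts, and form the ratio. The AR(1) estimator and synthetic series below are assumptions for illustration, not a recommended methodology:

```python
import numpy as np

def env_timescale(series):
    """Proxy for tau_env: AR(1) decorrelation time (steps to decay by 1/e)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    phi = x[:-1] @ x[1:] / (x[:-1] @ x[:-1])  # lag-1 autocorrelation
    phi = min(max(phi, 1e-6), 1.0 - 1e-6)     # clamp to a finite timescale
    return -1.0 / np.log(phi)

def horizon_adequacy(t_sim, series):
    """rho = T_sim / tau_env; rho > 1 is the candidate strategic regime."""
    return t_sim / env_timescale(series)

# Synthetic persistent environment: AR(1) with phi = 0.99
rng = np.random.default_rng(1)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 0.99 * x[t - 1] + rng.normal()

rho_long = horizon_adequacy(400.0, x)  # generous planning horizon
rho_short = horizon_adequacy(5.0, x)   # collapsed, survival-mode horizon
```

The same environment supports a strategic reading or a gradient reading depending purely on the feasible horizon.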
Thermodynamics of Horizon
Simulation is not free. It is computation. Computation dissipates. Even in idealized form, there is a floor.
Landauer’s bound gives a minimum energy per erased bit,[5]

$$E_{\min} = k_B T \ln 2,$$

where $k_B$ is Boltzmann’s constant and $T$ is the operating temperature.
Actual systems operate far above Landauer efficiency, but the bound remains a reference point: it reminds us that thinking is an energetic allocation problem, not a purely informational one. This is the thermodynamic thread running through The Anthropic Thermodynamic Principle: agency lives in a narrow overhead window where complexity is sufficient yet slack remains.
The practical abstraction separates maintenance from slack. Let total available power be $P$ and let $\mu \in [0, 1)$ represent the fraction consumed by maintenance and overhead. Slack power is $P_{\text{slack}} = (1 - \mu)\,P$. Not all slack is usable for planning—a recurring structure/capacity split, roughly “structure” versus “degrees of freedom,” describes what fraction of slack is actually available for choice rather than absorbed by keeping the system coherent.
Define an available choice power budget,

$$P_c = \phi\,(1 - \mu)\,P,$$

where $\phi \in (0, 1]$ is the usable fraction of slack after structure and coordination overhead is accounted for. In real organizations under load, $\phi$ is often much less than 1.
If one decision cycle has duration $\tau_d$, the energy available for planning in that cycle is,

$$E_c = P_c\,\tau_d = \phi\,(1 - \mu)\,P\,\tau_d.$$

Now attach an imperfect but operational cost model for simulation. Suppose a horizon rollout of length $H$ uses $b$ effective bit-operations per step—state, model, counterfactual bookkeeping—and costs energy $\varepsilon$ per bit-operation,

$$E_{\text{sim}}(H) = H\,b\,\varepsilon.$$

Feasibility requires $E_{\text{sim}}(H) \le E_c$, yielding an upper bound on the simulation horizon,

$$H_{\max} = \frac{\phi\,(1 - \mu)\,P\,\tau_d}{\gamma\,b\,k_B T \ln 2},$$

where $\varepsilon = \gamma\,k_B T \ln 2$ and $\gamma \ge 1$ captures inefficiency above the Landauer limit.
This makes the qualitative claim quantitative: the planning horizon collapses when $\mu$ rises (stress), when $P$ falls (resource loss), when $b$ rises (model complexity), or when $\gamma$ rises (inefficiency, friction, coordination cost). The statement that agents are rational within the horizon they can simulate becomes a physical constraint, not a psychological observation.
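The bound can be evaluated directly. The function below computes the feasible horizon as choice energy divided by per-step simulation cost; all numeric inputs (power budget, overhead fraction, usable-slack fraction, bit-operations per step, inefficiency factor) are made-up illustrative values:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def h_max(P, mu, phi, tau_d, b, gamma, temp=300.0):
    """Feasible simulation horizon (steps per decision cycle):
    choice energy phi*(1-mu)*P*tau_d divided by the per-step cost
    gamma * b * k_B * T * ln 2 (b bit-ops at gamma x Landauer)."""
    choice_energy = phi * (1.0 - mu) * P * tau_d
    cost_per_step = gamma * b * K_B * temp * math.log(2)
    return choice_energy / cost_per_step

# Calm regime: 20 W budget, 70% overhead, modest model complexity
calm = h_max(P=20.0, mu=0.70, phi=0.1, tau_d=1.0, b=1e10, gamma=1e9)
# Stressed regime: overhead and model complexity both rise
stressed = h_max(P=20.0, mu=0.95, phi=0.1, tau_d=1.0, b=1e11, gamma=1e9)
```

The point is the sensitivity, not the absolute numbers: raising $\mu$ and $b$ together shrinks the feasible horizon by more than an order of magnitude.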
Horizon Without Actuation Is Still Migration
Horizon alone does not create strategy. An agent can simulate futures and still be forced to migrate if it cannot materially actuate the environment.
This is where the reflexivity framing matters. In constrained systems, there are actors with qualitatively different control sets. In markets, households migrate; central banks intervene. In platforms, users migrate; platforms change the incentive landscape. In organizations, teams migrate; leadership can change the constraint topology—budget, mandate, process, hiring capacity.
Model this as a constrained control problem,

$$\max_{u(\cdot)\,\in\,\mathcal{U}}\ \int_0^{T_{\text{sim}}} R\big(x(t), u(t)\big)\,dt \quad \text{s.t.} \quad \dot{x} = f(x, u),$$

where $\mathcal{U}$ encodes admissible actions: capital, mandates, legal permissions, technical levers. A crude proxy for authority is the size and leverage of this feasible action set, though in real systems geometry matters more than volume.
In linearized form around a trajectory,

$$\delta\dot{x} = A\,\delta x + B\,\delta u,$$

authority becomes the combination of how much input magnitude is permitted (the constraint set $\mathcal{U}$) and whether the system is controllable through $B$ (structural reachability).[6] Two agents can have the same horizon and wildly different ability to move the system because their $\mathcal{U}$ and $B$ differ.
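The reachability half of authority is checkable: build the controllability matrix $[B, AB, A^2B, \ldots]$ and test its rank. The specific matrices below are toy assumptions contrasting two actuation channels into the same dynamics:

```python
import numpy as np

def controllability_rank(A, B):
    """Rank of [B, AB, A^2 B, ...]; full rank n means inputs through B
    can steer every direction of the state (Kalman rank condition)."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.linalg.matrix_rank(np.hstack(blocks))

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # shared environment dynamics
B_strong = np.array([[0.0], [1.0]]) # actuates the upstream coordinate driving the chain
B_weak = np.array([[1.0], [0.0]])   # touches only the downstream coordinate

rank_strong = controllability_rank(A, B_strong)  # full rank: can reach any state
rank_weak = controllability_rank(A, B_weak)      # deficient: structurally limited
```

Same horizon, same dynamics, different $B$: one agent can reshape the trajectory, the other can only shift along it.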
The strategic regime requires a joint condition: $\rho > 1$ (horizon adequate) and authority nontrivial (intervention feasible and reaching the relevant degrees of freedom). Otherwise the system may be smart but still trapped in gradient descent.
Equilibrium as Stabilized Trajectory
Once horizons are finite and authority is uneven, equilibrium should be treated carefully. The equilibrium you see in the world is often maintained, not chosen.
Classically, an equilibrium is a fixed point of a best-response mapping or a dynamical system. But in constrained, reflexive environments the more common object is a stabilized trajectory: a regime that persists because deviations are punished, horizons are too short to justify exit, or authority actively stabilizes the current basin.
In control language, if a closed-loop policy induces local feedback gain $K$, then local deviations evolve like,

$$\delta x_{t+1} = (A - BK)\,\delta x_t.$$

Stability is a spectral condition: the closed-loop matrix $A - BK$ must have spectral radius less than 1 in discrete time, or eigenvalues with negative real parts in continuous time. But crucially, bounded reflexivity shows up in the search for $K$ (horizon-limited planning), while authority shows up in the admissible $\mathcal{U}$ and effective $B$.
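The spectral condition is a one-liner to verify. The matrices below are assumed for illustration: a mildly unstable open-loop $A$ and a feedback gain $K$ with enough reach through $B$ to stabilize it:

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude; < 1 means discrete-time stability."""
    return float(max(abs(np.linalg.eigvals(M))))

A = np.array([[1.05, 0.10],
              [0.00, 0.98]])   # open loop: one eigenvalue outside the unit circle
B = np.array([[0.0], [1.0]])
K = np.array([[0.40, 0.50]])   # an assumed stabilizing feedback gain

rho_open = spectral_radius(A)            # > 1: deviations grow unchecked
rho_closed = spectral_radius(A - B @ K)  # < 1: authority damps deviations
```

Remove the feedback (set $K = 0$) and the same system drifts: the equilibrium was maintained, not intrinsic.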
Equilibrium becomes an emergent property of constrained feedback rather than an abstract solution concept. This is the bridge between the market-level story in Bounded Reflexivity & Constraint Theory and the agency story in Complex Adaptable Systems, Complexity Ladders, & Agency: stability is what systems do when they can afford it.
Stress Collapses Games into Gradients
The simplest prediction of this framework is qualitative: under stress, systems revert to gradient-following behavior because horizons shorten.
In the energy model above, $H_{\max}$ is increasing in $P$ and decreasing in $\mu$. Any stressor that raises maintenance overhead or reduces slack shrinks the feasible planning horizon. As $T_{\text{sim}}$ falls below $\tau_{\text{env}}$, $\rho$ drops below 1 and strategic behavior becomes less descriptive.
This pattern appears consistently in financial panics. Decision cycles speed up ($\tau_d$ shrinks). Uncertainty inflates model complexity ($b$ rises). Funding constraints raise effective overhead ($\mu$ rises). Constraints bind faster than agents can re-coordinate.
The outcome looks like irrationality, but it is often rational under a collapsed horizon: migration dominates intervention because intervention can no longer be reliably evaluated. Panic is not a failure of reasoning but reasoning operating under a collapsed $T_{\text{sim}}$.
Formally, stress raises effective overhead $\mu \to \mu'$ with $\mu' > \mu$, which reduces choice power,

$$P_c' = \phi\,(1 - \mu')\,P < P_c.$$

Therefore $H_{\max}$ shrinks,

$$H_{\max}' = \frac{\phi\,(1 - \mu')\,P\,\tau_d}{\gamma\,b\,k_B T \ln 2} < H_{\max}.$$

Once $T_{\text{sim}}$ falls below the environment’s strategic timescale $\tau_{\text{env}}$, the system crosses back into the gradient-following regime. This is the mathematical statement of herding, forced deleveraging, flight-to-safety, and institutional default-choice behavior—all of which are rational under a collapsed horizon.
A Toy Model: Leveraged Crowded Trade
To see these dynamics concretely, consider a minimal model of leveraged trading with endogenous price impact, margin constraints, finite simulation horizon, and a controller that can reshape constraints. Formally, each agent is solving a constrained receding-horizon (MPC) problem: optimize a horizon-$H$ objective under state dynamics and margin feasibility, implement the first step, then roll the horizon forward.[1]
In discrete time, there is one risky asset with price $p_t$ and two strategic agents $i \in \{1, 2\}$ holding positions $q_t^i$. Each agent has equity $e_t^i$. A controller—clearinghouse, prime broker, central bank proxy—sets a margin requirement $m_t$ each period.
Price formation follows linear impact on net order flow. Let $u_t^i$ be agent $i$’s trade and $n_t$ be exogenous noise order flow. Then,

$$p_{t+1} = p_t + \lambda\,(u_t^1 + u_t^2 + n_t),$$

with $\lambda > 0$ the impact sensitivity. This is the core reflexive coupling: trades move price, which feeds back into wealth and constraints.
Equity updates via mark-to-market,

$$e_{t+1}^i = e_t^i + q_t^i\,(p_{t+1} - p_t).$$
The maintenance margin constraint requires that equity cover a fraction $m_t$ of position value,

$$e_t^i \ \ge\ m_t\,|q_t^i|\,p_t.$$
When violated, the agent faces a required liquidation $\Delta q_t^i$. The position must be reduced by at least enough to restore $|q_{t+1}^i| \le e_t^i / (m_t\,p_t)$—forced selling if long, forced buying if short. That is the mechanical gradient shove.
Each agent solves a horizon-limited control problem. With finite lookahead $H_i$, agent $i$ minimizes an objective of the form

$$\min_{u_t^i, \ldots, u_{t+H_i-1}^i}\ \sum_{k=0}^{H_i - 1} \Big[ -\,q_{t+k}^i\,\big(p_{t+k+1} - p_{t+k}\big) + c\big(u_{t+k}^i\big) + \psi\big(q_{t+k}^i, e_{t+k}^i\big) \Big],$$

subject to price and wealth dynamics, margin feasibility, and forced liquidation. Here $c(\cdot)$ captures trading frictions and impact, while $\psi(\cdot)$ is a risk term (drawdown, variance, or constraint-proximity penalty) that becomes dominant as constraints tighten.
Now tie horizon to feasibility. Define a per-step simulation cost $\kappa$ and a per-period choice budget $E_c$. The constraint $H_i\,\kappa \le E_c$ yields $H_i \le E_c / \kappa$. When the system is stressed, $E_c$ decreases (less slack, more time on survival and collateral management) or $\kappa$ increases (state space becomes more complex, volatility rises, more branches to simulate). Either way, effective horizon collapses.
Two regimes appear immediately. In the strategic regime, both agents have large enough $H_i$ and constraints are not binding. They behave like strategic optimizers, accepting short-term drawdown to preserve future payoff. Optimal controls come from the MPC problem, producing smooth $u_t^i$. Price evolves with moderate impact; no cascade.
In the gradient regime, constraints bind and/or horizons collapse. Forced liquidation imposes $u_t^i = \Delta q_t^i$ with sign determined by the constraint violation. Trades become reactive to the constraint gradient rather than to future payoff. If $e_t^i < m_t\,|q_t^i|\,p_t$, any feasible policy must satisfy $|q_{t+1}^i| \le e_t^i / (m_t\,p_t)$, and the price impact term induces a feedback loop: forced selling pushes price down, which pushes wealth down, which tightens margin, which forces more selling. That is the cascade.
Control authority enters as constraint-topology shaping. The controller chooses $m_t$. A procyclical policy—raising margins when volatility rises, $m_t = m_0 + \alpha\,\sigma_t$ with $\alpha > 0$—increases $m_t$ exactly when prices are moving, amplifying forced selling. A countercyclical policy—lowering margins in stress or injecting equity—reshapes the constraint set to prevent the liquidation feedback gain from turning positive system-wide.
This is the formal statement of the central claim: controllers determine whether the system damps or violently exits when constraints bind. Strategy requires horizon. Horizon is a resource. Stress collapses horizon. When horizon collapses, behavior reverts to gradients. Controllers shape whether that reversion becomes a violent cascade.
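The whole mechanism fits in a short simulation: a single leveraged long, linear price impact, mark-to-market equity, a hard margin constraint, and one exogenous selling shock at $t = 0$. All parameter values and the specific procyclical rule are illustrative assumptions, not calibrated:

```python
import numpy as np

def run(margin_policy, T=30, lam=0.05, p0=100.0, q0=50.0, e0=600.0):
    """One-agent version of the toy model: p_{t+1} = p_t + lam * (u_t + flow_t),
    e_{t+1} = e_t + q_t * (p_{t+1} - p_t), forced selling when e < m * q * p."""
    p, q, e = p0, q0, e0
    path = [p]
    for t in range(T):
        m = margin_policy(p, p0)
        u = 0.0
        if q > 0 and e < m * q * p:
            u = e / (m * p) - q          # sell down to the largest feasible position
        flow = -40.0 if t == 0 else 0.0  # single exogenous selling shock
        p_new = p + lam * (u + flow)     # linear impact on net order flow
        e += q * (p_new - p)             # mark-to-market on the old position
        q += u
        p = p_new
        path.append(p)
    return np.array(path), q

fixed = lambda p, p0: 0.10                                        # constant margin
procyclical = lambda p, p0: 0.10 + 2.0 * max(0.0, (p0 - p) / p0)  # tightens as price falls

path_fixed, q_fixed = run(fixed)
path_pro, q_pro = run(procyclical)
```

Under the fixed margin the shock is absorbed and the position survives; under the procyclical rule the same shock triggers forced selling, which moves price, which tightens margin, which forces more selling—the cascade of the gradient regime.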
Domain of Validity for Game Theory
Game theory is powerful precisely when the strategic regime holds: horizons are long relative to environment change, authority is sufficient to make commitments credible, and coordination bandwidth exists to maintain common knowledge.
When those conditions fail, game theory does not become false—it becomes incomplete. Other lenses dominate: gradient dynamics (flows in constraint fields), evolutionary and stochastic selection, control saturation and constraint binding, regime shifts driven by topology (who can act) rather than preferences (what they want).
This explains why game theory often feels least predictive in crises. Crises are the moments when $T_{\text{sim}}$ is collapsing and $\rho$ is shrinking. The lens designed for strategic equilibrium loses resolution precisely when it matters most.
Game theory implicitly assumes sufficient simulation horizon, negligible cost of reasoning, and static constraints. This framework yields a precise boundary: game theory applies only above the simulation threshold. Below it, agents do not play games, equilibria are flow states, and control collapses into feasibility gradients. Under stress, markets revert to biology.
Implications
Four consequences follow from the constraint-horizon structure.
Bounded rationality is energetic, not psychological. Rationality is limited by feasible simulation, not intelligence. The planning horizon collapses when overhead rises or slack falls, not when agents become irrational. This reframes bounded rationality as a control limitation operating at thermodynamic boundaries.
Equilibrium stability depends on authority. Controllers determine whether systems damp or cascade. The same stress that collapses individual horizons can be offset by controllers with longer horizons and larger control sets. Central banks, clearinghouses, and platforms do not migrate when stress rises—they intervene. Whether equilibria persist or collapse depends on the distribution of control authority, not on aggregate preferences.
Intervention replaces migration at higher complexity. This is the defining feature of agency. As systems cross the simulation threshold, they stop following gradients and start reshaping them. The transition is sharp and structural, not gradual and behavioral.
Crisis is horizon collapse. Panic, herding, flight-to-safety—these are not failures of rationality but rational responses under collapsed $T_{\text{sim}}$. Forced liquidation, margin calls, default choices—all reflect the system crossing from the strategic regime into gradient descent. Game-theoretic explanations lose power precisely because the system has exited their domain of validity.
Closing Note
This piece is intended to sit alongside Bounded Reflexivity & Constraint Theory (the market substrate: constraints and feedback) and Complex Adaptable Systems, Complexity Ladders, & Agency (the thermodynamic feasibility window for agency) as a third foundation: when does optimization become strategy at all?
The useful formal object is not rationality but $\rho = T_{\text{sim}} / \tau_{\text{env}}$—horizon adequacy—paired with an authority measure. Where both are high, game theory explains a lot. Where either collapses, the system looks less like a game and more like a field.
Systems reveal their nature through how they respond to pressure. When they migrate, they are following gradients. When they intervene, they have crossed into agency. Bounded reflexivity governs how far into the future agents can see. Control authority governs how much of the environment they can change. Equilibrium is not a solution to an optimization problem—it is a temporarily stabilized trajectory of a constrained feedback system.
When horizons collapse, games dissolve. What remains is motion.
Footnotes

1. Rawlings, J. B., & Mayne, D. Q. (2009). Model Predictive Control: Theory and Design. Madison, WI: Nob Hill Publishing.
2. Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99–118.
3. Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50(3), 665–690.
4. Lin, C.-T. (1974). Structural controllability. IEEE Transactions on Automatic Control, 19(3), 201–208.
5. Landauer, R. (1961). Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5(3), 183–191.
6. Kalman, R. E. (1960). On the general theory of control systems. Proceedings of the First International Conference on Automatic Control.