Abstract
Existing AI safety research treats human control capacity as an implicit constant rather than a quantified dynamical variable. We introduce control entropy, defined as the conditional Shannon entropy of an AI system's state given the human operator's control actions ($S_c = H(X_{AI} \mid X_H)$), as an information-theoretic measure of residual human agency in human-AI coupled systems. We prove the Control Entropy Non-Decrease Theorem: in the absence of sustained alignment work, $dS_c/dt \ge 0$, establishing that control loss is an irreversible process governed by the information-theoretic second law. We derive a critical retreat threshold $S_c^{\mathrm{crit}} = W_{\max}\tau$ beyond which control recovery becomes infeasible within one AI improvement cycle, and the Landauer cost of control transitions as $E_{\min} = k_B T\,\Delta S_c$. Quantitative estimates indicate that current large language models (GPT-4 class, Kolmogorov complexity $K \approx 10^{12}$ bits) exceed $S_c^{\mathrm{crit}}$ by approximately four orders of magnitude. The framework establishes a formal duality with Fadli's ethical entropy ($S_e$), suggesting that AI misalignment and human control loss are complementary projections of a single irreversible information dynamics.
1 Introduction
The recursive self-improvement of artificial intelligence creates a fundamental asymmetry that existing safety research has not adequately quantified. Alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) [1], Constitutional AI [2], and AI Debate [3] address how to make AI systems pursue human-compatible goals, but they implicitly assume that the human side of the alignment relationship retains sufficient cognitive capacity to evaluate, correct, and steer AI behaviour indefinitely. This assumption is thermodynamically untenable: human conscious processing bandwidth is approximately 50 bits s$^{-1}$ [4, 5], while state-of-the-art AI systems process information at effective rates approaching $10^{15}$ bits s$^{-1}$ --- a ratio of order $10^{13}$ that constitutes a structural cognitive asymmetry.
This asymmetry has a precise information-theoretic consequence. Each delegation of authority --- from manual oversight to approval-based control, from approval to monitoring, from monitoring to full autonomy --- represents a transfer of control information from the human to the AI subsystem. The human operator's capacity to predict and influence the AI's behaviour diminishes, while the AI's autonomous decision space expands. This process is not a metaphor but a physical operation subject to Landauer's principle [6]: erasing one bit of information dissipates at least $k_B T \ln 2$ of energy. The cumulative control information in a modern AI system spans $10^{10}$ to $10^{12}$ bits; recovery of that information, once lost from the human's cognitive state, requires thermodynamic work that exceeds human cognitive supply within relevant timescales.
The human factors literature has documented the empirical manifestations of this process --- automation bias [7], deskilling [8], progressive disengagement [9] --- but has lacked a unifying mathematical framework. Meanwhile, Fadli's "Second Law of Intelligence" [10] introduces ethical entropy as an AI-side measure of goal deviation, establishing a thermodynamic formalism for misalignment. What is missing --- and what this paper provides --- is the human-side dual: a rigorous measure of control dissipation that obeys its own second law.
Contributions. (C1) We define control entropy as the conditional Shannon entropy $S_c = H(X_{AI} \mid X_H)$, providing the first information-theoretic measure of residual human agency. (C2) We prove the Control Entropy Non-Decrease Theorem ($dS_c/dt \ge 0$), derive a critical retreat threshold $S_c^{\mathrm{crit}}$, and establish the Landauer cost of control transitions. (C3) We demonstrate a formal duality between control entropy and Fadli's ethical entropy, unifying AI-side misalignment and human-side control loss as complementary aspects of a single irreversible dynamics.
2 Framework
2.1 Definitions
Consider a human operator $H$ interacting with an AI system $A$ at discrete decision points indexed by time $t$. Let $X_H(t)$ denote the human's control-relevant cognitive state (including their model of the AI, assessment of the situation, and chosen action) and $X_{AI}(t)$ the AI's autonomous state.
Definition 1 (Control Entropy). $S_c(t) = H(X_{AI}(t) \mid X_H(t))$ measures the human operator's residual uncertainty about the AI's behaviour. When $S_c$ is low, the human can predict and influence the AI; when $S_c$ is high, the AI's behaviour is effectively opaque to the human.

Boundary conditions:

- $S_c = 0$: the human perfectly determines the AI's state (full control, $I(X_H; X_{AI}) = H(X_{AI})$).
- $S_c = H(X_{AI})$: the human has zero predictive power over the AI (complete loss of control, $I(X_H; X_{AI}) = 0$).
Definition 2 (Autonomy Entropy). $S_a(t) = H(X_{AI}(t))$, the marginal entropy of the AI's autonomous state. $S_a = 0$ when the AI is deterministic; $S_a$ is maximal when the AI exercises its full decision repertoire uniformly.
Definition 3 (Interaction Entropy). $S_i(t) = I(X_H(t); X_{AI}(t))$, the mutual information between human and AI states. $S_i$ measures the shared information that enables the human to predict and steer the AI.
The three quantities satisfy the identity:

$$S_c(t) = S_a(t) - S_i(t). \tag{1}$$

This decomposition is exact and follows directly from the definition of conditional entropy: $H(X_{AI} \mid X_H) = H(X_{AI}) - I(X_H; X_{AI})$.
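To make the decomposition concrete, the following minimal Python sketch computes all three quantities from a small hypothetical joint distribution $p(X_H, X_{AI})$ (the matrix values are illustrative assumptions, not measured data) and verifies identity (1):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (0 log 0 := 0)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint distribution p(X_H, X_AI): rows = human states,
# columns = AI states. Values are illustrative only.
joint = np.array([[0.30, 0.05, 0.05],
                  [0.05, 0.25, 0.05],
                  [0.05, 0.05, 0.15]])

p_h  = joint.sum(axis=1)   # marginal p(X_H)
p_ai = joint.sum(axis=0)   # marginal p(X_AI)

S_a = entropy(p_ai)                                           # H(X_AI)
S_i = entropy(p_h) + entropy(p_ai) - entropy(joint.ravel())   # I(X_H; X_AI)
S_c = entropy(joint.ravel()) - entropy(p_h)                   # H(X_AI | X_H)

# Identity (1): S_c = S_a - S_i holds exactly.
assert abs(S_c - (S_a - S_i)) < 1e-12
print(f"S_a = {S_a:.4f}  S_i = {S_i:.4f}  S_c = {S_c:.4f} bits")
```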
2.2 The Control Entropy Non-Decrease Theorem
Theorem 1 (Non-Decrease). Consider a human-AI system in which:

(i) the AI undergoes recursive self-improvement, so that $S_a(t)$ is non-decreasing;

(ii) the human operator does not invest sustained alignment work ($W_a = 0$).

Then $dS_c/dt \ge 0$.
Proof. From identity (1):

$$\frac{dS_c}{dt} = \frac{dS_a}{dt} - \frac{dS_i}{dt}. \tag{2}$$

By assumption (i), $dS_a/dt \ge 0$: the AI's autonomous state space expands through recursive improvement.

We show $dS_i/dt \le 0$ under assumption (ii). Each AI processing step transforms the joint distribution $p(X_H, X_{AI})$ through a channel that maps $X_{AI}(t) \mapsto X_{AI}(t+1)$ without fresh input from the human. By the data-processing inequality [13], successive processing cannot increase the mutual information:

$$I(X_H(t);\, X_{AI}(t+1)) \le I(X_H(t);\, X_{AI}(t)).$$

Therefore $dS_i/dt \le 0$.

Substituting into (2): $dS_c/dt \ge 0$. $\blacksquare$
Physical content. In the absence of sustained cognitive effort, the human's capacity to predict and influence the AI monotonically erodes. This is not a failure of will or strategy but a consequence of the information-theoretic second law applied to asymmetric information processing.
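The proof's key step can be checked numerically. The sketch below (the initial joint distribution and the random channel are arbitrary assumptions) repeatedly applies a fixed stochastic channel to the AI coordinate while the human state is held fixed, and prints a non-increasing sequence of mutual informations, as the data-processing inequality requires:

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in bits from a joint probability matrix."""
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / np.outer(px, py)[nz])))

rng = np.random.default_rng(0)

# Hypothetical initial joint p(X_H, X_AI) with strong human-AI coupling.
joint = np.diag([0.4, 0.3, 0.3]) * 0.8 + 0.2 / 9

# A stochastic channel X_AI(t) -> X_AI(t+1): each processing step
# reshuffles the AI state with no fresh input from the human.
channel = rng.dirichlet(np.ones(3), size=3)   # rows: p(x' | x)

for step in range(5):
    print(f"step {step}: I(X_H; X_AI) = {mutual_information(joint):.4f} bits")
    joint = joint @ channel   # update the AI coordinate, X_H held fixed

# The printed sequence is non-increasing, so S_c = S_a - S_i cannot
# decrease without alignment work.
```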
Corollary 1.1. With non-zero alignment work $W_a > 0$, the evolution equation becomes:

$$\frac{dS_c}{dt} = \sigma_c - W_a, \tag{3}$$

where $\sigma_c \ge 0$ is the spontaneous entropy production rate (the rate of control loss in the absence of intervention). Control entropy can transiently decrease only when $W_a > \sigma_c$.
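A short numerical sketch of equation (3), with illustrative assumed rates, shows the three regimes: erosion under $W_a = 0$, transient recovery while $W_a > \sigma_c$, and resumed erosion once the work rate falls:

```python
# Euler integration of equation (3), dS_c/dt = sigma_c - W_a, with
# illustrative rates in bits per unit time; all values are assumptions.
sigma_c = 1.0                                       # spontaneous production rate
schedule = [0.0] * 50 + [1.5] * 30 + [0.5] * 40     # alignment work W_a(t)

S_c, dt, trace = 5.0, 0.1, []
for W_a in schedule:
    S_c = max(S_c + (sigma_c - W_a) * dt, 0.0)      # entropy stays non-negative
    trace.append(S_c)

# S_c rises while W_a = 0, transiently falls while W_a > sigma_c,
# and resumes rising once W_a drops below sigma_c again.
```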
2.3 Information Conservation
Proposition 1 (Conservation). In a closed human-AI system with no external information exchange, the joint entropy is conserved:

$$H(X_H, X_{AI}) = \text{const.} \tag{4}$$

Proof. Direct consequence of the chain rule and the closed-system assumption. $\blacksquare$

Equation (4) reveals a zero-sum structure: since $H(X_H, X_{AI}) = H(X_H) + S_c$ by the chain rule, as $S_c$ increases (the human loses predictive capacity), $H(X_H)$ --- the complexity of the human's own cognitive state --- must decrease. The human's cognitive engagement simplifies as control erodes, consistent with the empirically observed "deskilling" effect [8].
Remark. Real human-AI systems are open: they receive training data, interact with users, and exchange information with the environment. The closed-system conservation law is an idealization that provides a useful reference frame, analogous to the role of isolated systems in classical thermodynamics.
2.4 The Critical Retreat Threshold
Theorem 2 (Critical Retreat Point). Let $W_{\max}$ denote the maximum sustainable alignment work rate of the human operator, and $\sigma_c$ the spontaneous entropy production rate. If $\sigma_c > W_{\max}$, then for any finite time horizon $\tau$, the minimum achievable control entropy is:

$$S_c^{\min}(\tau) = S_c(0) + \int_0^\tau \left(\sigma_c - W_{\max}\right) dt. \tag{5}$$

The critical retreat threshold is:

$$S_c^{\mathrm{crit}} = W_{\max} \cdot \tau. \tag{6}$$

For all $S_c(0) > S_c^{\mathrm{crit}}$, even maximum alignment work cannot restore control below $S_c(0) - S_c^{\mathrm{crit}} > 0$ within the remaining time.
Proof. Under maximum alignment work $W_a = W_{\max}$:

$$\frac{dS_c}{dt} = \sigma_c - W_{\max}. \tag{7}$$

When $\sigma_c > W_{\max}$, the net rate remains positive. Integrating over $[0, \tau]$ yields (5). If $S_c(t)$ exceeds $W_{\max}\tau$ at any instant, the integrated alignment capacity over the remaining horizon, $\int_t^\tau W_{\max}\, dt' \le W_{\max}\tau$, is insufficient to recover the deficit even if $\sigma_c = 0$ throughout. $\blacksquare$
Operational estimate. The maximum alignment work rate is bounded by the human's conscious processing bandwidth, $W_{\max} \le C_H \approx 50$ bits s$^{-1}$ [4]. Taking $\tau \approx 6 \times 10^5$ s (the AI improvement cycle time, roughly one week), we obtain the operational threshold:

$$S_c^{\mathrm{crit}} = C_H \cdot \tau \approx 50 \times 6 \times 10^5 = 3 \times 10^7 \text{ bits}.$$

This is the maximum volume of control information a human can process during one AI improvement cycle. When the AI's Kolmogorov complexity $K$ exceeds $S_c^{\mathrm{crit}}$, the human cannot comprehend the system even in principle.
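As a check on the arithmetic, equation (6) with these parameters (both order-of-magnitude assumptions taken from the literature figures above):

```python
# Operational threshold of equation (6); both inputs are assumed
# order-of-magnitude figures, not measurements.
C_H = 50        # human conscious bandwidth, bits/s [4]
tau = 6.05e5    # one AI improvement cycle, ~1 week in seconds

S_c_crit = C_H * tau
print(f"S_c_crit ~ {S_c_crit:.1e} bits")   # ~3e7 bits
```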
2.5 The Landauer Cost of Control Transitions
Theorem 3 (Landauer Cost). The minimum thermodynamic energy required to transition the human-AI system from control level $S_c^{(1)}$ to control level $S_c^{(2)}$ is:

$$E_{\min} = k_B T \, \Delta S_c, \tag{8}$$

where $\Delta S_c = |S_c^{(2)} - S_c^{(1)}|$ in nats, $k_B = 1.38 \times 10^{-23}$ J K$^{-1}$, and $T \approx 310$ K (human body temperature).

Proof. Landauer's principle states that erasing one bit of information requires at least $k_B T \ln 2$ of energy [6]. For $\Delta S_c$ nats, the equivalent is $\Delta S_c / \ln 2$ bits. Therefore:

$$E_{\min} = \frac{\Delta S_c}{\ln 2} \cdot k_B T \ln 2 = k_B T \, \Delta S_c. \tag{9}$$

The $\ln 2$ factors cancel, yielding (8). $\blacksquare$
For a representative transition with $\Delta S_c = 1$ nat:

$$E_{\min} = 1.38 \times 10^{-23} \times 310 \approx 4.3 \times 10^{-21} \text{ J}.$$
Remark. This energy is vanishingly small in macroscopic terms. Its significance lies not in its magnitude but in its non-zero value: relinquishment is not an informationally costless mental choice but a physical operation with an irreducible thermodynamic price. Real cognitive systems operate $10^6$--$10^9$ times above the Landauer bound [11], making the practical cost correspondingly larger.
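A small helper makes equation (8) and the practical-overhead remark concrete; the overhead multiplier is an assumed illustrative factor, not a measured constant:

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
T   = 310.0          # human body temperature, K

def landauer_cost(delta_s_nats, overhead=1.0):
    """Minimum energy of a control transition per equation (8).

    overhead: multiplier for real cognitive systems operating far above
    the Landauer bound (an assumed, illustrative factor).
    """
    return K_B * T * delta_s_nats * overhead

print(landauer_cost(1.0))        # ~4.3e-21 J: the ideal bound for 1 nat
print(landauer_cost(1.0, 1e8))   # ~4.3e-13 J with an assumed 1e8 overhead
```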
3 Duality with Fadli's Ethical Entropy
Fadli [10] introduced ethical entropy $S_e$ as a measure of AI deviation from intended goals, governed by:

$$\frac{dS_e}{dt} = \sigma_e - W_e, \tag{10}$$

where $\sigma_e$ is the intrinsic rate of goal drift and $W_e$ represents alignment work. Equation (10) has exactly the same structure as our control entropy evolution (equation 3).
| Quantity | Fadli (AI side) | PRE-GHR (human side) |
|---|---|---|
| Entropy | $S_e$ | $S_c$ |
| Production rate | $\sigma_e$ | $\sigma_c$ |
| Alignment work | $W_e$ | $W_a$ |
| Irreversibility | Not explicitly defined | $dS_c/dt \ge 0$ (Theorem 1) |

Table 1. Correspondence between Fadli's AI-centric framework and the PRE-GHR human-centric framework.
The parallel is not coincidental. Where Fadli asks "Is the AI doing what we want?", we ask "Can we still influence what the AI does?" --- two questions whose answers are information-theoretically coupled. When an AI system drifts from human values ($S_e$ increases), the human's capacity to intervene simultaneously erodes ($S_c$ increases), as the modified system occupies a cognitive space the human can no longer comprehend.
Conjecture (Dual Conservation). We conjecture that $S_c$ and $S_e$ satisfy:

$$\frac{d}{dt}\left(S_c + S_e\right) \ge 0, \tag{11}$$

establishing them as complementary measures of a single irreversible dynamics. If proven, this would imply that alignment and control are not independent problems: an AI system that drifts from human values necessarily exists in a regime where human oversight has eroded, and vice versa. Formal proof via statistical mechanics or stochastic process theory remains an open problem.
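The conjecture can at least be illustrated. The toy simulation below couples equations (3) and (10) through an assumed linear cross-term $k$ (our addition for illustration, not part of either framework); under this assumption the sum $S_c + S_e$ is non-decreasing, as (11) requires:

```python
# Toy realization of the conjectured coupling: control loss feeds goal
# drift and vice versa. The linear coupling constant k is an assumption.
sigma_c, sigma_e = 0.5, 0.4    # spontaneous production rates (assumed)
W_a, W_e = 0.2, 0.2            # alignment work on each side (assumed)
k = 0.3                        # assumed cross-coupling strength

S_c, S_e, dt = 1.0, 1.0, 0.01
for _ in range(1000):
    dS_c = sigma_c - W_a + k * S_e   # human-side loss grows with AI drift
    dS_e = sigma_e - W_e + k * S_c   # AI-side drift grows with lost oversight
    S_c, S_e = S_c + dS_c * dt, S_e + dS_e * dt

# Both quantities grow together: the sum S_c + S_e is non-decreasing.
print(f"S_c = {S_c:.2f}, S_e = {S_e:.2f}")
```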
4 Quantitative Estimates
Human cognitive bandwidth. Conscious processing is estimated at $C_H \approx 50$ bits s$^{-1}$ [4, 5], a figure that is well established across multiple experimental paradigms.
AI information processing rate. A state-of-the-art large language model processes information at effective rates $C_{AI} \sim 10^{15}$ bits s$^{-1}$. The ratio $C_{AI}/C_H \sim 10^{13}$ defines a structural cognitive asymmetry.
Kolmogorov complexity. GPT-4 is estimated at $\sim 10^{12}$ parameters, corresponding to $K \approx 10^{12}$ bits when redundancy is accounted for [12].
Critical retreat threshold. With $C_H \approx 50$ bits s$^{-1}$ and $\tau = 1$ week ($\approx 6 \times 10^5$ s), a conservative estimate for contemporary rapid AI iteration:

$$S_c^{\mathrm{crit}} = C_H \cdot \tau \approx 3 \times 10^7 \text{ bits}.$$
The ratio:

$$\frac{K}{S_c^{\mathrm{crit}}} \approx \frac{10^{12}}{3 \times 10^7} \approx 3 \times 10^4.$$

Current frontier AI systems exceed the critical retreat threshold by approximately four orders of magnitude. The human operator cannot, in principle, comprehend the AI's decision space within one improvement cycle --- even with unlimited motivation and zero fatigue.
Comprehension time. The time for a human to process the full complexity of a GPT-4-class system:

$$t_{\mathrm{comp}} = \frac{K}{C_H} \approx \frac{10^{12}}{50} = 2 \times 10^{10} \text{ s} \approx 600 \text{ years}.$$

The comprehension target recedes faster than any biological learning process can approach.
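The estimates above reduce to a few lines of arithmetic; all inputs are the assumed literature figures quoted in this section:

```python
# Order-of-magnitude estimates of Section 4; every input is an assumed
# literature-derived figure, not a measurement.
C_H      = 50      # human conscious bandwidth, bits/s [4, 5]
C_AI     = 1e15    # assumed effective AI processing rate, bits/s
K        = 1e12    # assumed complexity of a GPT-4-class model, bits
S_c_crit = 3e7     # retreat threshold from equation (6), bits

print(f"asymmetry ratio    ~ {C_AI / C_H:.0e}")               # ~2e13
print(f"threshold excess   ~ {K / S_c_crit:.0e}")             # ~3e4
print(f"comprehension time ~ {K / C_H / 3.15e7:.0f} years")   # ~634
```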
5 Discussion
5.1 What the framework establishes
Control entropy provides a computable, physically grounded measure of human agency in human-AI systems. Its calculation requires the joint distribution over human and AI states --- a behavioural observable. Its thermodynamic interpretation rests on the well-established Landauer principle. The critical retreat threshold converts the abstract concern about "losing control" into a mathematically precise quantity with engineering units.
The framework generates three falsifiable predictions:
- $S_c$ is non-decreasing in the absence of sustained alignment effort. Testable by measuring human prediction accuracy over AI behaviour during progressive delegation (see the estimator sketch after this list).
- The critical threshold is crossed when $K > S_c^{\mathrm{crit}}$. Testable by measuring human comprehension rates for AI systems of known complexity.
- $S_i$ and $S_e$ are anti-correlated. Testable by simultaneously measuring human control capacity and AI goal deviation in coupled systems.
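For the first prediction, a plug-in estimator of $S_c$ from paired (human prediction, AI action) trials is straightforward to sketch; the trial data below are fabricated for illustration, and a real study would need bias-corrected estimators and far more samples:

```python
import numpy as np
from collections import Counter

def conditional_entropy_bits(pairs):
    """Plug-in estimate of H(X_AI | human prediction) from observed
    (human_prediction, ai_action) pairs. Crude: needs many more
    samples than states to be reliable."""
    n = len(pairs)
    joint = Counter(pairs)
    marg  = Counter(h for h, _ in pairs)
    H = 0.0
    for (h, a), c in joint.items():
        H -= (c / n) * np.log2(c / marg[h])
    return H

# Hypothetical trials: the human predicts the AI's next action, then the
# actual action is recorded. "Early" = tight coupling; "late" = after
# progressive delegation. Values are fabricated for illustration.
early = [("a", "a")] * 45 + [("b", "b")] * 45 + [("a", "b")] * 10
late  = [("a", "a")] * 30 + [("a", "b")] * 35 + [("b", "a")] * 35

print(f"early S_c ~ {conditional_entropy_bits(early):.3f} bits")  # ~0.38
print(f"late  S_c ~ {conditional_entropy_bits(late):.3f} bits")   # ~0.65
# Prediction 1 holds if the late estimate exceeds the early one.
```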
An experimental study using a dial-based human-AI interaction framework is currently underway to validate these predictions. Pre-registration and protocols will be reported in a companion paper.
5.2 What the framework does not establish
We do not claim that human displacement by AI is inevitable. The non-decrease theorem applies to systems without sustained alignment work. In practice, human operators can and do invest cognitive effort ($W_a > 0$) to slow or transiently reverse control loss. The theorem establishes that such effort incurs a continuous thermodynamic cost, not a one-time fix.
The quantitative estimates (Section 4) are order-of-magnitude calculations based on literature parameters. Experimental validation with human subjects is needed to calibrate the framework against measured cognitive performance.
The dual conservation conjecture (Section 3) is unproven. Its formal derivation requires either a statistical-mechanical foundation or empirical measurement of the joint $(S_c, S_e)$ trajectory.
5.3 Implications
For AI safety. Alignment must be reconceptualized as a continuous thermodynamic cost. When $\sigma_c > W_{\max}$, control loss proceeds irreversibly regardless of the AI's objective function. Alignment work must scale with the AI's complexity growth rate --- a condition that recursive self-improvement systematically violates.
For human-AI interaction design. The Landauer cost implies that abrupt authority transfers ("cliff switching") maximise the thermodynamic shock to the human operator. Gradual protocols that distribute the entropy cost across smaller transitions may be more cognitively sustainable.
For cognitive science. The framework offers a thermodynamic account of the "illusion of control" [4]. The mismatch between the subjective sense of control and the objective $S_c$ may constitute a form of cognitive disequilibrium with a physical basis.
6 Conclusion
We have introduced control entropy $S_c$ as an information-theoretic measure of human agency in human-AI systems. The Control Entropy Non-Decrease Theorem establishes that the erosion of human predictive capacity is a thermodynamically irreversible process in the absence of sustained alignment work. The critical retreat threshold $S_c^{\mathrm{crit}}$ provides a computable boundary beyond which control recovery becomes infeasible, and current AI systems already exceed this threshold by approximately four orders of magnitude. The Landauer cost assigns a minimum thermodynamic price to control transitions, grounding the phenomenology of "letting go" in physical law. The duality with Fadli's ethical entropy suggests that human control loss and AI goal deviation are complementary aspects of a single irreversible dynamics.
Control entropy is not a metaphor. It is a computable, measurable, and in principle falsifiable physical quantity. The framework's purpose is not to prophesy human obsolescence but to establish the physical constraints within which all governance of human-AI systems must operate.
References
[1] Christiano, P. F. et al. (2017). Deep reinforcement learning from human preferences. NeurIPS, 30.
[2] Bai, Y. et al. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv:2212.08073.
[3] Irving, G., Christiano, P. & Amodei, D. (2018). AI Safety via Debate. arXiv:1805.00899.
[4] Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
[5] Norretranders, T. (1998). The User Illusion. Penguin Books.
[6] Landauer, R. (1961). Irreversibility and heat generation in the computing process. IBM J. Res. Dev., 5(3), 183--191.
[7] Parasuraman, R. & Manzey, D. (2010). Complacency and bias in human use of automation. Human Factors, 52(3), 381--410.
[8] Bainbridge, L. (1983). Ironies of automation. Automatica, 19(6), 775--779.
[9] Sheridan, T. B. (2002). Humans and Automation. Wiley.
[10] Fadli, T. (2025). The Second Law of Intelligence. arXiv:2511.10704.
[11] Frank, M. (2005). The physical limits of computing. Comput. Sci. Eng., 7(3), 16--25.
[12] Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Probl. Inf. Transm., 1(1), 1--7.
[13] Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J., 27, 379--423.
[14] Bennett, C. H. (1982). The thermodynamics of computation --- a review. Int. J. Theor. Phys., 21(12), 905--940.
[15] Bostrom, N. (2014). Superintelligence. Oxford University Press.
[16] Ouyang, L. et al. (2022). Training language models to follow instructions with human feedback. NeurIPS, 35, 27730--27744.
[17] Acemoglu, D. & Restrepo, P. (2018). The race between man and machine. Am. Econ. Rev. , 108(6), 1488--1542.
[18] Brynjolfsson, E. & McAfee, A. (2014). The Second Machine Age. W. W. Norton.