Stakeholders, Tokens, Time
Research note. Drafted 2026-05-13 as §4 of the working LLM-oriented PL design installment; published as a standalone note 2026-05-17. Working draft.
TL;DR
"One language is more efficient than another" collapses if we don't specify for whom. This note unfolds the question along two complementary views. The structural view is an inventory of nine stakeholders, each with a primary efficiency metric: human author, human reviewer, human maintainer, compiler, static analyser, LLM as consumer, LLM as generator, systems treating code as data, and security auditor. The economic projection reduces those nine onto two scarce resources: Tokens, the cost of the LLM context budget, priced per million; and Time, the universal cost for every non-LLM stakeholder, priced in dollars per hour. Quality, meaning correctness, security, and robustness, is a constraint, not a resource to minimise. The Pareto frontier between Tokens and Time is what existing artifacts implicitly occupy and what new artifacts have to find unoccupied space on. The first move of methodology is to name the stakeholder one is optimising for; opaque optimisation against an unnamed stakeholder is a recurring failure mode in the LLM-oriented PL literature.
The structural view: nine stakeholders
A programming language is consumed by many agents, each with its own notion of efficiency. The framing "one language is more efficient than another" collapses if we don't specify for whom. The nine rows below are the ones design conversations need to be able to name explicitly:
- The human author, whose efficiency is time to express an idea: keystrokes, cognitive load, distance from intention to working code. Secondary metrics: IDE-completion fit, muscle memory, cost to refactor.
- The human reviewer or auditor, whose efficiency is time to verify correctness locally, the ability to read a span of code and judge it correct without consulting global context. Secondary metrics: visual cues, semantic density per line, detectability of code smells.
- The human maintainer, whose efficiency is time to debug and change without breaking other things. Secondary metrics: quality of error messages, clarity of stack traces, blast radius of a change.
- The compiler or interpreter, whose efficiency is parse complexity and cost of reasoning over the AST. Secondary metrics: feasibility of static analysis, optimizability.
- The static analyser, IDE, or language server, whose efficiency is feasibility of inference: can it infer types, resolve references, and offer correct completions without unbounded computation? Secondary metrics: latency of incremental updates.
- The LLM as consumer, whose efficiency is tokens per semantic unit and ambiguity load. Secondary metrics: alignment with the BPE vocabulary, ease of in-context learning.
- The LLM as generator, whose efficiency is one-shot accuracy and correction cost. Secondary metrics: rate of syntax errors, rate of grounding, retry budget.
- Systems that treat code as data, including indexers, retrievers based on embeddings, and automated tools to modify code, whose efficiency is indexability and retrieval signal. Secondary metrics: quality of embedding vectors, matchability of AST patterns.
- The security auditor, human or automated, whose efficiency is reasoning about effects and the scope of capability. Secondary metrics: clarity of the sandbox model, locality of declared side effects.
Most existing programming languages optimise for one to three of these stakeholders and accept losses on the rest. Python optimises for the human author, the human maintainer, and the ecosystem; it loses on LLM tokens and on density at a glance. SimPy preserves Python's AST and the rest of Python's strengths while reclaiming the loss on LLM tokens; the cost is borne by the human reviewer, who now reads code without the formatting cues that PEP 8 provides. Pel optimises for the LLM as generator and the LLM as consumer; it gives up the human-ecosystem advantages that a 30-year-old language accumulates. Inflexión optimises for morphological density as a research artifact; it gives up universal accessibility entirely, since the cost to a human reviewer who is not a Spanish speaker is unbounded.
The first move of methodology is to name the stakeholder one is optimising for. Opaque optimisation against an unnamed stakeholder is a recurring failure mode in the LLM-oriented PL literature.
The economic projection: two scarce resources
The nine rows of the structural view compress, for economic purposes, onto two scarce resources.
Tokens are the scarcity of the LLM context budget: input tokens plus output tokens, priced concretely per million tokens by every commercial LLM provider, counted trivially via the relevant tokenizer library. Every LLM stakeholder's efficiency reduces to Token consumption per task.
Time is the universal scarcity for every non-LLM stakeholder: time for the author to write, for the reviewer to read, for the auditor to verify, for the maintainer to debug, for the learner to onboard, for the machine to parse, for the machine to compile, for the machine to execute. Time is concretely measurable in seconds and priceable in dollars per hour at the relevant rate of labour or compute.
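The two pricings above can be sketched as a pair of small functions. This is a minimal illustration, not a measurement apparatus: the `Prices` record, the function names, and every number below are hypothetical placeholders, and real token counts would come from whichever tokenizer library matches the model in question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical price sheet; all values are illustrative assumptions."""
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float
    usd_per_hour: float  # labour or compute rate for the non-LLM stakeholder


def token_cost_usd(input_tokens: int, output_tokens: int, prices: Prices) -> float:
    """Price an LLM stakeholder's consumption: input plus output tokens, per million."""
    total = (input_tokens * prices.usd_per_million_input_tokens
             + output_tokens * prices.usd_per_million_output_tokens)
    return total / 1_000_000


def time_cost_usd(seconds: float, prices: Prices) -> float:
    """Price a non-LLM stakeholder's consumption: seconds at an hourly rate."""
    return seconds / 3600 * prices.usd_per_hour


# Illustrative numbers only; none of these come from a real provider or study.
prices = Prices(usd_per_million_input_tokens=3.0,
                usd_per_million_output_tokens=15.0,
                usd_per_hour=90.0)
tokens_usd = token_cost_usd(input_tokens=200_000, output_tokens=50_000, prices=prices)
time_usd = time_cost_usd(seconds=1_800, prices=prices)
print(f"Tokens: ${tokens_usd:.2f}  Time: ${time_usd:.2f}")
```

Keeping the two costs as separate numbers, rather than summing them immediately, preserves the information about which resource a given language is spending.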
The trade-off between Tokens and Time is the central economic optimisation: a language with rich notation may save Tokens at the cost of human Time through learning curve and audit difficulty; a language with verbose notation friendly to humans may save Time at the cost of Tokens through longer prompts and larger generation budgets. The Pareto frontier between Tokens and Time is what existing artifacts implicitly occupy, and what new artifacts have to find unoccupied space on.
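The notion of a Pareto frontier used here can be made concrete with a short sketch. It assumes each candidate is summarised by two numbers, tokens per task and seconds per task, with lower being better on both; the `Candidate` type, the candidate names, and the figures are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    tokens: float   # LLM tokens consumed per task (lower is better)
    seconds: float  # non-LLM time consumed per task (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if a is no worse on both resources and strictly better on one."""
    no_worse = a.tokens <= b.tokens and a.seconds <= b.seconds
    strictly_better = a.tokens < b.tokens or a.seconds < b.seconds
    return no_worse and strictly_better


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates; the numbers are placeholders, not measurements.
candidates = [
    Candidate("terse", tokens=600, seconds=900),
    Candidate("balanced", tokens=800, seconds=500),
    Candidate("verbose", tokens=1000, seconds=300),
    Candidate("dominated", tokens=1100, seconds=600),
]
frontier = pareto_frontier(candidates)
print([c.name for c in frontier])
```

In this toy data the fourth candidate is worse than "balanced" on both resources, so it occupies no position on the frontier; a new artifact would need to land somewhere that none of the remaining three dominates.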
The reduction is lossy on purpose. Collapsing nine stakeholders onto two resources is what makes the framing concretely measurable and concretely priceable. The price is that the per-stakeholder structural view has to be kept available alongside the economic projection, because design conversations depend on naming the stakeholder one is optimising for. The two views are complementary, not interchangeable.
Quality is a constraint, not a resource
A third axis sits beside Tokens and Time but is conceptually different: Quality. Correctness, security, robustness, and predictability are not resources to minimise; they are constraints the optimisation respects.
The economic question is therefore: minimise the joint cost of Tokens and Time, subject to Quality at or above a chosen threshold. A language that wins on Tokens by stripping type information may lose Quality dramatically; the trade-off is real and has to be priced. Let Me Speak Freely? (Tam et al., 2024) is the empirical reminder that the Quality constraint is binding: aggressive format restrictions can degrade LLM reasoning quality below the threshold worth accepting.
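One way to write the economic question down, under notation that is assumed here rather than taken from the note: let $\ell$ range over a set $\mathcal{L}$ of candidate languages, let $N(\ell)$ be Tokens per task and $T(\ell)$ be Time per task, let $p_{\mathrm{tok}}$ and $p_{\mathrm{time}}$ be their respective prices, and let $Q(\ell)$ be a scalar Quality score with threshold $q_{\min}$.

```latex
\min_{\ell \in \mathcal{L}} \;
  p_{\mathrm{tok}} \, N(\ell) + p_{\mathrm{time}} \, T(\ell)
\quad \text{subject to} \quad
  Q(\ell) \ge q_{\min}
```

Treating Quality as a single scalar is a simplification for the sake of the sketch; correctness, security, and robustness may each need their own threshold in practice.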
A Pareto-frontier diagram of candidate languages plotted on Tokens × Time, with the Quality threshold marked as a constraint band, is the right visual artifact for the Stage 5 synthesis of the methodology paper. A candidate inside the constraint band can be honestly compared on the two resources; a candidate outside the band is disqualified on Quality regardless of how well it does on the resources.
Why this framing matters
The pre-methodology field of LLM-oriented PL design has been making efficiency claims without naming the stakeholder and without decomposing the resource. SimPy's 13.5% win on LLM tokens is real; the missing framing is the question "which other stakeholder's resource is that win pulled from?" The answer is, partially, the Time of the human reviewer. Pel's win at generation from its uniform grammar is real; the missing framing is "which stakeholder's Time is it costing?" The answer is that of any human author or maintainer who would otherwise have benefited from a 30-year ecosystem. Quasar's 42% win on time at ViperGPT is real; the missing framing is "what is the Quality constraint band?" The answer is the uncertainty bound from conformal prediction, which gives the Quality framing back explicitly: Quasar is the artifact in the field that handles this most honestly.
The framing this note proposes is therefore:
- Name the stakeholder.
- Reduce the stakeholder's efficiency to a cost in Tokens and a cost in Time.
- State the Quality threshold the candidate has to clear.
- Plot the candidate on the resulting Pareto frontier against the existing field.
A candidate language that publishes a position on the frontier in this form is reviewable; a candidate that publishes a context-specific win without the framing should be read as local to its workload by default. The empirical-cascade note develops the apparatus that produces frontier positions in this form.
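The four-step framing could be captured as a small record that a candidate publishes alongside its claim. This is a sketch under stated assumptions: the `FrontierClaim` type, its field names, the scalar Quality score, and the example values are all hypothetical, and the stakeholder labels simply restate the nine rows of the structural view.

```python
from dataclasses import dataclass

# The nine stakeholders from the structural view, as short labels.
STAKEHOLDERS = frozenset({
    "human author", "human reviewer", "human maintainer",
    "compiler", "static analyser", "LLM as consumer",
    "LLM as generator", "code-as-data systems", "security auditor",
})


@dataclass(frozen=True)
class FrontierClaim:
    """Hypothetical record of an efficiency claim in the proposed form."""
    candidate: str
    stakeholder: str          # step 1: name the stakeholder
    tokens_per_task: float    # step 2: cost in Tokens
    seconds_per_task: float   # step 2: cost in Time
    quality: float            # measured Quality score (assumed scalar)
    quality_threshold: float  # step 3: threshold the candidate has to clear

    def __post_init__(self) -> None:
        # Refuse claims that optimise against an unnamed stakeholder.
        if self.stakeholder not in STAKEHOLDERS:
            raise ValueError(f"unnamed or unknown stakeholder: {self.stakeholder!r}")

    def clears_quality(self) -> bool:
        """Only claims inside the constraint band are comparable on resources."""
        return self.quality >= self.quality_threshold


# Placeholder values; not a measurement of any real language.
claim = FrontierClaim(
    candidate="example-lang",
    stakeholder="LLM as generator",
    tokens_per_task=800,
    seconds_per_task=500,
    quality=0.93,
    quality_threshold=0.90,
)
print(claim.clears_quality())
```

Step 4, plotting against the existing field, then operates only on claims for which `clears_quality()` holds.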