The Two Questions
Research note. Drafted 2026-05-13 as §1 of the working installment on LLM-oriented PL design; published as a standalone note 2026-05-17. Working draft. The framing both questions sit on continues to develop as the empirical cascade reports back.
TL;DR
Two questions structure this research thread. The first is the design question: what does the most efficient programming language look like, given that it needs to serve humans, machines, and LLMs? The answer is empirical, not theoretical, and reduces to a question about which corner of a Pareto frontier across multiple stakeholders is worth occupying, and how one would know. The second is the gating question: does building a new programming language actually add value over the existing field of at least 51 artifacts, or would the exercise just be adding noise? The null result is acceptable and may be the right answer. The methodology developed across these research notes is the apparatus by which both questions become reviewable. The answers belong to the empirical cascade, and the cascade has just begun.
The framing
The first installment of this series is the Babel methodology paper. It argued that the field of esoteric programming languages had produced a vast corpus and a vague methodology: roughly 1,500 catalogued languages and ~800 Brainfuck derivatives, with each author rebuilding the same scaffolding because no shared parameter schema existed. Babel proposed that schema. The second installment is Inflexión. It develops one specific instantiation, a hand-built esoteric programming language whose semantics flow from Rioplatense Argentine Spanish grammar, and joins a small lineage of inflection-driven non-English natural-language esolangs: Perligata, Espro, Tampio, Wenyan, and Inflexión. Both installments are written for humans. The languages they discuss are evaluated by what they do for human readers, writers, and the small community of practitioners who care about esoteric programming languages as a research and cultural genre.
This research thread is about a parallel field that has emerged on a different schedule and for a different audience. Between Sun, Du, Yang, Li, and Lo's ISSTA 2024 paper AI Coders Are Among Us and the present moment, at least 51 distinct artifacts have been built whose target audience is large language models. The mid-2026 field survey catalogues them. SimPy strips Python's formatting tokens to cut the cost of LLM input by 13.5% on CodeLlama and 10.4% on GPT-4. Pel designs a new programming language from scratch with a uniform grammar "to facilitate easier learning (in-context) and reliable generation by LLMs". Quasar reports 42% time and 52% security improvements on ViperGPT with a language that has a lambda-calculus core, optimised for the code actions of LLM agents. LLMON proposes a markup native to LLMs at the structured-data interface. The LMPL workshop at SPLASH, Language Models and Programming Languages, held its first edition in Singapore in 2025 and is accepting submissions for its second edition in Oakland for October 2026. The field has a name and a venue.
The pace and the scale together raise two questions, both of which this thread takes seriously.
The first question: the design question
What does the most efficient programming language look like, considering it needs to be understood, written, vetted, or at least auditable by humans; easy to understand and implement by machines; and easy for LLMs to read and generate?
The question's structure matters. "Most efficient" does not name an optimisation on a single axis. SimPy is most efficient on the token economy of LLM consumption of Python code. Inflexión is most efficient on the morphological density of semantics derived from grammar. APL is most efficient on expression density per glyph. Python is most efficient on human author velocity within a tooling ecosystem. Each leads on its own axis, and none dominates the others on all axes.
The design question reduces, honestly stated, to a question about which corner of a Pareto frontier across multiple stakeholders is worth occupying, and how one would know. The field survey enumerates the artifacts that occupy known corners. The note on design axes describes the ten dimensions along which a new artifact can pick a position. A future note on Tokens and Time will develop the reduction of those design axes onto a measurable frame of two resources, which is what makes the framing of the Pareto frontier concretely usable.
The second question: the gating question
Does building a new programming language actually add value over the existing positions on the frontier, such as Python plus SimPy, Pel, Quasar, LLMON, and the rest, or would the exercise reduce to adding noise: dressing up already accepted techniques as something new?
This is the discipline of honest preparation applied to the project itself. Installment 05's lineage correction reframed Inflexión from "first ever" to "fifth in a small lineage" because a literature check before publication surfaced four prior authors. The same discipline applies here, one level up. The decision to exist as a new language is something the empirical work must justify, not something granted by default.
The null result is real and acceptable. If the empirical comparison shows existing artifacts already cover the Pareto frontier on the chosen profile of stakeholders, the right outcome is to ship the methodology as the contribution and stop. No new language. The temptation to add another entry to a field of 51 artifacts because the surrounding research is interesting is exactly the trap to avoid. We are not aware of a comparable methodology paper in the 2024–2026 field of LLM-oriented PL design that has applied this discipline explicitly. Surfacing the discipline is, in itself, part of the methodological contribution.
What follows from the two questions
The methodology developed across these research notes is the framework that makes both questions reviewable.
For the design question, the framework reduces the design space onto two measurable resources, Tokens and Time, under a Quality threshold constraint. The design space spans nine axes plus a tenth surfaced by the audit. A future note will develop that reduction. A candidate language can then publish a position on the resulting Pareto frontier rather than an unscoped efficiency claim.
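The reduction described above can be illustrated with a minimal sketch. It assumes each candidate is summarised by three numbers (mean tokens per task, mean time per task, and a quality score), drops candidates below the Quality threshold, and keeps those that no other eligible candidate beats on both resources. The `Candidate` type, the field names, the threshold, and every number below are hypothetical placeholders, not measurements from the survey or the future note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    tokens: float   # mean tokens per task, lower is better (hypothetical metric)
    time_s: float   # mean seconds per task, lower is better (hypothetical metric)
    quality: float  # task success rate in [0, 1] (hypothetical metric)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both resources and strictly better on one."""
    no_worse = a.tokens <= b.tokens and a.time_s <= b.time_s
    strictly_better = a.tokens < b.tokens or a.time_s < b.time_s
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate], quality_threshold: float) -> list[Candidate]:
    """Filter by the quality constraint, then keep the non-dominated candidates."""
    eligible = [c for c in candidates if c.quality >= quality_threshold]
    return [c for c in eligible if not any(dominates(o, c) for o in eligible)]


# Placeholder values for illustration only; none are real results.
candidates = [
    Candidate("lang_a", tokens=100, time_s=9.0, quality=0.92),
    Candidate("lang_b", tokens=140, time_s=6.0, quality=0.90),
    Candidate("lang_c", tokens=150, time_s=9.5, quality=0.95),  # beaten by lang_a on both resources
    Candidate("lang_d", tokens=80, time_s=5.0, quality=0.60),   # cheapest, but below the threshold
]

front = pareto_front(candidates, quality_threshold=0.85)
print([c.name for c in front])
```

In this toy setting lang_a and lang_b both sit on the frontier because each is cheaper on one resource, while lang_d never appears despite being cheapest on both, since the Quality threshold is applied before any comparison of Tokens and Time.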
For the gating question, an empirical cascade ordered by measurement cost, cheapest first, lets a researcher settle each candidate at the lowest cost sufficient to do so. Tokenizer measurements that are free to run settle some candidates before any model is called. Cheap instrumentation of the toolchain settles others before any apparatus for human studies is built. Expensive human studies are reserved for candidates that survive the earlier stages. A future note will sketch the cascade in detail.
Both pieces of apparatus are designed so that a null result is reachable cheaply, not just after a year of building. The discipline is not just to ask the gating question but to have machinery that can return no on the cheapest evidence sufficient to do so.
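The cheapest-first ordering can be sketched as a short early-exit loop. This is only an illustration of the idea, not the cascade the future note will specify: the `Stage` type, the stage names, the relative costs, the pass criteria, and the candidate measurements are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    cost: float                       # relative measurement cost (hypothetical units)
    survives: Callable[[dict], bool]  # True if the candidate passes this stage


def run_cascade(candidate: dict, stages: list[Stage]) -> tuple[Optional[str], float]:
    """Run stages in ascending cost order and stop at the first failure.

    Returns (name of the failing stage, or None if all pass; total cost spent).
    """
    spent = 0.0
    for stage in sorted(stages, key=lambda s: s.cost):
        spent += stage.cost
        if not stage.survives(candidate):
            return stage.name, spent
    return None, spent


# Stages listed out of order on purpose; run_cascade sorts them by cost.
stages = [
    Stage("human_study", 100.0, lambda c: c["human_audit_ok"]),
    Stage("tokenizer", 0.0, lambda c: c["token_saving"] >= 0.05),  # assumed cut-off
    Stage("toolchain", 1.0, lambda c: c["toolchain_ok"]),
]

# Placeholder candidates; the values are invented for illustration.
weak = {"token_saving": 0.02, "toolchain_ok": True, "human_audit_ok": True}
strong = {"token_saving": 0.12, "toolchain_ok": True, "human_audit_ok": True}

print(run_cascade(weak, stages))
print(run_cascade(strong, stages))
```

The weak candidate is rejected at the free tokenizer stage with nothing spent, which is the cheap null result in miniature. Only the candidate that survives every earlier stage incurs the cost of the human study.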
Why both questions matter to this thread
The two questions are not equally answerable on the present evidence. The design question is a question about position on a Pareto frontier across multiple stakeholders, and the apparatus to answer it is now in place. The gating question has no answer yet, by design. The methodology is the apparatus by which both questions become reviewable. The answers belong to the empirical cascade, and the cascade has just begun.
If the cascade returns the null result, this research thread terminates as a methodology contribution. If it surfaces a real gap, a future installment becomes the implementation paper for whatever artifact occupies that gap. Either outcome is fine. The framing has done its work either way.