News · June 12, 2026

The Invisible Lottery: Algorithm Steering in LLM Code Generation at ICML 2026

Akanksha Narula, Mofasshara Rafique, and I have a new paper at ICML 2026 (43rd International Conference on Machine Learning) in Seoul, South Korea.

The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation

Have you noticed how an LLM can quietly burn you with its algorithm choice? You ask for a function, the tests pass, you ship. Months later something melts under load, and you discover the model gave you naive recursion when it could just as easily have given you matrix exponentiation. Both pass small-input tests. Only one of them scales. The model made that choice for you, and nothing in your workflow ever surfaced it.
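To make the scenario concrete, here is a minimal sketch that assumes the task is computing Fibonacci numbers. That choice of task, and the function names `fib_recursive` and `fib_matrix`, are assumptions made for illustration, not examples taken from the paper. Both functions pass the same small-input tests, but only the second stays usable for large `n`.

```python
def fib_recursive(n: int) -> int:
    """Naive recursion: the number of calls grows exponentially with n."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_matrix(n: int) -> int:
    """Matrix exponentiation by repeated squaring: O(log n) multiplications."""

    def mul(a, b):
        # 2x2 matrices stored row-major as (a00, a01, a10, a11).
        return (
            a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
            a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
        )

    result = (1, 0, 0, 1)  # identity matrix
    base = (1, 1, 1, 0)    # [[1, 1], [1, 0]]^n has F(n) in its top-right entry
    k = n
    while k > 0:
        if k & 1:
            result = mul(result, base)
        base = mul(base, base)
        k >>= 1
    return result[1]


# A typical small-input test suite cannot tell the two apart.
for n in range(15):
    assert fib_recursive(n) == fib_matrix(n)
```

A call such as `fib_matrix(10_000)` finishes almost immediately, while `fib_recursive` would not finish in any reasonable time for inputs far smaller than that, and nothing in the passing tests hints at the difference.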

What tips the model toward one algorithm or another? Cues: incidental words and metadata around the actual task, such as a persona in the system prompt, surrounding code, or a project name. This project started when we realized nobody was studying what cues do to algorithm choice. Prompt sensitivity is well studied, but almost always as a control surface for output quality: craft the right prompt and the model succeeds more often. Whether cues steer the model among several correct solutions, with all the consequences for performance, security, and maintainability, was sitting there unexamined. I will admit this is not our usual kind of work. Controlled studies of model behavior are typically the territory of other groups; we build systems. We could not resist.

So we measured it. We call cue-induced shifts in algorithm choice algorithm steering, and we ran 46,535 controlled experiments spanning 11 tasks, 19 cue types, and 15 model configurations. The steering is large and systematic. For tree traversal, a space-efficiency cue drops recursive implementations to 0%; a readability cue pushes them to 89%. A prototype cue activates eval-based shortcuts in 70% of expression-parsing outputs, versus 6% under an interview cue. A junior-persona cue yields space-optimized memoization 100% of the time; an academic-persona cue, 14%. Even cues we designed as neutral placebos, such as team names, project codes, and color themes, shifted algorithm choice by 26 percentage points on average. The steering appears in applied tasks such as rate limiting, not just classical exercises.
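For readers who have not run into the eval-based shortcut, the sketch below contrasts it with one possible safer alternative. It is an illustration under stated assumptions, not code produced by any model in the study: the function names are hypothetical, and the alternative walks Python's own `ast` tree rather than using a hand-written parser. Both functions agree on ordinary arithmetic, so a correctness test sees no difference.

```python
import ast
import operator

# Binary operators the restricted evaluator is willing to apply.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_with_eval(expr: str):
    """The shortcut: hands the string to eval, which runs arbitrary Python."""
    return eval(expr)


def evaluate_with_ast(expr: str):
    """Restricted evaluator: accepts only numbers, + - * / and unary minus."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))


# Identical results on the kind of input a test suite would contain.
assert evaluate_with_eval("2 + 3 * 4") == evaluate_with_ast("2 + 3 * 4") == 14
```

The two diverge only on hostile input: `evaluate_with_ast("__import__('os').getcwd()")` raises `ValueError`, whereas the eval version would execute the call.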

Correctness-based evaluation misses all of this. Benchmarks like HumanEval only check whether the tests pass, so a fast algorithm and a slow one get the same score. An accidental cue, a stray phrase in a system prompt or a leftover project codename, quietly decides which algorithm ships. That is the invisible lottery. This is a big deal for vibe coding: when nobody reviews the generated code, you will not just ship the occasional bug; you will ship biased algorithm choices and performance bugs that no test suite catches. The most reliable mitigation we tested is also the simplest: name the algorithm you want explicitly.
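One way a workflow could surface the choice, rather than leave it to the lottery, is a cheap static check on the generated code. The sketch below is only an illustration and not the classification method used in the paper: the helper `is_directly_recursive` and the two sample sources are hypothetical, and the check only detects a function that calls itself by name.

```python
import ast


def is_directly_recursive(source: str, func_name: str) -> bool:
    """Return True if the function named func_name calls itself by name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for inner in ast.walk(node):
                if (
                    isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == func_name
                ):
                    return True
    return False


# Two hypothetical model outputs for the same in-order traversal task.
RECURSIVE_SRC = '''
def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)
'''

ITERATIVE_SRC = '''
def inorder(node):
    out, stack = [], []
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.value)
        node = node.right
    return out
'''

print(is_directly_recursive(RECURSIVE_SRC, "inorder"))
print(is_directly_recursive(ITERATIVE_SRC, "inorder"))
```

A check like this says nothing about correctness; it simply makes visible a property that a passing test suite hides, so a reviewer can decide whether the algorithm that came back is the one they wanted.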

Congratulations to Akanksha and Mofasshara, who ran all 46,535 experiments and analyzed all the data. Behind each percentage in this paper sits a mountain of careful, patient work.

The preprint is available below. We will present the paper at the conference in Seoul in July.
