Planning Workouts is a Hiring Problem

Stop picking exercises · Nobody told it to put squats first · Your swap button is lying to you

Most workout apps treat program building as template lookup. Pick “Push/Pull/Legs”, get the same exercises as everyone else. Maybe they shuffle the order or swap bench press for incline bench. But the fundamental model is the same: a human or an algorithm drafted a fixed list, and you get a copy.

The real problem is harder. A user has dumbbells, a pull-up bar, and a bad shoulder that doesn’t like overhead pressing. They’ve been training for three months. They starred Romanian deadlifts because those are their favorite, and they keep skipping lunges. Build them a workout plan that respects all of that - automatically, without a human coach reviewing it.


Slots - The Job Postings

Every position in a workout template is a slot, and every slot has a spec. Not just “make sure to hit chest” but a structured, extensible contract.

Hard requirements - constraints. These are binary. If an exercise can’t satisfy all of them, it’s out. Movement pattern (push, pull, hinge). Muscle group (quads, lats, glutes). Compound versus isolation. If the slot calls for a compound push movement and an exercise is an isolation curl, nothing else about it matters. It’s not considered.

Soft requirements - preferences. These influence ranking, not eligibility. Barbell or dumbbell? Staple exercise or niche? Central nervous system (CNS) demand. An exercise that doesn’t match a preference still gets considered - it just scores lower.

In hiring terms: “Must have 5 years of experience building event-driven systems” is a constraint. “Nice to have: hands-on experience with distributed consensus” is a preference. You wouldn’t reject a candidate because they don’t know how Raft works. But no amount of preference points saves a candidate who doesn’t meet the hard requirements.

Because constraints and preferences are separate types, the engine can’t accidentally treat a preference as a dealbreaker. A slot that prefers barbell exercises still gets filled when the user only has dumbbells.
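That type separation can be made concrete. A minimal sketch, assuming dataclass-shaped specs - the names (`Constraints`, `Preferences`, `SlotSpec`) and fields are illustrative, not the engine’s actual types:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraints:
    """Hard requirements: all must hold, or the exercise is out."""
    movement: str        # e.g. "push", "pull", "hinge"
    muscle_group: str    # e.g. "chest", "lats", "glutes"
    compound: bool       # compound versus isolation
    max_difficulty: int

@dataclass(frozen=True)
class Preferences:
    """Soft requirements: influence ranking, never eligibility."""
    load_type: str = ""      # e.g. "barbell"; empty means no preference
    staple: bool = False     # prefer well-known staples over niche picks

@dataclass(frozen=True)
class SlotSpec:
    constraints: Constraints
    preferences: Preferences

slot = SlotSpec(
    Constraints(movement="push", muscle_group="chest",
                compound=True, max_difficulty=3),
    Preferences(load_type="barbell", staple=True),
)
```

Because a ranking function only ever receives `Preferences`, it structurally cannot reject a candidate - the type system enforces the constraint/preference split.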


Two Rounds of Background Checks - Filtering

An exercise doesn’t just need to match the slot. It needs to work for this specific user, right now. That’s two separate questions, answered by two separate filter passes.

Pass one - slot eligibility. Does the exercise match the slot’s structural spec? Right movement tags? Right muscle group? Compound or isolation, as the slot demands? Within the maximum difficulty? This is resume screening against the job description: the pass knows nothing about the user, so the same slot spec produces the same eligible exercises regardless of who’s doing the workout.

Pass two - selection eligibility. Can this particular user do this exercise in this particular session? Does the user have the required equipment? Is the exercise excluded due to an injury or joint restriction? Has it already been used in another slot this session? Is it within the user’s skill level? Is there enough CNS budget remaining?

The constraint types (equipment ownership, exercise exclusions, joint restrictions, session deduplication, CNS budget, skill level, …) are derived from user profiles and session state.

Back to our user with the bad shoulder: Overhead press passes slot eligibility for a “compound push” slot, but the shoulder joint exclusion catches it in pass two. Dumbbell bench press passes both. It’s the second pass that personalizes, after the search space has been cut down by 90%.
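The two passes can be sketched as plain predicates. This assumes dict-shaped exercises, users, and sessions - every field name here is illustrative:

```python
def slot_eligible(ex, slot):
    """Pass one: structural match against the slot spec. User-independent."""
    return (slot["movement"] in ex["tags"]
            and ex["muscle"] == slot["muscle"]
            and ex["compound"] == slot["compound"]
            and ex["difficulty"] <= slot["max_difficulty"])

def selection_eligible(ex, user, session):
    """Pass two: can this particular user do it in this particular session?"""
    return (ex["equipment"] <= user["equipment"]                # owns the gear
            and not (ex["joints"] & user["restricted_joints"])  # injury exclusions
            and ex["name"] not in session["used"]               # session dedup
            and ex["cns_cost"] <= session["cns_budget"])        # budget remaining
```

Pass one can be computed once per slot spec and cached; pass two must run per user, per session, because its inputs change as the plan grows.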


The Budget - Finite Resources

Every exercise has a neurological and a time cost. Modeling these costs so they self-regulate is an interesting design problem.

The CNS consumption model is asymmetric. When a high-intensity exercise is selected, the CNS budget steps down one level. Lighter exercises don’t affect the budget at higher levels. But once the budget reaches moderate, it’s nearly spent - any subsequent selection, regardless of intensity, drops it to the floor. Early in a session you can freely mix heavy and light work without the light work costing anything. Once you’ve burned through the heavy-compound allowance, even a set of curls signals “we’re done with intensity.”
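The asymmetry fits in a few lines. A sketch of the step-down rule, where the level names and transitions are assumptions standing in for the real config:

```python
LEVELS = ["depleted", "moderate", "high", "full"]

def consume_cns(budget: str, intensity: str) -> str:
    """Asymmetric step-down: heavy work costs a level, light work is free
    at higher levels, and anything at all empties a moderate budget."""
    if budget == "moderate":
        return "depleted"          # nearly spent: any pick drops it to the floor
    i = LEVELS.index(budget)
    if intensity == "high" and i > 0:
        return LEVELS[i - 1]       # heavy work steps the budget down one level
    return budget                  # light work doesn't touch a higher budget
```

Walking a session through it: deadlifts take "full" to "high", squats take "high" to "moderate", and then even a set of curls takes "moderate" to "depleted".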

Time works differently - it’s additive. More sets means more time. If the user tends to take longer rests for a given exercise, that costs more time too. As slots fill, the remaining time budget shrinks until there’s not enough left for another exercise.
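The additive model is simpler still. A sketch, where the per-user rest figure is assumed to come from logged history:

```python
def time_cost(sets: int, work_seconds: int, avg_rest_seconds: int) -> int:
    """Seconds a slot consumes: work for every set, rest between sets."""
    return sets * work_seconds + (sets - 1) * avg_rest_seconds

remaining = 45 * 60                   # a 45-minute session budget
remaining -= time_cost(3, 40, 120)    # 3 sets, 40s of work, 2-minute rests
```

A user who rests two minutes pays 360 seconds for that slot; a user who rests five minutes pays 720 for the identical exercise, so fewer slots fit in their session.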

Both budgets deplete as the plan grows, narrowing the search space further. The system doesn’t say “put hard exercises first” or “don’t plan more than 50 sets unless they take 60s rests.” The budget dynamics just make that the natural outcome.

In practice, a session that opens with deadlifts and squats can’t end with heavy barbell rows - the CNS budget won’t allow it. But curls, lateral raises, face pulls? Those fit fine under a depleted ceiling. And a trainee who tends to take five minutes of rest between sets of lat pulldowns won’t get them patched in as a quick finisher at the end.


Ranking

Once an exercise survives both filter rounds, it gets a scorecard built from two sources.

Slot preferences. Does the exercise’s load type match the slot’s preferred equipment? Is it a staple exercise or a niche one? These aren’t binary checks; they’re gradients. The system doesn’t just ask “is it low CNS?” - it measures how low and passes that on transparently.

User behavior signals. Has the user starred this exercise? Have they frequently skipped it in past sessions? If so, how consistently - a temporary blip or a general pattern? Is this a user who keeps coming back to their favorites? Each score is scaled by the system’s confidence in the behavioral pattern, following the philosophy “don’t guess if you don’t know.” Swapping squats out twice is a signal. Swapping them out for months is a clear preference.

All weights live in a config - every bonus, every penalty, every multiplier. The engine has no opinion about how much a starred exercise should be preferred. That’s a config decision. Change the config, change the behavior. Every ranking produces a full scoring breakdown, so you can see exactly why exercise A ranked above exercise B: “+3.0 load type match, +2.0 starred, -1.5 frequent skips at high confidence, total: 3.5.” This is key to evolving the engine. You can’t improve what you don’t understand.
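A sketch of config-driven scoring that reproduces the breakdown format quoted above - the weight names and values are illustrative, not the real config:

```python
CONFIG = {"load_type_match": 3.0, "starred": 2.0, "skip_penalty": -1.5}

def score(ex, slot_prefs, signals):
    """Return (total, breakdown) so every ranking is fully explainable."""
    breakdown = {}
    if ex.get("load_type") == slot_prefs.get("load_type"):
        breakdown["load_type_match"] = CONFIG["load_type_match"]
    if signals.get("starred"):
        breakdown["starred"] = CONFIG["starred"]
    confidence = signals.get("skip_confidence", 0.0)  # 0..1: don't guess if you don't know
    if confidence > 0:
        breakdown["frequent_skips"] = CONFIG["skip_penalty"] * confidence
    return sum(breakdown.values()), breakdown
```

Scoring a barbell match that is starred but frequently skipped at full confidence yields exactly the breakdown from the text: +3.0, +2.0, -1.5, total 3.5. Swap the `CONFIG` dict and the behavior changes without touching the engine.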

Our user starred Romanian deadlifts - that’s a scoring bonus whenever RDLs are eligible. They keep skipping lunges - that’s a confidence-scaled penalty that pushes lunges down the ranking. The program can still include lunges if nothing else fits, but it won’t choose them when alternatives exist.


Orchestration

A program has sessions. Sessions have slots. Slots have priorities. The builder fills them in priority order - primary compounds first, accessories last.

After each slot fills, the selected exercise joins the exclusion set, as it shouldn’t appear in another slot this session. The next slot sees an updated context with the growing exclusion list. Each swap updates the budgets and exclusions for the next decision.
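The whole loop is a greedy fill over shared, mutating session state. A sketch, where the pool filter stands in for the full two-gate pipeline and a precomputed `score` field stands in for ranking:

```python
def build_session(slots, candidates, session):
    """Fill slots in priority order; each pick updates shared state."""
    plan = []
    for slot in sorted(slots, key=lambda s: s["priority"]):   # compounds first
        pool = [ex for ex in candidates
                if slot["movement"] in ex["tags"]
                and ex["name"] not in session["used"]]        # rolling exclusions
        if not pool:
            continue                                          # diagnosed elsewhere
        pick = max(pool, key=lambda ex: ex["score"])
        plan.append((slot["name"], pick["name"]))
        session["used"].add(pick["name"])                     # next slot sees this
    return plan
```

The key property is that each iteration reads state the previous iterations wrote - the exclusion set (and, in the real engine, the budgets) flows forward through the loop.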

No single mechanism here is clever. Constraints are binary checks, preferences are weighted scores, the budget is a ceiling, exclusions are a rolling set. But their interactions - constraints narrowing the pool, the budget filtering by intensity, exclusions preventing repeats, preferences personalizing the pick - produce a program where our dumbbell-and-pull-up-bar user gets dumbbell bench press in slot one, pull-ups in slot two, and dumbbell RDLs (starred, ranked high) in slot three, with no overhead pressing and no repeats.


Diagnostics

Sometimes no exercise fits a slot. The system returns a typed diagnosis - which gate failed and which constraints caused it. There are two failure modes, matching the two filter passes. “No exercises matched slot constraints” means the template asks for something the library doesn’t have - a template problem. “Exercises matched but all failed selection constraints” means exercises exist for this slot, but this user can’t do any of them right now. The distinction matters because the responses differ: a template problem means the program design needs updating; a user constraint problem means the app can suggest “add a resistance band” or “skip this slot.”
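Making the diagnosis a type, rather than a string, lets the caller branch on it. A sketch with hypothetical class names:

```python
from dataclasses import dataclass

@dataclass
class TemplateProblem:
    """Pass one found nothing: the template asks for what the library lacks."""
    slot: str
    failed_constraints: list

@dataclass
class UserConstraintProblem:
    """Pass one matched, pass two rejected everything for this user."""
    slot: str
    failed_constraints: list

def diagnose(slot_name, slot_matches, selection_matches, failed):
    if not slot_matches:
        return TemplateProblem(slot_name, failed)
    if not selection_matches:
        return UserConstraintProblem(slot_name, failed)
    return None  # the slot filled; nothing to diagnose
```

An app layer can then match on the type: log `TemplateProblem`s for the program designers, and turn `UserConstraintProblem`s into user-facing suggestions.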

But it’s important to stay humble and not mistake the map for the terrain. We can’t read the user’s mind, and we can’t rely on them diligently updating everything on time. Maybe their elbow feels better today. So the system also surfaces exercises that almost fit - listing which constraints they failed - and gives the user the chance to choose one anyway, despite the recommendation.


Quality Control - Evaluations

Every slot got the best exercise for its spec. But a program is more than its exercises. Did we use the user’s equipment well? Is push/pull volume balanced?

Equipment utilization measures, for each piece of available equipment, how much of the program should use it (based on eligible exercises weighted by recommendation level) versus how much actually does. A user with a barbell and dumbbells who gets an all-dumbbell program is a signal that the ranking weights need tuning - the barbell is going to waste. The reverse holds too: dumbbells have their place, even when barbells are available.

Volume distribution categorizes muscles into balance pairs (push versus pull, quads versus posterior chain, …) and measures the ratio within each pair. If a pair skews too far toward one side, the program is penalized, and that penalty feeds back into our configuration pipeline.
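A sketch of the pair-ratio check - the tolerance value and the mapping of ratio to penalty are assumptions:

```python
def balance_penalty(volume, pair=("push", "pull"), tolerance=1.5):
    """0.0 while the pair's set-volume ratio stays within tolerance,
    rising to 1.0 as the skew grows (or when one side is absent)."""
    a, b = volume.get(pair[0], 0), volume.get(pair[1], 0)
    if min(a, b) == 0:
        return 1.0 if max(a, b) > 0 else 0.0  # one side missing entirely
    ratio = max(a, b) / min(a, b)
    return max(0.0, min(1.0, (ratio - tolerance) / tolerance))
```

So 12 push sets against 10 pull sets costs nothing, while 18 against 6 is maximally penalized - exactly the kind of skew the evaluation exists to catch.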

These evaluations aren’t part of the selection pipeline - they score the output. We run them against every generated program during development to catch regressions in ranking config changes. If a config tweak improves starred-exercise placement but tanks push/pull balance, the evaluation scores surface it before it ships.


Substitution - Same Pipeline, Different Entry Point

The first version of exercise swapping was a 1-to-many lookup table (barbell bench press to dumbbell bench press, barbell squat to leg press). Static pairs, manually curated. It worked until it didn’t. A user without a barbell can’t do incline bench press either. A user who already has leg press in another slot gets a duplicate. A beginner gets suggested a movement that’s too advanced. The “replacement” didn’t account for any of the context that made the original selection good in the first place.

The fix was realizing that swapping isn’t a replacement problem. It’s actually the same selection problem we faced when building the program. The question isn’t “what’s similar to bench press?” It’s “what fills the role that bench press was filling, given everything we know about this user right now, in this very session?” That’s just… a more knowledgeable slot. Same as hiring - when someone leaves the team, you don’t look for the most similar person. You look for someone who fills the gap they created, and that’s more useful than anything a job listing can express.

So the system converts the original exercise into a slot spec by deriving the constraints and preferences, and taking the user profile and session state into account. Then it runs the same two-gate, rank, select pipeline. The swap candidate has to pass the same equipment checks, the same injury filters, the same “not already used” constraint. It gets ranked by the same user behavior signals. It’s not “find something similar.” It’s “fill this position again, from scratch, with more context than originally - the actual state of the workout.”

Program building fills slots from template specs. Substitution fills a slot derived from an exercise’s own properties.
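The only new piece is the conversion step. A sketch of deriving a slot spec from the exercise being swapped out - field names are assumptions:

```python
def slot_from_exercise(ex):
    """Turn an exercise's own properties into the slot spec it was filling."""
    return {
        # constraints: preserve the role the exercise played
        "movement": ex["movement"],
        "muscle": ex["muscle"],
        "compound": ex["compound"],
        "max_difficulty": ex["difficulty"] + 1,     # allow a small step up
        # preferences: bias toward what the user was already doing
        "preferred_load_type": ex["load_type"],
    }
```

The derived spec then feeds the same two-gate, rank, select pipeline as any template slot, with the current session state supplying the equipment, injury, dedup, and budget context.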


What Monday Looks Like

Back to our user. Dumbbells, pull-up bar, bad shoulder, three months in, loves RDLs, hates lunges. The system builds their Monday full-body session:

  1. Slot 1 - compound push: Overhead press is filtered out by the shoulder joint exclusion. Barbell bench fails the equipment check. Dumbbell bench press passes both gates, scores high on load type match. Selected.
  2. Slot 2 - compound pull: Pull-ups pass both gates, score high as a staple compound. Selected. Dumbbell bench is already in the exclusion set.
  3. Slot 3 - compound hinge: Dumbbell Romanian deadlifts pass both gates and get a starred bonus that pushes them above dumbbell stiff-leg deadlifts. Selected.
  4. Slot 4 - accessory push: The CNS budget is depleted after three compounds. Lateral raises fit under the ceiling and score well on the low-CNS gradient. Selected.
  5. Slot 5 - accessory pull: Face pulls pass, lunges don’t even match this slot’s movement tags. But even in a slot where lunges would match, the skip penalty would push them down the ranking.

No overhead pressing. No equipment they don’t have. No repeats. RDLs included because they’re starred. Lunges avoided because they’re skipped. Compounds front-loaded because the budget enforces it. Every decision traceable to a specific constraint, preference, or scoring factor. And when the user wants to swap an exercise, they can make an informed decision, rather than guessing.

That’s the system. Slots define intent. Constraints enforce reality. Budgets manage resources. Preferences personalize. Evaluations verify balance. The pattern isn’t fitness-specific - any domain where you fill positions from a catalog under constraints benefits from the same separation. But this is the domain where we needed it, and it works.