Playing Grooped on a phone
Solo Project

A daily word puzzle,
built and curated with AI

Play today's puzzle ➜
The design problem

The hardest part isn't the tech

Word puzzles sound simple. Four groups of four words. But the design space is deceptively deep.

The real challenge is intentional misdirection. Take IRON, PRESS, CURL, and BENCH: all things you do at the gym. But PRESS also fits a medieval weapons category, IRON fits things you can flip, and CURL fits hair styling. That web of almost-right answers is what makes a puzzle feel clever instead of arbitrary. Designing it is the closest thing I've found to level design.

Every word on the board has to earn its place. Not just as a correct answer, but as a convincing wrong one.

How it's made

An AI pipeline with a human editor at the center

Every puzzle starts with a prompt. I built an internal tool called Puzzle Editor 3000 that generates a full puzzle: four categories, four words each, plus intentional decoys designed to blur the lines between groups.
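To make the shape of a generated puzzle concrete, here is a minimal sketch of how such a draft could be represented and sanity-checked. The class names, the `decoys` field, and the `problems` method are assumptions for illustration; the page does not show Puzzle Editor 3000's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    words: list[str]
    # Hypothetical field: for each word, the other categories it could be pulled toward.
    decoys: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Puzzle:
    categories: list[Category]

    def all_words(self) -> list[str]:
        return [w for c in self.categories for w in c.words]

    def problems(self) -> list[str]:
        """Return structural problems; an empty list means the shape is valid."""
        issues = []
        if len(self.categories) != 4:
            issues.append(f"expected 4 categories, got {len(self.categories)}")
        for c in self.categories:
            if len(c.words) != 4:
                issues.append(f"{c.name!r} has {len(c.words)} words, expected 4")
        words = [w.upper() for w in self.all_words()]
        dupes = sorted({w for w in words if words.count(w) > 1})
        if dupes:
            issues.append(f"duplicate words: {dupes}")
        return issues


# The gym example from the design section, with its cross-pull decoys.
gym = Category(
    name="Things you do at the gym",
    words=["IRON", "PRESS", "CURL", "BENCH"],
    decoys={
        "PRESS": ["Medieval weapons"],
        "IRON": ["Things you can flip"],
        "CURL": ["Hair styling"],
    },
)
draft = Puzzle(categories=[gym])
print(draft.problems())  # a one-category draft is reported as incomplete
```

Keeping decoys next to the words they belong to would let an editor see at a glance which words carry misdirection and which are only correct answers.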

The generator doesn't just make puzzles. It tracks 28 connection mechanics across four tiers, each with a cooldown period so the same trick never runs two days in a row. A fill-in-blank category can appear weekly. A first-letter acrostic gets saved for once every six weeks. The system reads the last 60 published puzzles before generating a new one, checks which mechanics are underused, and steers toward variety automatically.

When I export a puzzle, it's appended to the live file on GitHub. The next generation reads that file first. The system has memory.
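The append-then-read loop can be sketched with a local JSON file standing in for the file on GitHub. This is an illustration under assumptions: the file layout (a JSON array, oldest puzzle first) and the function names are hypothetical, and the push to GitHub is left out entirely.

```python
import json
from pathlib import Path


def load_history(path: Path) -> list[dict]:
    """Return all published puzzles, oldest first; empty list if the file is missing."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def append_puzzle(path: Path, puzzle: dict) -> None:
    """Append one exported puzzle to the history file (the GitHub push is not shown)."""
    history = load_history(path)
    history.append(puzzle)
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")


def recent(path: Path, n: int) -> list[dict]:
    """The last n published puzzles, which the next generation would read first."""
    if n <= 0:
        return []
    return load_history(path)[-n:]
```

Calling `append_puzzle` on export and `recent(path, 60)` before generating gives the generator the kind of memory described above.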

The AI generates; I review every puzzle before it ships and regenerate the parts that don't work. The system learns from what I keep.

Connection mechanic tiers

Tier 1: Workhorses (cooldown: 4 puzzles)
  • Taxonomy
  • Found in scene
  • Prefix blank
  • Suffix blank
  • Synonyms

Tier 2: Regulars (cooldown: 7 puzzles)
  • Things that verb
  • Can be verbed
  • Shared hidden property
  • Metaphor substitutes
  • Ways to verb
  • Idiom completion
  • Ordered set member
  • Works by one maker
  • Characters in one work
  • Facets of a named subject

Tier 3: Specials (cooldown: 21 puzzles)
  • Hidden word
  • Homophones
  • Compound
  • Add/Drop letter
  • Eponyms
  • Cross language
  • Abbreviation expansion

Tier 4: Treats (cooldown: 45 puzzles)
  • Anagram of one source
  • Acrostic first letters
  • Chain through hub
  • Portmanteau
  • Onomatopoeia
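
A cooldown check over these tiers might look like the sketch below. The cooldown values come from the tier list; everything else is an assumption: the history format (one list of mechanic names per puzzle, oldest first), the function names, and the rule that a mechanic becomes available again once more than `cooldown` puzzles have passed since its last use.

```python
from typing import Optional

# Cooldowns per tier, taken from the tier list above.
TIER_COOLDOWN = {1: 4, 2: 7, 3: 21, 4: 45}

# A few mechanics from the list, mapped to their tier (subset for brevity).
MECHANIC_TIER = {
    "Taxonomy": 1,
    "Synonyms": 1,
    "Idiom completion": 2,
    "Homophones": 3,
    "Portmanteau": 4,
}


def puzzles_since_last_use(history: list[list[str]], mechanic: str) -> Optional[int]:
    """How many puzzles ago a mechanic last appeared (1 = the most recent puzzle).

    Returns None if the mechanic never appears in the history.
    """
    for age, mechanics in enumerate(reversed(history), start=1):
        if mechanic in mechanics:
            return age
    return None


def is_available(history: list[list[str]], mechanic: str) -> bool:
    """Assumed rule: usable if never used, or last used more than `cooldown` puzzles ago."""
    age = puzzles_since_last_use(history, mechanic)
    cooldown = TIER_COOLDOWN[MECHANIC_TIER[mechanic]]
    return age is None or age > cooldown


# Hypothetical history, oldest puzzle first.
history = [
    ["Portmanteau", "Taxonomy"],
    ["Synonyms"],
    ["Idiom completion"],
    ["Taxonomy"],
    ["Synonyms"],
]
usable = [m for m in MECHANIC_TIER if is_available(history, m)]
print(usable)  # ['Homophones']
```

In this toy history only Homophones is off cooldown, because every other listed mechanic was used within its tier's window.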

The quality of the output lives or dies on the prompt. Writing it was a design problem: precise enough to produce consistent results, flexible enough to surprise me. This is the actual system prompt the generator uses, loaded live from the source code.

Generation pipeline
Lanes: Roie, AI
  • Generate triggered
  • Fetch live puzzle history from GitHub
  • Scan last 21 puzzles, compute mechanic cooldowns
  • Identify underused Tier 2 and Tier 3 mechanics
  • Pick spine mechanic, prefer underused, respect cooldowns
  • Generate puzzle with cross-pull decoys
  • Verify hidden words letter-by-letter
  • Check all 16 words, duplicates and 60-day repeat rule
  • Inject mechanic + tier into each category
  • Strip scratchpad fields, save clean draft
  • Editor review, regenerate weak categories, edit words
  • Export triggered
  • Append to puzzle history, push to GitHub
  • Puzzle is live
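
Two of the verification steps in this pipeline, the letter-by-letter hidden-word check and the 16-word duplicate and repeat check, can be sketched as plain functions. This is a guess at the intent, not the code in puzzle_generator.py: the function names are hypothetical, and the 60-day rule is assumed to mean that no word may reappear from puzzles published in the previous 60 days.

```python
def contains_hidden(word: str, hidden: str) -> bool:
    """Check letter by letter whether `hidden` appears contiguously inside `word`."""
    word, hidden = word.upper(), hidden.upper()
    if not hidden:
        return False
    for start in range(len(word) - len(hidden) + 1):
        if all(word[start + i] == ch for i, ch in enumerate(hidden)):
            return True
    return False


def word_problems(words: list[str], recent_words: set[str]) -> list[str]:
    """Flag a wrong word count, duplicates, and words seen in the recent window.

    `recent_words` is assumed to hold the upper-cased words of every puzzle
    published in the last 60 days.
    """
    issues = []
    upper = [w.upper() for w in words]
    if len(upper) != 16:
        issues.append(f"expected 16 words, got {len(upper)}")
    seen = set()
    for w in upper:
        if w in seen:
            issues.append(f"duplicate: {w}")
        seen.add(w)
    for w in sorted(seen & recent_words):
        issues.append(f"used recently: {w}")
    return issues


print(contains_hidden("SCATTER", "CAT"))  # True
print(contains_hidden("SCOTTER", "CAT"))  # False
```

Checking spelling in code rather than trusting the model is the point of the letter-by-letter step: a hidden-word category fails silently if even one letter is off.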
puzzle_generator.py (system prompt, loaded live on the original page)

The tool

Puzzle Editor 3000

The editor is the interface between AI output and human judgment. Generate a full puzzle, regenerate individual categories, swap words, rename groups, or type your own category and let the AI fill in the words. Every category shows which connection mechanic the model used and which tier it belongs to, so I can see at a glance if the puzzle is mechanically varied or just four versions of the same trick.

The dashboard at the top tracks mechanic usage across the last 21 puzzles. If Tier 2 is dominating, the next generation will steer toward Tier 3 automatically. I can see what's been overused and nudge the system before it gets repetitive.

Puzzle Editor 3000 showing four categories with refresh and ban controls

Puzzle Editor 3000: generate, curate, and track puzzle mechanics in one place.
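
The dashboard's tier tally and the resulting nudge could be computed roughly as follows. This is a sketch under assumptions: the input format (one list of tier numbers per puzzle, one per category), the function names, and the tie-break toward the lower tier are all hypothetical; only the 21-puzzle window and the Tier 2 versus Tier 3 steering come from the text.

```python
from collections import Counter


def tier_usage(recent_tiers: list[list[int]], window: int = 21) -> Counter:
    """Count how often each tier appears across the last `window` puzzles."""
    counts = Counter({tier: 0 for tier in (1, 2, 3, 4)})
    for tiers in recent_tiers[-window:]:
        counts.update(tiers)
    return counts


def tier_to_favor(counts: Counter, candidates: tuple[int, ...] = (2, 3)) -> int:
    """Pick the least-used candidate tier; ties go to the lower tier (an assumption)."""
    return min(candidates, key=lambda t: (counts[t], t))


# Hypothetical recent puzzles: each inner list holds the tier of each category.
recent_tiers = [[1, 2, 2, 1], [1, 2, 2, 3], [1, 1, 2, 2]]
counts = tier_usage(recent_tiers)
print(counts[2], counts[3])   # 6 1
print(tier_to_favor(counts))  # 3
```

Here Tier 2 dominates the window, so the next generation would be steered toward Tier 3.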

The game

What the player sees

None of the pipeline is visible to the player. They get 16 words and four attempts. The mechanic, the tier, the cooldowns. All invisible. The only thing that matters is whether the puzzle feels satisfying to solve.

1
16 words, 4 groups
A fresh puzzle. The grid shows 16 words with no hints about what connects them.
2
Spot a pattern
Four words selected, ready to submit. Is it the right group?
3
First group found
A correct guess collapses into a color block. Twelve words remain.
4
Puzzle complete
All four groups revealed. Come back tomorrow for a new one.
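
The player-facing loop in these four steps is small enough to sketch. The `Game` class is illustrative, not the shipped code, and it assumes that "four attempts" means up to four wrong guesses; the placeholder words stand in for a real puzzle.

```python
from typing import Optional


class Game:
    def __init__(self, groups: dict[str, set[str]], max_mistakes: int = 4):
        self.groups = groups          # category name -> its four words
        self.solved: list[str] = []   # category names, in the order they were found
        self.mistakes = 0
        self.max_mistakes = max_mistakes  # assumption: four wrong guesses allowed

    def remaining_words(self) -> list[str]:
        """Words still on the grid (solved groups have collapsed into color blocks)."""
        return sorted(
            w for name, ws in self.groups.items() if name not in self.solved for w in ws
        )

    def is_over(self) -> bool:
        return len(self.solved) == len(self.groups) or self.mistakes >= self.max_mistakes

    def guess(self, selection: set[str]) -> Optional[str]:
        """Return the category name on a correct guess, or None and count a mistake."""
        if self.is_over():
            raise RuntimeError("game is already over")
        for name, ws in self.groups.items():
            if name not in self.solved and ws == selection:
                self.solved.append(name)
                return name
        self.mistakes += 1
        return None


# Placeholder puzzle: four groups of four words.
groups = {g: {f"{g}{i}" for i in range(1, 5)} for g in "ABCD"}
game = Game(groups)
print(game.guess({"A1", "A2", "A3", "A4"}))  # A
print(len(game.remaining_words()))           # 12
```

A correct first guess leaves twelve words on the grid, matching step 3 above.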
Iteration with data

Session recordings over surveys

I connected Microsoft Clarity to track how people actually play. People play fast and lose patience faster. A couple of wrong guesses and some players just leave. That insight pushed me to make puzzles more solvable.

"I believe in seeing what people do, not asking what they would do."

Microsoft Clarity session recording: tracking real player behavior.

Curious? Give it a try.

Scan to play on your phone: grooped.de
or Play today's puzzle ➜