Phase A.1 — Atelier — creative AI layer launch (parallel track)

Date opened: 2026-05-02 Status: Active — implementation scaffold landed; first production run pending operator wiring of ATELIER_ANTHROPIC_API_KEY on the Hetzner host.

Why this phase exists

The brute-force runner (Phase 2B.x) is the project's discipline — it exhaustively sweeps parameterized hypotheses and produces clean negative results. It is not the project's imagination. Every new attack family in the queue was either picked from prior_work, derived from Sanborn hints, or proposed by the human operator working with Nova.

Atelier is the agentic version of the function the human operator was filling. It reads the project's full vendored corpus (Stein paper, Sanborn statements, prior_work, statistical baseline, false-positives catalog) plus the latest 24h of runner telemetry, and emits one of two things per trigger: a structural hypothesis brief (what to attack next and why) or a daily synthesis (what changed in the data and what it means).

This phase tests one question: does an LLM-in-the-loop, applied with hard guardrails and a tight budget envelope, produce hypotheses that are non-trivially novel vs. the existing prior_work catalog? If yes, the project gains a structural-hypothesis pipeline that doesn't burn operator hours. If no, we've documented a falsifiable experiment in "more AI fixes it" framing — which is itself a contribution.

Hypothesis

Atelier, running Opus 4.7 with prompt caching at ~2.5 invocations/day, will produce ≥ 3 structurally-novel K4 attack hypotheses in 90 days that survive both Chi's pre-filter and Null's red-team review. At least one of those will be promoted to a full experiments/ sweep.

Falsification: 90 days, zero promoted hypotheses, or zero hypotheses Tabula judges as non-trivially novel vs. prior_work. In that case the $50/mo budget redirects to the next experiment.

Hard constraints (enforced in code, not just documented)

Model: claude-opus-4-7 only.
Total budget: $50/month, hard cap. Per-run estimate ≈ $0.66.
Daily spend cap: $1.65 (≈ 2.5 runs/day). Excess triggers queue.
Per-run hard cap: $1.50. Single runs over this limit are not persisted.
Output: 5,000 tokens max per response.
Output format: JSON, validated against

attacks/atelier/schema/hypothesis.json or synthesis.json.

Writable surface: attempts/atelier/ only. Cannot touch

experiments/, results/, or candidates.jsonl.

Five Null gates (per the May 2026 design discussion)

1. Atelier never self-scores. 2. Every hypothesis carries a falsification criterion. 3. attempts/atelier/_killed.md logs every rejected proposal. 4. famous_false_positives.md is in cached corpus every run. 5. Schema-bound output, validated server-side, rejected if invalid.

Chi pre-filter (before any sweep is provisioned)

Does the proposed cipher family produce IoC distributions consistent

with K4's measured 0.036?

Does it explain (or at least not contradict) the period-7

plaintext-side anomaly?

Does it require structural features Sanborn has publicly disowned?
Is the estimated runtime under 14 days on the current Hetzner host?

Fail any → killed at the door, logged with Chi's reasoning.

Polybius Phase 0

Before Atelier's first K4-attack hypothesis, it must produce step-by-step derivations of K1, K2, K3 plaintexts from core/. If it can't reproduce Stein's work on the cipher Stein actually solved, it doesn't get to attack the one he didn't.

Files added in this phase

agents/atelier.md — agent prompt
docs/atelier_design.md — full design spec
attacks/atelier/atelier.py — trigger script
attacks/atelier/schema/{hypothesis,synthesis}.json — output schemas
db/migrations/2026-05-02-atelier.sql — Supabase tables + RLS
attempts/atelier/_killed.md — kill log
experiments/2026-05-02-atelier-launch/hypothesis.md — this doc
Web: architecture v3 tab, LIVE page panel, /about agent note,

press release (outreach/releases/2026-05-02-atelier-launch.md)

Operator launch checklist

1. psql -f db/migrations/2026-05-02-atelier.sql against production Supabase. 2. Export ATELIER_ANTHROPIC_API_KEY on the Hetzner host. 3. pip install anthropic in the runner's venv. 4. Wire the cron entry: 0 3 * cd /opt/k4 && python -m attacks.atelier.atelier daily 5. Manual first run: python -m attacks.atelier.atelier adhoc --note "phase 0 — verify Atelier can reproduce K1/K2/K3 mechanics" 6. Polybius reviews; approves or rejects. 7. If approved, daily cron is live. First budget cycle starts.

What this phase is NOT

Not a K4-solve attempt. Atelier proposes hypotheses; the runner and

the four-gate verification are still the only path to a verified candidate.

Not a replacement for Sigma. Sigma generates cross-domain patterns;

Atelier converts patterns into testable hypotheses with falsification criteria. Different functions.

Not a cost-bounded chat agent. Each run produces one structured

artifact and ends. No back-and-forth.

Methodology lesson (in advance)

If Atelier produces zero novel hypotheses in 90 days and gets killed, that result is itself useful: it falsifies the "more LLM in the loop fixes structural cryptanalysis problems" framing, with cost and methodology numbers attached. The post-mortem is a more honest contribution to the K4 conversation than the typical "we tried ChatGPT on it for an afternoon" anecdotes.

K4 · live cryptanalysis

A.1· Atelier — creative AI layer launch (parallel track)

Hypothesis2026-05-02-atelier-launch