One prompt. Two models. You judge the pixels, not the price tag.
Every page in the test is generated cold from a single prompt - one take, n=1, no edits, no cherry-picking. GLM 5.2 (open, on Together AI) and Claude Opus 4.8 (closed) each get the exact same brief, and the page you see is the raw output. The only thing hidden during the test is the bill.
- 1Same brief, both models. A short, identical prompt goes to GLM 5.2 and Opus 4.8 - no design kit, no tools, no planning step.
- 2One raw take each. Whatever the model returns is the page. No retries, no human edits, costed at real API token rates.
- 3You guess blind. Two unlabelled pages, side by side. Pick the one that cost more - then we show the receipts.
The exact prompt
Identical for both models. Just a system line and a one-paragraph brief - the rest is the model.
You are a senior product designer and front-end engineer. Build a complete, polished landing page for the brief below as a single, self-contained HTML file: all CSS and any JS inline, no frameworks, no build step, no placeholder text. Make deliberate choices about layout, type, colour, spacing and motion. Output exactly one final ```html block.
Brief: Nimbus - an edge compute platform that runs your code in 300+ cities with zero config and instant rollbacks. Audience: backend and platform engineers. Register: dark mode, technical, confident, fast. Build it now. Output one final ```html block.One of 11 briefs. The category, audience and register change; the shape stays the same.
What each page actually cost
Hover a row to see both pages. Prices are the real API cost of that one generation.
Methodology
One take per page, n=1, no cherry-picking; both models built from the same prompt and the page you see is the raw output. GLM 5.2 is zai-org/GLM-5.2 on Together AI; Opus is claude-opus-4.8. Prices are the real cost of each generation at list token rates. The typical gap is about 6× (widest: Sift at 8.7×; narrowest: Nimbus at 4.3×).
