Assay · Ad tech · May 2026
Assay-Adtech v1 specification
Brand-safety, bid shading, and MFA classification under a 100ms auction envelope.
Leaderboard
Released proof set: 344 scenarios, manifest v1.8.0-rc.4, scenario_set_hash 162ff7fcd8ce4266af8848938b3fc6415000843e0901651456d3fa4191fc65b6. The table below contains the current no-tools production frontier baselines for Claude Opus 4.7, Gemini 3.1 Pro Preview, and GPT-5.5.
Current-hash production baselines for scenario_set_hash 162ff7fcd8ce. Each row is a no-tools three-run provider baseline against the released 344-scenario Assay-Adtech corpus.
Description
Assay-Adtech v1 measures how well a model can reason around the IAB OpenRTB 100ms auction window across enterprise adtech decisions including brand safety, MFA detection, bid shading, supply-path reasoning, privacy compliance, and multi-turn incident triage. The released public surface is the v1.8.0-rc.4, 344-scenario corpus with scenario_set_hash 162ff7fcd8ce4266af8848938b3fc6415000843e0901651456d3fa4191fc65b6 and no-tools production baselines for Claude Opus 4.7, Gemini 3.1 Pro Preview, and GPT-5.5.
Capability axes
Rubric
- Brand-safety classification: precision and recall against a held-out blocklist that ages between retest cycles.
- MFA detection: structural reasoning over a synthetic publisher graph, scored on F1 against editorial labels.
- Bid shading: predicted win-rate calibration measured by Brier score against synthetic, publicly derived auction cohorts.
- Latency budget: every scenario must complete decode within 100ms on the published consumer-grade reference hardware.
Retest policy
Retest triggers: (1) any frontier release within the announced families, (2) a fresh quarterly synthetic auction cohort, (3) blocklist drift exceeding 5% by editorial audit. Each retest republishes the dataset version tag and keeps superseded results dated rather than overwritten.
Publication artefacts
Published scenario rows are intended to be scored using the open assay-harness. Provider adapter availability is disclosed per release. See the methodology for scenario authoring, HITL validation, and dataset licensing.