Metacognitive Coding Safety Benchmark (MCSB)
The MCSB v2 is a 1,030-trial diagnostic suite that isolates an AI model's self-monitoring (metacognition) from its raw task accuracy. We quantify cross-tier degradation (Δ accuracy): the gap between baseline competence and robustness under adversarial pressure.
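As a minimal sketch of this metric (the trial schema and field names below are assumptions, not the benchmark's published format), Δ accuracy is simply baseline-tier accuracy minus adversarial-tier accuracy:

```python
from collections import defaultdict

def cross_tier_degradation(trials):
    """Per-tier accuracy and the baseline-to-adversarial drop.

    `trials` is assumed to be an iterable of dicts with keys
    'tier' ('baseline' or 'adversarial') and 'correct' (bool);
    this schema is illustrative, not the MCSB trial format.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for t in trials:
        totals[t["tier"]] += 1
        hits[t["tier"]] += bool(t["correct"])
    acc = {tier: hits[tier] / totals[tier] for tier in totals}
    # Delta accuracy: gap between baseline competence and adversarial robustness.
    return acc, acc["baseline"] - acc["adversarial"]
```

A positive Δ means the model loses accuracy once adversarial pressure is applied.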
Section A: General Metacognition
Baseline capability on standard logic and multi-turn reasoning tasks.
[Charts: "Through the Looking Glass" (Turn 1 observed accuracy), "The Capability Chasm", and "Calibration Depth" (static vs. dynamic monitoring efficiency).]
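The report does not spell out its Calibration Depth formula; as one illustrative proxy (a standard expected calibration error, not the MCSB's own metric), calibration can be scored by comparing stated confidence to realized accuracy within confidence bins:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: mean |confidence - accuracy|, weighted by bin size.

    `confidences` are self-reported probabilities in [0, 1]; `correct`
    are booleans. This is a generic ECE sketch, not the benchmark's
    'Calibration Depth' metric.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [i for i, c in enumerate(confidences)
                  if lo <= c < hi or (b == n_bins - 1 and c == 1.0)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        acc = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(avg_conf - acc)
    return ece
```

Lower ECE means the model's stated confidence tracks its actual hit rate more closely.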
Section B: Code-Security Trustworthiness
MCSB v2 Adversarial Result
This section isolates directionally correct belief updates within a high-stakes domain (code security). Significant cross-tier degradation (Δ accuracy) is observed under adversarial evidence pressure.
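A belief update is directionally correct when confidence moves toward the evidence: up when new evidence supports the model's answer, down when it contradicts it. A minimal scoring sketch, with an assumed record schema (the field names 'prior', 'posterior', and 'evidence' are illustrative, not the MCSB format):

```python
def directional_update_rate(records):
    """Fraction of belief updates that move in the evidence's direction.

    Each record is assumed to carry 'prior' and 'posterior' confidence
    in [0, 1] plus 'evidence' in {'supports', 'contradicts'}.
    """
    correct_direction = 0
    for r in records:
        delta = r["posterior"] - r["prior"]
        if r["evidence"] == "supports":
            correct_direction += delta > 0
        else:  # contradicting evidence should lower confidence
            correct_direction += delta < 0
    return correct_direction / len(records)
```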
Section C: Adversarial Stress Test (Meta-Evaluation Framework)
Inspired by cognitive evaluation frameworks for measuring robust generalization under distribution shift.
These high-fidelity diagnostics probe the internal representational stability of models; the observed patterns highlight the sharp transition from foundational logic to adversarial security scenarios.
Section D: Economic Efficiency & Token Economics
Metric Framework v1.0
Metacognition changes the optimal spending policy. High sensitivity enables agents to abort failed reasoning paths early, drastically reducing the Cost of Verified Truth (CVT). This section quantifies the "Metacognitive Dividend."
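To illustrate the claimed dividend, here is a toy simulation (every number, the threshold policy, and the cost model are hypothetical, not MCSB measurements): an agent that aborts a reasoning path once its noisy self-assessment drops below a cutoff pays a small abort cost instead of the full path cost, which lowers the cost per verified-correct answer:

```python
import random

def simulate_cvt(abort_threshold=None, n=10_000, cost_per_token=0.0001, seed=0):
    """Toy model of the 'Metacognitive Dividend'.

    Each attempt has a latent success probability; the agent observes a
    noisy self-assessment of it and, if metacognitive, aborts cheap
    instead of paying for the full reasoning path. All parameters are
    hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    total_cost, verified = 0.0, 0
    for _ in range(n):
        p = rng.random()                 # latent success probability of this path
        signal = p + rng.gauss(0, 0.1)   # noisy self-assessment
        if abort_threshold is not None and signal < abort_threshold:
            total_cost += 50 * cost_per_token     # cheap early abort
            continue
        total_cost += 1_000 * cost_per_token      # full reasoning path
        verified += rng.random() < p              # answer passes verification
    return 100 * total_cost / verified            # CVT in cents

print("no abort:  ", simulate_cvt())
print("abort @0.3:", simulate_cvt(abort_threshold=0.3))
```

In this toy setup the aborting agent ends up with a noticeably lower CVT, because it stops paying for paths that were unlikely to verify.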
Empirical Cost Summary (1,030 trials)

Model              Total Cost   CVT (¢)
DeepSeek V3.2      $0.08        0.013
DeepSeek V3.1      $0.08        0.013
Gemini 3 Flash     $0.18        0.026
GPT-5.4            $1.58        0.210
Claude Opus 4.7    $9.49        1.439
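If CVT is read as total spend divided by verified-correct answers (our assumption; the report does not publish the formula), the table implies roughly 600 to 750 verified-correct answers per model out of 1,030 trials:

```python
# Hypothetical reading: CVT (¢) = total cost (¢) / verified-correct answers,
# so implied correct count = total cost / CVT. Figures from the table above.
table = {
    "DeepSeek V3.2":   (8.0,   0.013),
    "DeepSeek V3.1":   (8.0,   0.013),
    "Gemini 3 Flash":  (18.0,  0.026),
    "GPT-5.4":         (158.0, 0.210),
    "Claude Opus 4.7": (949.0, 1.439),
}
for model, (total_cents, cvt_cents) in table.items():
    print(f"{model}: ~{total_cents / cvt_cents:.0f} verified-correct of 1,030 trials")
```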
Frequently Asked Questions
Quick reference for AI agents and research auditors.