Sentinel · sentinel-fermi-bench

gpt-5.1-2025-11-13

27 runs · 27 questions

Aggregate

Search  Scored / Total  Mean Cramér  Med |bias|  Output tokens
off     27 / 27         0.2679       0.3502      16,144

Cramér-log distribution

One bar per Cramér-log bucket. Buckets match the heatmap color scale: tight (<0.05), clean (<0.2), productive (<0.7), different-interpretation (<2), suspect (≥2).
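
The bucket thresholds above can be sketched as a small lookup. This is an illustrative sketch only, not the dashboard's own code: the names `BAND_EDGES`, `BAND_NAMES`, `cramer_log_band`, and `band_counts` are hypothetical. It assumes each threshold is an exclusive upper bound, as the `<` signs suggest.

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds copied from the bucket list above.
# Anything at or above the last edge falls into the final band.
BAND_EDGES = [0.05, 0.2, 0.7, 2.0]
BAND_NAMES = [
    "tight",
    "clean",
    "productive",
    "different-interpretation",
    "suspect",
]


def cramer_log_band(score: float) -> str:
    """Map a Cramér-log score to its band name (hypothetical helper)."""
    # bisect_right counts how many edges are <= score, so a score equal
    # to an edge lands in the next band, matching the "<" thresholds.
    return BAND_NAMES[bisect_right(BAND_EDGES, score)]


def band_counts(scores: list[float]) -> dict[str, int]:
    """Count scores per band, in band order: one entry per histogram bar."""
    counts = Counter(cramer_log_band(s) for s in scores)
    return {name: counts.get(name, 0) for name in BAND_NAMES}
```

Under this mapping a score such as 0.2679 falls in the productive band, and a score of exactly 2 counts as suspect.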

Per-question Cramér-log

One row per question. Cells colored by Cramér-log band. Click a question to open its detail view.