Marking.ai Evaluation Report
Best Strategy per Subject
| Subject | Strategy | Model | ExRnd% | MAE | Bias |
Strategy Leaderboard — Maths
| Strategy | Model | n | ExRnd% | W/in1% | MAE | Bias | Over% | Under% | Cost |
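The leaderboard columns are not defined in this report. The following is a minimal sketch that assumes each strategy is scored on paired human and AI marks for the same items. It further assumes that W/in1% counts items where the AI mark is within 1 of the human mark, and that Over% and Under% count items where the AI mark is strictly higher or strictly lower. Under those assumptions, the per-strategy percentages could be computed like this. The helper name `leaderboard_row` is hypothetical. ExRnd% and Cost are left out because their exact definitions are not given here.

```python
from typing import Sequence


def leaderboard_row(human: Sequence[float], ai: Sequence[float]) -> dict[str, float]:
    """Hypothetical helper: n, W/in1%, Over% and Under% for one strategy.

    ASSUMED definitions (not stated in the report):
      W/in1%  = share of items with |AI - human| <= 1
      Over%   = share of items with AI > human
      Under%  = share of items with AI < human
    """
    if len(human) != len(ai) or len(human) == 0:
        raise ValueError("need equal-length, non-empty mark lists")
    n = len(human)
    # Signed error per item: AI mark minus human mark (assumed convention).
    diffs = [a - h for h, a in zip(human, ai)]
    return {
        "n": n,
        "within1_pct": 100 * sum(abs(d) <= 1 for d in diffs) / n,
        "over_pct": 100 * sum(d > 0 for d in diffs) / n,
        "under_pct": 100 * sum(d < 0 for d in diffs) / n,
    }
```

With these definitions, Over% and Under% do not have to sum to 100, because items where the AI mark exactly matches the human mark fall into neither column.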
Strategy Leaderboard — English
| Strategy | Model | n | ExRnd% | W/in1% | MAE | Bias | Over% | Under% | Cost |
Cross-Model Comparison (Phase 5)
Same prompt strategy evaluated across different LLM providers.
Score Distribution Analysis
Human vs. AI mark distributions for the top English strategies, showing score compression visually.
Bias Analysis
Mean signed error per strategy. Positive = over-marking; negative = under-marking.
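The bias metric can be sketched as follows. This assumes the error for each item is the AI mark minus the human mark, so a positive value means over-marking, which matches the sign convention above. The sketch also computes MAE from the leaderboards alongside it, using the standard mean absolute error. `bias_and_mae` is a hypothetical helper and not part of Marking.ai.

```python
from statistics import mean
from typing import Sequence


def bias_and_mae(human: Sequence[float], ai: Sequence[float]) -> tuple[float, float]:
    """Hypothetical helper: (bias, MAE) for one strategy's paired marks.

    ASSUMPTION: error = AI mark - human mark, so bias > 0 means over-marking.
    """
    if len(human) != len(ai) or len(human) == 0:
        raise ValueError("need equal-length, non-empty mark lists")
    errors = [a - h for h, a in zip(human, ai)]
    bias = mean(errors)                 # signed: over- and under-marking can cancel
    mae = mean(abs(e) for e in errors)  # unsigned: no cancellation
    return bias, mae


# Toy marks for illustration only (not report data): one over-mark and one
# under-mark of the same size give zero bias but a non-zero MAE.
print(bias_and_mae([2, 4], [3, 3]))
```

Signed errors can cancel, so a strategy can show a bias near zero and still have a large MAE. This is a reason to read the Bias and MAE columns together.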
Score Compression Over Time (English)
Percentage of AI marks equal to 3, by strategy (sorted by phase). Lower is better: a falling share across phases indicates that later strategies progressively break score compression.
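The compression metric is a simple share of AI marks that equal 3. The following is a minimal sketch. It assumes each strategy's AI marks are available as a list of integers and that each run is tagged with its phase number. The run tuple layout and the helpers `pct_equal_to` and `compression_by_strategy` are hypothetical.

```python
from typing import Sequence


def pct_equal_to(marks: Sequence[int], value: int = 3) -> float:
    """Percentage of AI marks exactly equal to `value` (3 in this report)."""
    if len(marks) == 0:
        raise ValueError("no marks supplied")
    return 100 * sum(m == value for m in marks) / len(marks)


def compression_by_strategy(
    runs: Sequence[tuple[int, str, Sequence[int]]],
) -> list[tuple[int, str, float]]:
    """Hypothetical layout: runs are (phase, strategy_name, ai_marks).

    Returns (phase, strategy_name, % of marks equal to 3), ordered by phase;
    sorted() is stable, so strategies within a phase keep their input order.
    """
    ordered = sorted(runs, key=lambda run: run[0])
    return [(phase, name, pct_equal_to(marks)) for phase, name, marks in ordered]
```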
Phase Progress
Best ExRnd% achieved per phase for each subject.
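The phase progress figure is a max-aggregation over strategies. The following is a sketch that assumes per-strategy results are available as (subject, phase, strategy, ExRnd%) records, with ExRnd% already computed. The record layout and the `best_per_phase` helper are hypothetical.

```python
from typing import Iterable


def best_per_phase(
    results: Iterable[tuple[str, int, str, float]],
) -> dict[tuple[str, int], float]:
    """Hypothetical helper: highest ExRnd% for each (subject, phase) pair.

    Each record is assumed to be (subject, phase, strategy, exrnd_pct).
    """
    best: dict[tuple[str, int], float] = {}
    for subject, phase, _strategy, exrnd in results:
        key = (subject, phase)
        # Keep the running maximum for this subject/phase combination.
        if key not in best or exrnd > best[key]:
            best[key] = exrnd
    return best
```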