Marking.ai Evaluation Report

Subject	Strategy	Model	ExRnd%	MAE	Bias

Strategy Leaderboard — Maths

Strategy	Model	n	ExRnd%	W/in1%	MAE	Bias	Over%	Under%	Cost

Same prompt strategy evaluated across different LLM providers.

Human vs AI mark distributions for top English strategies. Shows score compression visually.

Mean signed error (AI mark minus human mark) per strategy. Positive values indicate over-marking; negative values indicate under-marking.
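As an illustration of how the signed-error figure can be derived, the sketch below computes it from paired human and AI marks. It assumes the error is taken as AI mark minus human mark (consistent with positive meaning over-marking) and that Over% and Under% are the shares of scripts marked above or below the human mark. The function name, dictionary keys, and sample marks are hypothetical and not taken from the report.

```python
from statistics import mean


def signed_error_summary(human_marks, ai_marks):
    """Summarise AI-vs-human marking error for one strategy (illustrative only)."""
    if len(human_marks) != len(ai_marks) or not human_marks:
        raise ValueError("need two non-empty mark lists of equal length")
    # Assumed sign convention: AI mark minus human mark, so positive = over-marking.
    errors = [ai - human for human, ai in zip(human_marks, ai_marks)]
    n = len(errors)
    return {
        "bias": mean(errors),                               # mean signed error
        "mae": mean(abs(e) for e in errors),                # mean absolute error
        "over_pct": 100 * sum(e > 0 for e in errors) / n,   # assumed meaning of Over%
        "under_pct": 100 * sum(e < 0 for e in errors) / n,  # assumed meaning of Under%
    }


# Made-up marks: the AI gives 3 to every script.
summary = signed_error_summary(human_marks=[4, 2, 5, 3], ai_marks=[3, 3, 3, 3])
print(summary)
```

A bias near zero can still hide large errors that cancel out, which is why the absolute error is reported alongside it.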

Percentage of AI marks equal to 3, by strategy (sorted by phase). Lower is better: a falling share indicates that score compression is being progressively broken.
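The share of marks equal to 3 is a simple proportion. A minimal sketch follows, assuming each strategy's AI marks are available as a list of integers. The strategy names and mark values are invented for illustration and do not come from the evaluation.

```python
def share_of_mark(ai_marks, target=3):
    """Return the percentage of AI marks equal to `target`."""
    if not ai_marks:
        raise ValueError("ai_marks must not be empty")
    return 100 * sum(mark == target for mark in ai_marks) / len(ai_marks)


# Hypothetical strategies and marks, listed in phase order.
marks_by_strategy = {
    "phase1_example": [3, 3, 3, 3, 2],
    "phase2_example": [3, 4, 2, 3, 5],
}

shares = {name: share_of_mark(marks) for name, marks in marks_by_strategy.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of marks equal 3")
```

A high share on its own does not prove compression; it is informative when compared with the share of 3s among the human marks for the same scripts.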

Best ExRnd% achieved per phase for each subject.