Statistics
RetroCast uses bootstrap resampling to calculate confidence intervals for scored model results.
Compute Statistics
Generate statistical summary
from retrocast.api import compute_model_statistics

# `results` is assumed to hold the scored model results loaded earlier.
stats = compute_model_statistics(results, n_boot=10000, seed=42)

# Overall solvability with its bootstrap confidence interval.
solvability = stats.solvability.overall
print(f"Solvability: {solvability.value:.1%} "
      f"[{solvability.ci_lower:.1%}, {solvability.ci_upper:.1%}]")

# Top-K accuracy for selected values of K.
for k in [1, 5, 10]:
    topk = stats.top_k[k].overall
    print(f"Top-{k}: {topk.value:.1%} [{topk.ci_lower:.1%}, {topk.ci_upper:.1%}]")

# Solvability stratified by route length.
print("\nStratified by length:")
for length, metric in stats.solvability.by_group.items():
    print(f" Length {length}: {metric.value:.1%} [{metric.ci_lower:.1%}, {metric.ci_upper:.1%}]")
Typical metrics include overall solvability, Top-K accuracy, ground-truth match rate, and stratified performance by route length.
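To make the reported intervals easier to interpret, the following is a minimal sketch of a percentile bootstrap for a rate metric such as solvability. It is an illustration only and is not taken from RetroCast's implementation, whose exact resampling scheme may differ. The helper `bootstrap_ci`, the `alpha` parameter, and the `solved` data are hypothetical; only the names `n_boot` and `seed` mirror the call shown above.

```python
import random


def bootstrap_ci(outcomes, n_boot=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean of per-target outcomes.

    Hypothetical helper for illustration; not part of the RetroCast API.
    """
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(outcomes)

    # Resample the targets with replacement and recompute the metric each time.
    boot_means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot)
    )

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled metric.
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(outcomes) / n, lower, upper


# Made-up outcomes: 1 = target solved, 0 = not solved.
solved = [1] * 70 + [0] * 30
value, ci_lower, ci_upper = bootstrap_ci(solved, n_boot=2000)
print(f"Solvability: {value:.1%} [{ci_lower:.1%}, {ci_upper:.1%}]")
```

In this sketch, a larger `n_boot` gives more stable interval endpoints at the cost of runtime, and fixing `seed` makes repeated runs return the same interval.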