Inspired by Keep a Changelog
- Introduced dynamic percentile-based relevance thresholding for binary-dependent metrics (Precision, MAP) to replace the hard-coded j > 0 mapping (#394)
- Fixed thread pool starvation in LLM judgment processing (#387)
- Extracted a reusable BatchedAsyncExecutor and migrated LlmJudgmentTaskManager and ExperimentTaskManager to use it (#392)