Changelog
Source:NEWS.md
pairwiseLLM 1.3.0
New Features
Adaptive pairing & ranking framework
Introduced a full adaptive pairing / adaptive ranking framework designed to efficiently rank large sets of writing samples using uncertainty-aware pair selection and Bayesian inference.
Added
adaptive_rank(), the primary user-facing wrapper that runs the complete adaptive workflow end-to-end, including warm start, adaptive pairing rounds, Bayesian BTL refits, diagnostics, and stopping.-
Advanced control is available via:
-
adaptive_rank_start()— initialize an adaptive run and state -
adaptive_rank_run_live()— execute live adaptive comparisons -
adaptive_rank_resume()— resume interrupted or long-running runs
These functions are intended for custom orchestration and fault-tolerant execution.
-
Adaptive pairing is organized into rounds that balance global scale identification and local refinement using a mixture of anchor, long-range, mid-range, and local comparisons.
-
The adaptive controller tracks a global identifiability state based on Bayesian diagnostics and agreement between online (TrueSkill) and global (BTL) rankings. Once the global scale is identified:
- long-range comparisons are automatically tapered,
- comparison budget is reallocated toward local and boundary-refining pairs,
- exploration rates are reduced to focus on decision-relevant uncertainty.
Long-range comparisons are additionally posterior-gated in later stages, preventing wasted comparisons on pairs that are already decisively ordered.
Late-stage local pairing prioritizes near-tie pairs, with limited, auditable overrides to degree caps when especially informative comparisons are blocked.
Adaptive runs produce fully auditable step-, round-, and refit-level logs, recording candidate generation, fallbacks, gating decisions, quota reallocations, and stopping criteria.
All adaptive workflows use standardized configuration, state, and logging contracts to ensure reproducibility and future extensibility.
Bayesian Bradley–Terry–Luce (BTL) modeling
- Added a fully Bayesian Bradley–Terry–Luce (BTL) model implemented via CmdStan, providing posterior uncertainty estimates for item skill parameters.
- New entrypoint
fit_bayes_btl_mcmc()enables direct posterior inference from pairwise comparison data, independent of or integrated with adaptive workflows. - Supports multiple model variants (including error and positional bias extensions) and optional refitting on increasing subsets of comparisons.
- Bayesian BTL outputs integrate seamlessly with adaptive ranking utilities (
summarize_items(),summarize_refits()), serving as the statistical backbone for adaptive pairing decisions.
Model support & live API improvements
- Added support for the Gemini Flash model
gemini-3-flash-previewfor live pairwise comparisons. - Added support for OpenAI service tiers / priority routing via
service_tierfor applicable live models.- Enables tiers such as
"flex"and"priority"when supported by the selected model. - Integrated into the live submission path without requiring changes to calling code.
- Enables tiers such as
pairwiseLLM 1.2.0
New Features
- Parallel Processing:
-
submit_llm_pairs()and backend-specific live functions (OpenAI, Anthropic, Gemini, Together, Ollama) now support parallel execution viaparallel = TRUEandworkers = n(requires the future package).
-
- Incremental Saving & Resume:
- Added
save_pathargument to live submission functions. Results are saved to CSV incrementally, allowing interrupted jobs to resume automatically by skipping previously processed pairs.
- Added
- Robust Error Handling:
- Failed API calls no longer stop the entire process. Failures are captured and returned separately, allowing for easier inspection and re-submission.
- Added
estimate_llm_pairs_cost()to estimate costs in live and batch mode. - Introduced
llm_submit_pairs_multi_batch()andllm_resume_multi_batches()to split large comparison sets across multiple batches and resume polling later. These helpers support writing per‑batch and combined results, along with an optional jobs registry.
Bug fixes
- The prompt format for anthropic batch comparisons now match the anthropic live format.
- Reverse consistency functions can now handle duplicate pairs.
Breaking Changes
-
submit_llm_pairs()and its backend-specific counterparts now return a list containing two elements:$results(a tibble of successful comparisons) and$failed_pairs(a tibble of inputs that failed). Previous versions returned a single tibble.
pairwiseLLM 1.1.0
CRAN release: 2025-12-22
Models
- Added GPT-5.2
- Ensured models can be called with date format, e.g.
gpt-5.2-2025-12-11 - Default temperature setting is set to 0 for non-reasoning models, provider default for reasoning models (typically 1)
Documentation
- Changed pkgdown site layout
- Added codemeta.json
- Added repo logo
- Updated function examples
- Add references to Description
Miscellaneous
- No longer set global variables, now done in individual functions
- Added
verboseoption infit_bt_model()andsummarize_bt_fit() - Moved null coalescing helper to separate R file
- Changed validation of API keys in multiple functions