Run adaptive ranking end-to-end from data and model settings

High-level workflow wrapper that reads sample data, constructs an LLM judge, starts or resumes adaptive state, runs adaptive_rank_run_live(), and returns state plus summary outputs.

Usage

adaptive_rank(
  data,
  id_col = 1,
  text_col = 2,
  backend = c("openai", "anthropic", "gemini", "together", "ollama"),
  model = NULL,
  trait = "overall_quality",
  trait_name = NULL,
  trait_description = NULL,
  prompt_template = set_prompt_template(),
  endpoint = "chat.completions",
  api_key = NULL,
  include_raw = FALSE,
  judge_args = list(),
  judge_call_args = list(),
  n_steps = 1L,
  fit_fn = NULL,
  adaptive_config = NULL,
  btl_config = NULL,
  session_dir = NULL,
  persist_item_log = FALSE,
  resume = TRUE,
  seed = 1L,
  progress = c("all", "refits", "steps", "none"),
  progress_redraw_every = 10L,
  progress_show_events = TRUE,
  progress_errors = TRUE,
  save_outputs = FALSE,
  output_file = NULL,
  judge = NULL
)

Arguments

data: Data source: a data frame/tibble, a file path (.csv, .tsv, .txt, .rds), or a directory containing .txt files.
id_col: ID column selector for tabular inputs. Passed to read_samples_df().
text_col: Text column selector for tabular inputs. Passed to read_samples_df().
backend: Backend passed to make_adaptive_judge_llm().
model: Model passed to make_adaptive_judge_llm().
trait: Built-in trait key used when no custom trait is supplied. Ignored when both trait_name and trait_description are supplied.
trait_name: Optional custom trait display name.
trait_description: Optional custom trait definition.
prompt_template: Prompt template string. Defaults to set_prompt_template().
endpoint: Endpoint family passed to make_adaptive_judge_llm(). Only used when backend = "openai"; ignored otherwise.
api_key: Optional API key passed to make_adaptive_judge_llm().
include_raw: Logical; forwarded to make_adaptive_judge_llm().
judge_args: Named list of fixed additional arguments forwarded to llm_compare_pair() by the generated judge.
judge_call_args: Named list of additional arguments forwarded to the judge at run time through adaptive_rank_run_live().
n_steps: Maximum number of attempted adaptive steps to execute in this call. The run may return earlier due to candidate starvation or BTL stop criteria. Attempted invalid steps also count toward this limit.
fit_fn: Optional fit override passed to adaptive_rank_run_live().
adaptive_config: Optional named list passed to adaptive_rank_start() and adaptive_rank_run_live() to control adaptive controller behavior. Supported fields: global_identified_reliability_min, global_identified_rank_corr_min, p_long_low, p_long_high, long_taper_mult, long_frac_floor, mid_bonus_frac, explore_taper_mult, boundary_k, boundary_window, boundary_frac, p_star_override_margin, and star_override_budget_per_round. Unknown fields and invalid values abort with actionable errors.
btl_config: Optional named list passed to adaptive_rank_run_live() to control BTL refit cadence, stopping diagnostics, and selected round-log diagnostics. Supported fields: refit_pairs_target, model_variant, ess_bulk_min, ess_bulk_min_near_stop, max_rhat, divergences_max, eap_reliability_min, stability_lag, theta_corr_min, theta_sd_rel_change_max, rank_spearman_min, near_tie_p_low, and near_tie_p_high (near_tie_* affects round logging only, not stop decisions). Defaults are resolved from the current item count and merged with user overrides.
session_dir: Optional session directory for persistence/resume.
persist_item_log: Logical; write per-refit item logs when TRUE.
resume: Logical; when TRUE and session_dir contains a valid session, resume from disk; otherwise initialize a new state.
seed: Integer seed used when creating a new adaptive state.
progress: Progress mode for adaptive_rank_run_live().
progress_redraw_every: Redraw interval for progress output.
progress_show_events: Logical; show step events.
progress_errors: Logical; show invalid-step events.
save_outputs: Logical; when TRUE, save returned outputs as .rds.
output_file: Optional output .rds path. If NULL and save_outputs = TRUE, defaults to file.path(session_dir, "adaptive_outputs.rds") when session_dir is set, otherwise to a temporary file.
judge: Optional prebuilt judge function with contract judge(A, B, state, ...). If supplied, model/trait/template options are ignored and this function is used directly.

Value

A list with:

state: Final adaptive_state.
summary: Run-level summary from summarize_adaptive().
refits: Per-refit summary from summarize_refits().
items: Item summary from summarize_items().
logs: Canonical logs from adaptive_get_logs().
output_file: Saved output path when save_outputs = TRUE, otherwise NULL.

Details

This helper is designed for end users who want one entry point for adaptive runs. It supports:

data input from a data frame, file (.csv, .tsv, .txt, .rds), or a directory of .txt files;
model/backend configuration through make_adaptive_judge_llm();
all adaptive runtime controls exposed by adaptive_rank_run_live();
resumability via session_dir and resume;
optional saving of run outputs to an .rds artifact.

Model options: use judge_args (fixed) and judge_call_args (per-run overrides) to pass any additional llm_compare_pair() arguments, including provider-specific controls such as reasoning, service_tier, temperature, top_p, logprobs, include_thoughts, or host.

Adaptive options: all key controls from adaptive_rank_run_live() are available directly: n_steps, fit_fn, adaptive_config, btl_config, progress, progress_redraw_every, progress_show_events, progress_errors, session_dir, and persist_item_log. Use adaptive_config for identifiability-gated controller behavior and btl_config for inference/diagnostics cadence only.

Selection semantics: pair selection is TrueSkill-driven in one-pair transactional steps. Rolling anchors are refreshed from current score proxies and anchor-link routing compares exactly one anchor endpoint with one non-anchor endpoint. Long/mid-link routing excludes anchor-anchor and anchor-non-anchor pairs, while local-link routing admits same-stratum pairs and anchor-involving pairs according to stage bounds.

Wrapper-visible defaults include top-band refinement (top_band_pct = 0.10, top_band_bins = 5) with top-band size computed as ceiling(top_band_pct * N).

Exposure and repeat routing: under-represented routing is degree-based (deg <= D_min + 1), while repeat-pressure gating is based on recent exposure (bottom-quantile recent_deg with quantile default 0.25) and per-endpoint repeat slot accounting.

Inference separation: BTL refits are used for posterior inference, diagnostics, and stopping only. They are not used to choose the next pair.

Resume behavior: when resume = TRUE and session_dir already contains adaptive artifacts, failed session loads abort with an actionable error instead of starting a fresh run silently.

Examples

data("example_writing_samples", package = "pairwiseLLM")

out <- adaptive_rank(
  data = example_writing_samples[1:8, c("ID", "text", "quality_score")],
  id_col = "ID",
  text_col = "text",
  model = "gpt-5.1",
  judge = function(A, B, state, ...) {
    y <- as.integer(A$quality_score[[1]] >= B$quality_score[[1]])
    list(is_valid = TRUE, Y = y, invalid_reason = NA_character_)
  },
  n_steps = 4,
  progress = "none"
)

out$summary
#> # A tibble: 1 × 6
#>   n_items steps_attempted committed_pairs n_refits last_stop_decision
#>     <int>           <int>           <int>    <int> <lgl>             
#> 1       8               4               4        0 FALSE             
#> # ℹ 1 more variable: last_stop_reason <chr>
head(out$logs$step_log)
#> # A tibble: 4 × 51
#>   step_id timestamp           pair_id     i     j     A     B     Y status
#>     <int> <dttm>                <int> <int> <int> <int> <int> <int> <chr> 
#> 1       1 2026-02-11 04:39:08       1     1     4     1     4     0 ok    
#> 2       2 2026-02-11 04:39:08       2     4     8     4     8     0 ok    
#> 3       3 2026-02-11 04:39:08       3     8     2     8     2     1 ok    
#> 4       4 2026-02-11 04:39:08       4     2     6     2     6     0 ok    
#> # ℹ 42 more variables: round_id <int>, round_stage <chr>, pair_type <chr>,
#> #   used_in_round_i <int>, used_in_round_j <int>, is_anchor_i <lgl>,
#> #   is_anchor_j <lgl>, stratum_i <int>, stratum_j <int>, dist_stratum <int>,
#> #   stage_committed_so_far <int>, stage_quota <int>, is_explore_step <lgl>,
#> #   explore_mode <chr>, explore_reason <chr>, explore_rate_used <dbl>,
#> #   local_priority_mode <chr>, long_gate_pass <lgl>, long_gate_reason <chr>,
#> #   star_override_used <lgl>, star_override_reason <chr>, …

if (FALSE) { # \dontrun{
# Live run with OpenAI gpt-5.1 + flex priority.
live <- adaptive_rank(
  data = example_writing_samples[1:12, c("ID", "text")],
  backend = "openai",
  model = "gpt-5.1",
  endpoint = "responses",
  judge_args = list(
    reasoning = "low",
    service_tier = "flex",
    include_thoughts = FALSE
  ),
  btl_config = list(
    refit_pairs_target = 20L,
    ess_bulk_min = 500,
    eap_reliability_min = 0.92
  ),
  adaptive_config = list(
    explore_taper_mult = 0.40,
    star_override_budget_per_round = 2L
  ),
  n_steps = 120,
  session_dir = file.path(tempdir(), "adaptive-live"),
  persist_item_log = TRUE,
  resume = TRUE,
  progress = "all",
  save_outputs = TRUE
)

print(live$state)
live$summary
} # }