Fit an EloChoice model to pairwise comparison data — fit_elo_model

This function fits an Elo-based paired-comparison model using the EloChoice package. It is intended to complement fit_bt_model by providing an alternative scoring framework based on Elo ratings rather than Bradley–Terry models.

Usage

fit_elo_model(elo_data, runs = 5, verbose = FALSE, ...)

Arguments

elo_data: A data frame or tibble containing winner and loser columns. Typically produced using build_elo_data.
runs: Integer number of randomizations to use in EloChoice::elochoice. Default is 5.
verbose: Logical. If TRUE, show any messages/warnings emitted by the underlying fitting functions. If FALSE (default), suppress noisy output to keep examples and reports clean.
...: Additional arguments passed to EloChoice::elochoice().

Value

A named list with components:

engine: Character scalar identifying the scoring engine ("EloChoice").
fit: The "elochoice" model object.
elo: A tibble with columns ID and elo.
reliability: Numeric scalar: mean unweighted reliability index.
reliability_weighted: Numeric scalar: mean weighted reliability index.

Details

The input elo_data must contain two columns:

winner: ID of the winning sample in each pairwise trial
loser: ID of the losing sample in each trial

These can be created from standard pairwise comparison output using build_elo_data.
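When the comparisons do not come from build_elo_data, a compatible input can be assembled by hand. The following is a minimal sketch with made-up sample IDs; it assumes only the two-column layout described above, and the sanity checks are illustrative rather than something the package is known to enforce.

```r
# Toy pairwise outcomes (hypothetical IDs, not from the package data).
# Each row is one trial: `winner` beat `loser`.
elo_data <- data.frame(
  winner = c("S01", "S02", "S01", "S03", "S02"),
  loser  = c("S02", "S03", "S03", "S01", "S01"),
  stringsAsFactors = FALSE
)

# Illustrative sanity checks before fitting: both columns are present,
# no IDs are missing, and no sample is compared with itself.
stopifnot(
  all(c("winner", "loser") %in% names(elo_data)),
  !anyNA(elo_data$winner),
  !anyNA(elo_data$loser),
  all(elo_data$winner != elo_data$loser)
)
```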

Internally, this function calls:

elochoice — to estimate Elo ratings using repeated randomization of trial order;
reliability — to compute unweighted and weighted reliability indices as described in Clark et al. (2018).
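The two calls above can be sketched directly against EloChoice. This is not the package's source code: the toy data are hypothetical, and summarising each sample by its mean rating across randomizations (via ratings(show = "mean")) is an assumption about how a single rating per ID might be obtained.

```r
# Sketch only: runs the EloChoice calls if the package is installed.
# `elo_data` is a toy data frame with hypothetical IDs.
elo_data <- data.frame(
  winner = c("A", "B", "A", "C", "B", "A"),
  loser  = c("B", "C", "C", "A", "A", "B"),
  stringsAsFactors = FALSE
)

if (requireNamespace("EloChoice", quietly = TRUE)) {
  # Estimate ratings over 5 randomized trial orders.
  fit <- EloChoice::elochoice(
    winner = elo_data$winner,
    loser  = elo_data$loser,
    runs   = 5
  )

  # Assumption: one rating per ID, taken as the mean across randomizations.
  mean_elo <- EloChoice::ratings(fit, show = "mean", drawplot = FALSE)
  elo <- data.frame(ID = names(mean_elo), elo = as.numeric(mean_elo))

  # reliability() returns one row per randomization; average both indices.
  rel <- EloChoice::reliability(fit)
  reliability          <- mean(rel$upset)
  reliability_weighted <- mean(rel$upset.wgt)
}
```

With so few trials the indices are not meaningful; the sketch is only meant to show which EloChoice functions supply each component of the returned list.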

If the EloChoice package is not installed, the function stops with an error message that explains how to install it.
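A check of this kind is commonly written with requireNamespace(). The helper below is a hypothetical illustration; its name and message wording are assumptions and are not taken from the package.

```r
# Hypothetical helper: the name and the message text are illustrative.
check_elochoice_installed <- function() {
  if (!requireNamespace("EloChoice", quietly = TRUE)) {
    stop(
      "Package 'EloChoice' is required. ",
      "Install it with install.packages(\"EloChoice\").",
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```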

The returned object mirrors the structure of fit_bt_model for consistency across scoring engines:

engine — always "EloChoice".
fit — the raw "elochoice" object returned by EloChoice::elochoice().
elo — a tibble with columns:
- ID: sample identifier
- elo: estimated Elo rating
(Unlike Bradley–Terry models, EloChoice does not provide standard errors for these ratings, so none are returned.)
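reliability — the mean unweighted reliability index across randomizations. Following Clark et al. (2018), the index is one minus the proportion of “upsets” (trials won by the lower-rated sample), so values closer to 1 indicate more consistent ratings.
reliability_weighted — the mean weighted reliability index, in which each upset is weighted by the absolute rating difference between the two samples, so upsets between closely rated samples count less.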
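To make the two reliability indices concrete, here is a small worked example in base R. It follows the definitions in Clark et al. (2018) as understood here: R = 1 - (number of upsets / number of trials), and a weighted version in which each trial is weighted by the absolute pre-trial rating difference. The ratings are invented, and the sketch ignores how EloChoice treats trials with equal ratings.

```r
# Invented pre-trial ratings of the winner and loser in five trials.
winner_elo <- c(120, -40, 60, 10, 200)
loser_elo  <- c(-20,  30,  0, 90, -100)

# An upset is a trial won by the sample with the lower pre-trial rating.
upset  <- winner_elo < loser_elo
# Weight each trial by the absolute rating difference.
weight <- abs(winner_elo - loser_elo)

# Unweighted index: one minus the proportion of upsets (2 of 5 trials).
r_unweighted <- 1 - mean(upset)
# Weighted index: one minus the weight share of upsets (150 of 650).
r_weighted <- 1 - sum(weight[upset]) / sum(weight)

r_unweighted
#> [1] 0.6
r_weighted
#> [1] 0.7692308
```

The weighted index is higher here because both upsets occurred between closely rated samples, where an unexpected outcome is less surprising.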

References

Clark AP, Howard KL, Woods AT, Penton-Voak IS, Neumann C (2018). "Why rate when you could compare? Using the 'EloChoice' package to assess pairwise comparisons of perceived physical strength." PLOS ONE, 13(1), e0190393. doi:10.1371/journal.pone.0190393.

Examples

data("example_writing_pairs", package = "pairwiseLLM")

elo_data <- build_elo_data(example_writing_pairs)

fit <- fit_elo_model(elo_data, runs = 5, verbose = FALSE)
fit$elo
#> # A tibble: 20 × 2
#>    ID       elo
#>    <chr>  <dbl>
#>  1 S01   -369. 
#>  2 S02   -295. 
#>  3 S03   -378. 
#>  4 S04   -340. 
#>  5 S05   -262. 
#>  6 S06   -196. 
#>  7 S07   -144. 
#>  8 S08    -59.8
#>  9 S09    -50  
#> 10 S10    -34.6
#> 11 S11      7.2
#> 12 S12     11  
#> 13 S13    193. 
#> 14 S14    145. 
#> 15 S15    196. 
#> 16 S16    203. 
#> 17 S17    275  
#> 18 S18    422. 
#> 19 S19    271. 
#> 20 S20    405. 
fit$reliability
#> [1] 0.8251686
fit$reliability_weighted
#> [1] 0.9202833
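The elo tibble is returned in ID order, so a ranking has to be derived separately. A minimal base R sketch, using a few rounded rows copied from the output above in place of a real fit:

```r
# A few rounded rows copied from the `fit$elo` output above, for illustration.
elo <- data.frame(
  ID  = c("S01", "S11", "S18", "S20"),
  elo = c(-369, 7.2, 422, 405)
)

# Sort from highest to lowest rating and add a rank column.
ranked <- elo[order(elo$elo, decreasing = TRUE), ]
ranked$rank <- seq_len(nrow(ranked))
ranked$ID
#> [1] "S18" "S20" "S11" "S01"
```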