Backend-agnostic live comparison for a single pair of samples
Source: R/llm_backends.R

llm_compare_pair() is a thin wrapper around backend-specific comparison
functions. It currently supports the "openai", "anthropic", "gemini",
"together", and "ollama" backends and forwards the call to the
appropriate live comparison helper:
"openai"→openai_compare_pair_live()"anthropic"→anthropic_compare_pair_live()"gemini"→gemini_compare_pair_live()"together"→together_compare_pair_live()"ollama"→ollama_compare_pair_live()
Usage
llm_compare_pair(
ID1,
text1,
ID2,
text2,
model,
trait_name,
trait_description,
prompt_template = set_prompt_template(),
backend = c("openai", "anthropic", "gemini", "together", "ollama"),
endpoint = c("chat.completions", "responses"),
api_key = NULL,
include_raw = FALSE,
...
)

Arguments
- ID1
Character ID for the first sample.
- text1
Character string containing the first sample's text.
- ID2
Character ID for the second sample.
- text2
Character string containing the second sample's text.
- model
Model identifier for the chosen backend. For "openai" this should be an
OpenAI model name (for example "gpt-4.1", "gpt-5.1"). For "anthropic" and
"gemini", use the corresponding provider model names (for example
"claude-4-5-sonnet" or "gemini-3-pro-preview"). For "together", use
Together.ai model identifiers such as "deepseek-ai/DeepSeek-R1" or
"deepseek-ai/DeepSeek-V3". For "ollama", use a local model name known to
the Ollama server (for example "mistral-small3.2:24b", "qwen3:32b",
"gemma3:27b").
- trait_name
Short label for the trait (for example "Overall Quality").
- trait_description
Full-text definition of the trait.
- prompt_template
Prompt template string, typically from set_prompt_template().
- backend
Character scalar indicating which LLM provider to use. One of "openai",
"anthropic", "gemini", "together", or "ollama".
- endpoint
Character scalar specifying which endpoint family to use for backends
that support multiple live APIs. For the "openai" backend this must be
one of "chat.completions" or "responses", matching
openai_compare_pair_live(). For "anthropic", "gemini", and "ollama",
this argument is currently ignored.
- api_key
Optional API key for the selected backend. If NULL, the backend-specific
helper will use its own default environment variable (for example
OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, TOGETHER_API_KEY); see
the sketch after this list. For "ollama", this argument is ignored (no
API key is required for local inference).
- include_raw
Logical; if TRUE, the returned tibble includes a raw_response list-column
with the parsed JSON body (or NULL on parse failure). Support for this
may vary across backends.
- ...
Additional backend-specific parameters. For "openai" these are passed on
to openai_compare_pair_live() and typically include arguments such as
temperature, top_p, logprobs, reasoning, and include_thoughts. For
"anthropic" and "gemini" they are forwarded to the corresponding live
helper and may include parameters such as reasoning, include_thoughts,
max_output_tokens, or provider-specific options. For "ollama", arguments
are forwarded to ollama_compare_pair_live() and may include host, think,
num_ctx, and other Ollama-specific controls.
Value
A tibble with one row and the same columns as the underlying
backend-specific live helper (for example openai_compare_pair_live()
for "openai"). All backends are intended to return a compatible
structure including thoughts, content, and token counts.
Details
All backends are expected to return a tibble with a compatible structure, including:
- custom_id, ID1, ID2
- model, object_type, status_code, error_message
- thoughts (reasoning / thinking text when available)
- content (visible assistant output)
- better_sample, better_id
- prompt_tokens, completion_tokens, total_tokens
For the "openai" backend, the endpoint argument controls whether
the Chat Completions API ("chat.completions") or the Responses API
("responses") is used. For the "anthropic", "gemini", and
"ollama" backends, endpoint is currently ignored and the default
live API for that provider is used.
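Because every backend is expected to emit the same columns, results from
different providers can be checked or combined uniformly. A minimal
sketch, assuming res holds the one-row tibble from a successful
llm_compare_pair() call:

# Verify that the backend-agnostic columns listed above are present.
expected_cols <- c(
  "custom_id", "ID1", "ID2",
  "model", "object_type", "status_code", "error_message",
  "thoughts", "content",
  "better_sample", "better_id",
  "prompt_tokens", "completion_tokens", "total_tokens"
)
all(expected_cols %in% names(res))  # should be TRUE for every backend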
See also
openai_compare_pair_live(), anthropic_compare_pair_live(),
gemini_compare_pair_live(), together_compare_pair_live(), and
ollama_compare_pair_live() for backend-specific implementations.
submit_llm_pairs() for row-wise comparisons over a tibble of pairs.
build_bt_data() and fit_bt_model() for Bradley–Terry modelling of
comparison results.
Examples
if (FALSE) { # \dontrun{
# Requires an API key for the chosen cloud backend. For OpenAI, set
# OPENAI_API_KEY in your environment. Running these examples will incur
# API usage costs.
#
# For local Ollama use, an Ollama server must be running and the models
# must be pulled in advance. No API key is required for the `"ollama"`
# backend.
data("example_writing_samples", package = "pairwiseLLM")
samples <- example_writing_samples[1:2, ]
td <- trait_description("overall_quality")
tmpl <- set_prompt_template()
# Single live comparison using the OpenAI backend and chat.completions
res_live <- llm_compare_pair(
ID1 = samples$ID[1],
text1 = samples$text[1],
ID2 = samples$ID[2],
text2 = samples$text[2],
model = "gpt-4.1",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
backend = "openai",
endpoint = "chat.completions",
temperature = 0
)
res_live$better_id
# Using the OpenAI responses endpoint with gpt-5.1 and reasoning = "low"
res_live_gpt5 <- llm_compare_pair(
ID1 = samples$ID[1],
text1 = samples$text[1],
ID2 = samples$ID[2],
text2 = samples$text[2],
model = "gpt-5.1",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
backend = "openai",
endpoint = "responses",
reasoning = "low",
include_thoughts = TRUE,
temperature = NULL,
top_p = NULL,
logprobs = NULL,
include_raw = TRUE
)
str(res_live_gpt5$raw_response[[1]], max.level = 2)
# Example: single live comparison using a local Ollama backend
res_ollama <- llm_compare_pair(
ID1 = samples$ID[1],
text1 = samples$text[1],
ID2 = samples$ID[2],
text2 = samples$text[2],
model = "mistral-small3.2:24b",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
backend = "ollama",
host = getOption(
"pairwiseLLM.ollama_host",
"http://127.0.0.1:11434"
),
think = FALSE
)
res_ollama$better_id
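# Sketch: the same comparison through the Together.ai backend. This
# assumes TOGETHER_API_KEY is set in the environment; shared arguments
# are forwarded to together_compare_pair_live().
res_together <- llm_compare_pair(
  ID1 = samples$ID[1],
  text1 = samples$text[1],
  ID2 = samples$ID[2],
  text2 = samples$text[2],
  model = "deepseek-ai/DeepSeek-V3",
  trait_name = td$name,
  trait_description = td$description,
  prompt_template = tmpl,
  backend = "together"
)
res_together$better_id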
} # }