Live Anthropic (Claude) comparison for a single pair of samples
Source:R/anthropic_live.R
anthropic_compare_pair_live.RdThis function sends a single pairwise comparison prompt to the Anthropic Messages API (Claude models) and parses the result into a small tibble.
Usage
anthropic_compare_pair_live(
ID1,
text1,
ID2,
text2,
model,
trait_name,
trait_description,
prompt_template = set_prompt_template(),
tag_prefix = "<BETTER_SAMPLE>",
tag_suffix = "</BETTER_SAMPLE>",
api_key = NULL,
anthropic_version = "2023-06-01",
reasoning = c("none", "enabled"),
include_raw = FALSE,
include_thoughts = NULL,
...
)Arguments
- ID1
Character ID for the first sample.
- text1
Character string containing the first sample's text.
- ID2
Character ID for the second sample.
- text2
Character string containing the second sample's text.
- model
Anthropic Claude model name (for example
"claude-sonnet-4-5","claude-haiku-4-5", or"claude-opus-4-5").- trait_name
Short label for the trait (for example "Overall Quality").
- trait_description
Full-text definition of the trait.
- prompt_template
Prompt template string, typically from
set_prompt_template. The template should embed the full instructions, rubric text, and<BETTER_SAMPLE>tagging convention.- tag_prefix
Prefix for the better-sample tag. Defaults to
"<BETTER_SAMPLE>".- tag_suffix
Suffix for the better-sample tag. Defaults to
"</BETTER_SAMPLE>".- api_key
Optional Anthropic API key. Defaults to
Sys.getenv("ANTHROPIC_API_KEY").- anthropic_version
Anthropic API version string passed as the
anthropic-versionHTTP header. Defaults to"2023-06-01".- reasoning
Character scalar indicating whether to allow more extensive internal "thinking" before the visible answer. Two values are recognised:
"none"– standard prompting (recommended default)."enabled"– uses Anthropic's extended thinking mode by sending athinkingblock with a token budget; this also changes the defaultmax_tokensand constrainstemperature.
- include_raw
Logical; if
TRUE, adds a list-columnraw_responsecontaining the parsed JSON body returned by Anthropic (orNULLon parse failure). This is useful for debugging parsing problems.- include_thoughts
Logical or
NULL. WhenTRUEandreasoning = "none", this function upgrades to extended thinking mode by settingreasoning = "enabled"before constructing the request, which in turn impliestemperature = 1and adds athinkingblock. WhenFALSEandreasoning = "enabled", a warning is issued but extended thinking is still used. WhenNULL(the default),reasoningis used as-is.- ...
Additional Anthropic parameters such as
max_tokens,temperature,top_por a customthinking_budget_tokens, which will be passed through to the Messages API.When
reasoning = "none"the defaults are:temperature = 0(deterministic behaviour) unless you supplytemperatureexplicitly.max_tokens = 768unless you supplymax_tokens.
When
reasoning = "enabled"(extended thinking), the Anthropic API imposes additional constraints:temperaturemust be 1. If you supply a different value, this function will throw an error.thinking_budget_tokensmust satisfythinking_budget_tokens >= 1024andthinking_budget_tokens < max_tokens. If you supply a value that violates these constraints, this function will throw an error.By default,
max_tokens = 2048andthinking_budget_tokens = 1024.
Value
A tibble with one row and columns:
- custom_id
ID string of the form
"LIVE_<ID1>_vs_<ID2>".- ID1, ID2
The sample IDs you supplied.
- model
Model name reported by the API.
- object_type
Anthropic object type (for example
"message").- status_code
HTTP-style status code (200 if successful).
- error_message
Error message if something goes wrong; otherwise NA.
- thoughts
Summarised thinking / reasoning text when
reasoning = "enabled"and the API returns thinking blocks; otherwiseNA.- content
Concatenated text from the assistant output (excluding thinking blocks).
- better_sample
"SAMPLE_1", "SAMPLE_2", or NA.
- better_id
ID1 if SAMPLE_1 is chosen, ID2 if SAMPLE_2 is chosen, otherwise NA.
- prompt_tokens
Prompt / input token count (if reported).
- completion_tokens
Completion / output token count (if reported).
- total_tokens
Total token count (reported by the API or computed as input + output tokens when not provided).
- raw_response
(Optional) list-column containing the parsed JSON body.
Details
It mirrors the behaviour and output schema of
openai_compare_pair_live, but targets Anthropic's
/v1/messages endpoint. The prompt template, <BETTER_SAMPLE> tag
convention, and downstream parsing / BT modelling can remain unchanged.
The function is designed to work with Claude models such as Sonnet, Haiku, and Opus in the "4.5" family. You can pass any valid Anthropic model string, for example:
"claude-sonnet-4-5""claude-haiku-4-5""claude-opus-4-5"
The API typically responds with a dated model string such as
"claude-sonnet-4-5-20250929" in the model field.
Recommended defaults for pairwise writing comparisons
For stable, reproducible comparisons we recommend:
reasoning = "none"withtemperature = 0andmax_tokens = 768for standard pairwise scoring.reasoning = "enabled"when you explicitly want extended thinking; in this mode Anthropic requirestemperature = 1. The default in this function ismax_tokens = 2048andthinking_budget_tokens = 1024, which satisfies the documented constraintsthinking_budget_tokens >= 1024andthinking_budget_tokens < max_tokens.
When reasoning = "enabled", this function also sends a
thinking block to the Anthropic API:
"thinking": {
"type": "enabled",
"budget_tokens": <thinking_budget_tokens>
}
Setting include_thoughts = TRUE when reasoning = "none"
is a convenient way to opt into Anthropic's extended thinking mode without
changing the reasoning argument explicitly. In that case,
reasoning is upgraded to "enabled", the default
temperature becomes 1, and a thinking block is included in the
request. When reasoning = "none" and include_thoughts is
FALSE or NULL, the default temperature remains 0 unless
you explicitly override it.
Examples
if (FALSE) { # \dontrun{
# Requires ANTHROPIC_API_KEY and network access.
library(pairwiseLLM)
data("example_writing_samples", package = "pairwiseLLM")
samples <- example_writing_samples[1:2, ]
td <- trait_description("overall_quality")
tmpl <- set_prompt_template()
# Short, deterministic comparison with no explicit thinking block
res_claude <- anthropic_compare_pair_live(
ID1 = samples$ID[1],
text1 = samples$text[1],
ID2 = samples$ID[2],
text2 = samples$text[2],
model = "claude-sonnet-4-5",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
reasoning = "none",
include_raw = FALSE
)
res_claude$better_id
# Allow more internal thinking and a longer explanation
res_claude_reason <- anthropic_compare_pair_live(
ID1 = samples$ID[1],
text1 = samples$text[1],
ID2 = samples$ID[2],
text2 = samples$text[2],
model = "claude-sonnet-4-5",
trait_name = td$name,
trait_description = td$description,
prompt_template = tmpl,
reasoning = "enabled",
include_raw = TRUE,
include_thoughts = TRUE
)
res_claude_reason$total_tokens
substr(res_claude_reason$content, 1, 200)
} # }